Facebook provides an Application Programming Interface ("API") that authorized users may use to search for ads in its archive. However, due to the inconsistent state of the Facebook Ad Library API, our methods for scanning and discovering ads must be adapted on a daily, and sometimes hourly, basis. We regret that we cannot offer reliable or predictable instructions on how to retrieve political ads from Facebook. Below, we describe our default crawler settings and various workarounds. For more details, see our data collection log.
Identity Confirmation
To gain access to the API, you first need to confirm your identity with Facebook. The process took us approximately 11 days, from submitting the request online to receiving the confirmation code in the mail.
API Parameters
According to the Ad Library API documentation, the following parameters are available in a search query:
ad_active_status,
ad_reached_countries,
ad_type,
search_page_ids, and
search_terms.
By default, the API returns only 25 ads per page. You may request more ads per page by adding an undocumented parameter, limit, to either your initial search query or requests for subsequent pages.
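The interaction between limit and page-by-page retrieval can be sketched as follows. This is a minimal illustration, not our crawler: it assumes the usual Graph API response shape (a `data` list plus an optional `paging.next` URL), and the names `iter_ads`, `fetch_json`, and `max_pages` are hypothetical. The HTTP call is injected as a function so that the sketch runs offline against canned pages.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_ads(first_url: str,
             fetch_json: Callable[[str], Dict],
             max_pages: int = 1000) -> Iterator[Dict]:
    """Yield ads page by page, following paging.next until it runs out."""
    seen_urls = set()
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        if url in seen_urls:
            # The same page URL came back twice: stop rather than loop forever.
            break
        seen_urls.add(url)
        page = fetch_json(url)
        yield from page.get("data", [])
        # Assumed response shape: {"data": [...], "paging": {"next": "<url>"}}
        url = page.get("paging", {}).get("next")
        pages += 1


# Offline demonstration with canned pages instead of real HTTP calls.
# The second page deliberately points back to the first one.
fake_pages = {
    "page1": {"data": [{"page_id": "1"}, {"page_id": "2"}],
              "paging": {"next": "page2"}},
    "page2": {"data": [{"page_id": "3"}], "paging": {"next": "page1"}},
}
ads = list(iter_ads("page1", fake_pages.__getitem__))
print(len(ads))
```

In a real crawl, `fetch_json` would be replaced by a function that performs the HTTP request and parses the JSON body, and `first_url` would already carry the desired limit value.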
For each search, you may also request various additional data fields. Available data fields are documented on the Archived Ad page.
Default Crawler Settings
Our default daily crawl consists of the following search, where [TOKEN] is the application token that you will receive after identity confirmation.
https://graph.facebook.com/v3.2/ads_archive?access_token=[TOKEN]&ad_type=POLITICAL_AND_ISSUE_ADS&ad_active_status=ALL&fields=ad_creation_time%2Cad_creative_body%2Cad_creative_link_caption%2Cad_creative_link_description%2Cad_creative_link_title%2Cad_delivery_start_time%2Cad_delivery_stop_time%2Cad_snapshot_url%2Ccurrency%2Cfunding_entity%2Cimpressions%2Cpage_id%2Cpage_name%2Cspend%2Cregion_distribution%2Cdemographic_distribution&limit=500&ad_reached_countries=AT%2CBE%2CBG%2CHR%2CCY%2CCZ%2CDK%2CEE%2CFI%2CFR%2CDE%2CGR%2CHU%2CIE%2CIT%2CLV%2CLT%2CLU%2CMT%2CNL%2CPL%2CPT%2CRO%2CSK%2CSI%2CES%2CSE%2CGB&search_terms=.
For readability, below is the unescaped URL:
https://graph.facebook.com/v3.2/ads_archive?access_token=[TOKEN]&ad_type=POLITICAL_AND_ISSUE_ADS&ad_active_status=ALL&fields=ad_creation_time,ad_creative_body,ad_creative_link_caption,ad_creative_link_description,ad_creative_link_title,ad_delivery_start_time,ad_delivery_stop_time,ad_snapshot_url,currency,funding_entity,impressions,page_id,page_name,spend,region_distribution,demographic_distribution&limit=500&ad_reached_countries=AT,BE,BG,HR,CY,CZ,DK,EE,FI,FR,DE,GR,HU,IE,IT,LV,LT,LU,MT,NL,PL,PT,RO,SK,SI,ES,SE,GB&search_terms=.
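For illustration, the escaped URL above can be assembled from its parts with Python's standard library. This is a sketch under stated assumptions: the helper name `build_search_url` and the placeholder token are hypothetical, and the parameter values are copied from the default search shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://graph.facebook.com/v3.2/ads_archive"

FIELDS = (
    "ad_creation_time", "ad_creative_body", "ad_creative_link_caption",
    "ad_creative_link_description", "ad_creative_link_title",
    "ad_delivery_start_time", "ad_delivery_stop_time", "ad_snapshot_url",
    "currency", "funding_entity", "impressions", "page_id", "page_name",
    "spend", "region_distribution", "demographic_distribution",
)

EU_COUNTRIES = (
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "GB",
)


def build_search_url(token, countries=EU_COUNTRIES, limit=500, search_terms="."):
    """Return the escaped search URL; urlencode turns commas into %2C."""
    params = [
        ("access_token", token),
        ("ad_type", "POLITICAL_AND_ISSUE_ADS"),
        ("ad_active_status", "ALL"),
        ("fields", ",".join(FIELDS)),
        ("limit", limit),
        ("ad_reached_countries", ",".join(countries)),
        ("search_terms", search_terms),
    ]
    return BASE_URL + "?" + urlencode(params)


# "TOKEN" is a placeholder, not a working access token.
url = build_search_url("TOKEN")
print(url)
```

Keeping the fields and country codes in sequences makes it easier to change one value (for example, limit or a single country) without hand-editing a long escaped string.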
Parameter — ad_active_status
We always set ad_active_status=ALL to request all ads in the archive.
Parameter — ad_reached_countries
By default, we search for ads in all 28 EU member states by setting ad_reached_countries to AT,BE,BG,HR,CY,CZ,DK,EE,FI,FR,DE,GR,HU,IE,IT,LV,LT,LU,MT,NL,PL,PT,RO,SK,SI,ES,SE,GB.
Facebook uses ISO instead of EU-recommended country codes
Facebook uses ISO 3166 country codes instead of the EU-recommended country codes. The United Kingdom is coded as GB instead of UK, and Greece is coded as GR instead of EL.
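If your own country lists use the EU-recommended codes, a small translation step avoids sending codes the API does not expect. The sketch below covers only the two differences named above; the names `EU_TO_ISO` and `to_iso_code` are hypothetical.

```python
# The two codes that differ between the EU-recommended list and ISO 3166.
EU_TO_ISO = {"UK": "GB", "EL": "GR"}


def to_iso_code(code: str) -> str:
    """Map an EU-recommended country code to the ISO code used in queries."""
    code = code.strip().upper()
    return EU_TO_ISO.get(code, code)


print([to_iso_code(c) for c in ("UK", "EL", "FR")])
```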
Workaround for pagination errors
We frequently encounter pagination issues (e.g., infinite loop bug, invalid next page bug, random termination bug) and are unable to retrieve all pages associated with a search. When such errors occur, we split the crawl into 28 smaller searches by requesting the ads for each member state separately (e.g., ad_reached_countries=AT, ad_reached_countries=BE, ad_reached_countries=BG).
However, please be aware that the API may return different results when you search for ads in all EU member states collectively in a single query than when you search each member state individually in separate queries.
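The per-country split can be sketched as a function that turns one set of query parameters into 28 separate ones. The names `split_by_country` and `base_params` are hypothetical; the parameter values mirror our default search.

```python
EU_COUNTRIES = (
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "GB",
)


def split_by_country(base_params, countries=EU_COUNTRIES):
    """Return one copy of base_params per country, leaving the input unchanged."""
    return [dict(base_params, ad_reached_countries=c) for c in countries]


base_params = {
    "ad_type": "POLITICAL_AND_ISSUE_ADS",
    "ad_active_status": "ALL",
    "search_terms": ".",
    "limit": 500,
    "ad_reached_countries": ",".join(EU_COUNTRIES),
}
queries = split_by_country(base_params)
print(len(queries), queries[0]["ad_reached_countries"])
```

Because the same ad could plausibly be returned by more than one per-country search, results merged from the 28 searches would likely need deduplication; how to identify duplicates is left open here.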
Parameter — ad_type
We set this parameter to the default and only supported value, POLITICAL_AND_ISSUE_ADS.
Not recognized in Graph API Explorer
This parameter is not recognized by the Graph API Explorer, the tool that Facebook uses for recording API sessions and reporting bugs. Remember to remove this parameter when reporting bugs; otherwise, Facebook will not be able to reproduce your API sessions.
Parameter — search_terms
Empirically, we find that we retrieve the largest number of ads by using the period (.) as the search term. We arrived at this choice after experimenting with multilingual dictionary-based approaches, stopwords, and other punctuation marks.
No guarantee of completeness
Incorrect results
Unreproducible results
More generally, the results provided by the API are unreproducible. You may receive significantly different results when you repeat an identical search on the same day, or even when you conduct two identical searches within seconds of each other.
Parameter — search_page_ids
We do not have the necessary data to use this field.
No available values
Even though the Ad Library API provides the capability to search for ads using page_id, the API does not provide a list of available page_ids.
Technically, a list of page_ids could be scraped from the Facebook Ad Library Report. However, such actions are prohibited by Facebook's terms of service. We did not take such actions, and therefore do not have the data needed to crawl political ads in the European Union using page_ids.
Parameter — limit
We request 500 ads per page at the start of each day. However, depending on the errors and error types, we may increase, decrease, or sample the potential values of limit.
Reasons for decreasing the value
Even though Facebook states in public statements (e.g., Ad Library FAQ) that users may request up to 5,000 ads per page, empirically, we find that the API fails with increasing frequency as we request more ads per page. While the API may occasionally succeed, it frequently returns zero ads and asks us to retry our requests. On May 7, when we last measured the API failure rates, we received on average 223, 125, and 101 ads per page when we requested 1,000, 2,000, and 4,000 ads per page, respectively.
When you encounter a significant number of general API failures, you may wish to reduce limit; otherwise, requesting more ads may cause the API to return fewer ads.
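To make the May 7 figures concrete, the short calculation below expresses the average number of ads received as a share of the number requested. It uses only the three measurements reported above; the variable names are illustrative.

```python
# Requested limit -> average ads received per page (May 7 measurements).
measurements = {1000: 223, 2000: 125, 4000: 101}

# Share of the requested page size that was actually delivered.
yields = {requested: received / requested
          for requested, received in measurements.items()}

for requested, share in yields.items():
    print(f"limit={requested}: {measurements[requested]} ads per page, "
          f"{share:.2%} of requested")
```

The share falls from roughly 22% at 1,000 to roughly 6% at 2,000 and under 3% at 4,000, and the absolute number of ads per page falls as well, which is why a larger limit can leave you with fewer ads.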
Reasons for increasing the value
Due to the numerous pagination bugs in the Ad Library API (e.g., infinite loop bug, invalid next page bug, random termination bug), we often cannot retrieve all pages associated with a search. In such cases, increasing the value of limit may reduce the chance of encountering a pagination failure and improve the likelihood of completing a search.
On days when you encounter a significant number of pagination errors, you may wish to increase the value of limit. Retrieving each additional page is like playing a round of Russian roulette, so it is strategic to minimize the number of potential failure points.
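The Russian roulette intuition can be made concrete with a simplified model. Assume, purely for illustration, that every page request fails independently with the same probability; then a search completes only if every page succeeds. The function name, the failure rate, and the archive size below are hypothetical, not measured values.

```python
import math


def completion_probability(total_ads: int, limit: int,
                           page_failure_rate: float) -> float:
    """Probability that every page of a search succeeds, assuming
    independent page failures with a constant failure rate."""
    pages = math.ceil(total_ads / limit)
    return (1.0 - page_failure_rate) ** pages


# Hypothetical scenario: 10,000 ads and a 5% chance of failure per page.
for limit in (250, 500, 1000):
    p = completion_probability(10_000, limit, 0.05)
    print(f"limit={limit}: {math.ceil(10_000 / limit)} pages, "
          f"P(complete) = {p:.3f}")
```

Under this model, fewer pages always means a better chance of finishing. In practice the per-page failure rate is not constant, since general API failures become more frequent as limit grows, so the two effects have to be weighed against each other.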
Reasons for sampling potential values
You may wish to randomly perturb the value of limit to guard against such failures.
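One possible way to perturb limit is to draw it uniformly from a band around the default value. This is a sketch only: the function name, the 20% spread, and the seeded generator are assumptions, and the upper bound of 5,000 is the maximum stated by Facebook.

```python
import random


def perturbed_limit(base: int = 500, spread: float = 0.2,
                    rng=random, minimum: int = 1, maximum: int = 5000) -> int:
    """Return a limit drawn uniformly from [base*(1-spread), base*(1+spread)],
    clamped to the range [minimum, maximum]."""
    low = max(minimum, round(base * (1 - spread)))
    high = min(maximum, round(base * (1 + spread)))
    return rng.randint(low, high)


# A seeded generator makes a crawl's sequence of limits repeatable.
rng = random.Random(0)
samples = [perturbed_limit(rng=rng) for _ in range(5)]
print(samples)
```

Logging the limit used for each request alongside the outcome would make it possible to see afterwards which values tended to succeed.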