This section describes the data and methodology used for the analysis. Facebook’s advertising platform provides anonymous and aggregated information on Facebook users through a dedicated Application Programming Interface (API) [26]. This platform enables advertisers to run advertisements targeted at users of Facebook’s family of applications and services, which include Facebook, Instagram, Messenger, and the Audience Network. It can be used to retrieve estimates of the Monthly Active Users (MAUs) who are eligible to be shown an advertisement given a set of user characteristics. MAUs include users active in the previous 30 days. In this work, we focus on two main user characteristics, namely the country of residence and the language of the users. The latter attribute is provided by Facebook’s advertising platform to target people who use a language other than the common language of a location. Since Facebook does not directly provide information on the nationality of its users, we use language as a proxy to identify users of Ukrainian nationality. To test this hypothesis, in Sect. 5 we compare the Ukrainian-speaking MAUs for the month before the Russian invasion of Ukraine with official Ukrainian diaspora data in the EU from EUROSTAT.Footnote 6 Our implicit assumption is that the number of Ukrainian-speaking Facebook MAUs in each EU country was fairly stable before the war and is therefore comparable with the latest diaspora records.
However, using the language option has a drawback: since 23 August 2021, targeting advertisements at people under the age of 18 has not been available.Footnote 7 To take this into account, in the comparison with diaspora data we only consider Ukrainian citizens over 18.
It is worth highlighting that self-declared Ukrainian-speaking MAUs do not reflect the total Ukrainian population, for two main reasons: (i) not all Ukrainians use Facebook (in particular, people under 13 cannot open an account); (ii) Ukrainian is the language spoken by the vast majority of people in the country, but other languages are also common, in particular Russian.Footnote 8 Nevertheless, the Ukrainian language is not widely spoken outside Ukraine and the neighboring countries, and its diffusion in Europe is very limited.Footnote 9 Moreover, we acknowledge that not all people fleeing Ukraine are Ukrainian nationals. In fact, the EU Temporary Protection Directive applies to everyone fleeing the country, regardless of their nationality.
Recent studies have focused on the reliability of the socio-demographic information provided by Facebook’s advertising platform [26, 28–30]. Sances [29] and Grow et al. [28] observe that the information reported by users upon creating their account, in particular attributes that are less likely to change over time (e.g. gender, age), is generally accurate and more reliable than information that is inferred by Facebook’s advertising algorithms, such as the region of residence. Grow et al. [28] report that misclassifications between the actual characteristics of the users and the ones provided by Facebook are most likely to occur for the region of residence, which is partially inferred by Facebook and may change frequently, thereby increasing the chance of erroneous classifications. However, Sances [29] states that classifications of the region of residence are more likely to be correct in larger regions than in smaller regions. Since we are looking at changes in MAUs at the national scale, we assume the geographical resolution considered to be sufficiently large for major classification errors to be negligible.
It is important to point out that Facebook estimates are not designed to match population, census estimates, or other sources, and are not to be considered as a proxy for monthly or daily active users on Meta, or engagement.Footnote 10 They may differ depending on factors such as:
- how many Facebook apps and services accounts a person has.
- how many temporary visitors are in a particular geographic location at a given time.
- Facebook user-reported demographics.
However, recent studies indicate that despite measurement issues and selection bias, it is potentially feasible to derive robust estimates of demographic indicators from tabulations of Facebook users [25, 26, 31, 32]. The same works present approaches to generate bias-adjusted population estimates and demographic counts to derive the actual distributions for specific audiences of interest. Similarly to [31], we estimate the Facebook penetration rate in Ukraine by dividing the prewar Ukrainian-speaking MAUs in Ukraine by the population over 18 in Ukraine provided by the Ukrainian statistical office.Footnote 11 In Sect. 5 we explain in detail how this is calculated and how we use the estimated penetration rate as a correction factor for Facebook audience estimates in each country.
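As a rough illustration of the penetration-rate idea described above, the following Python sketch computes the ratio and applies it as a correction factor. The function names and all numbers are hypothetical placeholders, not values from this study, and dividing a country's MAU estimate by the rate is only an assumed form of the correction; the procedure actually used is the one given in Sect. 5.

```python
def penetration_rate(prewar_mau_in_ukraine: float, adult_population: float) -> float:
    """Ratio of prewar Ukrainian-speaking MAUs in Ukraine to the population over 18."""
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    return prewar_mau_in_ukraine / adult_population


def corrected_estimate(mau_in_country: float, rate: float) -> float:
    """Scale a MAU estimate by the inverse penetration rate (assumed form of the correction)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return mau_in_country / rate


# Hypothetical placeholder numbers, chosen only to make the arithmetic easy to follow.
rate = penetration_rate(prewar_mau_in_ukraine=10_000_000, adult_population=40_000_000)
adjusted = corrected_estimate(mau_in_country=50_000, rate=rate)
print(rate, adjusted)
```

With these placeholder inputs, one quarter of the adult population appears in the Facebook audience, so each observed user would stand for four people under the assumed correction.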
One key aspect when using non-traditional data is validating them against reliable sources when available. To date, public data on the actual flows of people fleeing Ukraine are very limited. We rely on data on the refugee influx from Ukraine into neighboring countries available at the Operational Data Portal of UNHCR. In Sect. 5 we compare the weekly change in Facebook MAUs with daily UNHCR inflow data for the five weeks following the beginning of the war. The comparison is made for the EU countries neighboring Ukraine.
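Because the two sources have different time steps, comparing them requires putting them on a common weekly grid. The sketch below shows one possible way to do this in Python: daily arrivals are summed into consecutive 7-day blocks and the MAU series is differenced week over week. The function names, the alignment of weeks to the first collection date, and all numbers are assumptions made for illustration, not the procedure or data of this study.

```python
from typing import List


def weekly_totals(daily: List[int], days_per_week: int = 7) -> List[int]:
    """Sum daily arrivals into consecutive full weeks; a trailing partial week is dropped."""
    n_full = len(daily) // days_per_week
    return [
        sum(daily[i * days_per_week:(i + 1) * days_per_week])
        for i in range(n_full)
    ]


def weekly_changes(mau: List[float]) -> List[float]:
    """Difference between each weekly MAU value and the previous one."""
    return [later - earlier for earlier, later in zip(mau, mau[1:])]


# Hypothetical placeholder series (three weeks of daily arrivals, four weekly MAU snapshots).
daily_arrivals = [100] * 7 + [200] * 7 + [50] * 7
mau_series = [1000, 1600, 2900, 3200]

inflow_per_week = weekly_totals(daily_arrivals)
mau_change_per_week = weekly_changes(mau_series)
print(inflow_per_week, mau_change_per_week)
```

Both outputs then have one value per week, so their trends can be compared directly.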
MAU estimates refer to a 30-day time span, and we do not know whether the target audience for a given country is inflated by users transiting through that country to reach another country of destination; for instance, a user travelling through different countries will be counted as many times as the number of countries where he or she has interacted with Facebook applications. As a consequence, when looking at the increase in MAUs through time it may not be possible to discern how much of the change is to be attributed to Ukrainians merely transiting the country and how much to Ukrainians actually settling in. For the same reason, insights on outflows may not be immediately visible, as the effect of the multiple counts would take some time to fade out. However, to the best of the authors’ knowledge, it is not clear whether the target audience estimates provided by Facebook’s advertising platform are already corrected for this bias.
UNHCR data also have some caveats. First, they represent the arrivals (i.e. inflow) of people fleeing Ukraine towards neighboring countries, not the actual number of people displaced in a country at a given time. Second, the right to move freely within the Schengen area means there are very few border controls. The data on arrivals in Schengen countries bordering Ukraine (Hungary, Poland, Slovakia) therefore only represent border crossings into those countries, while UNHCR estimates that a large number of people have moved onwards to other countries. Nevertheless, these figures represent the only tried and tested publicly available information as of the time of writing, and we compare them with our data to check whether the trends are similar.
4.1 Data collection
An automated script developed at the JRC’s Knowledge Centre on Migration and Demography (KCMD)Footnote 12 collects data on a weekly basis by making requests to Facebook’s Marketing API. The same script has already been employed in Spyratos et al. [31]. The API does not allow historical data to be queried. For this reason, every time the script runs it stores the response to each query in a database, allowing us to build a time series of the data.
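The storage step can be sketched in Python as follows. This is a minimal illustration of the pattern of appending each response with its collection time, not the KCMD script itself: the use of SQLite, the table name `audience_estimates`, its columns, and the function names are all assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table holding one row per stored API response."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audience_estimates (
               collected_at  TEXT NOT NULL,
               country       TEXT NOT NULL,
               locale        INTEGER NOT NULL,
               response_json TEXT NOT NULL
           )"""
    )


def store_response(conn, country, locale, response, collected_at=None):
    """Append a response; the timestamp defaults to the current UTC time."""
    ts = collected_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO audience_estimates VALUES (?, ?, ?, ?)",
        (ts, country, locale, json.dumps(response)),
    )
    conn.commit()


def time_series(conn, country, locale):
    """Return (timestamp, response) pairs for one country and locale, oldest first."""
    rows = conn.execute(
        "SELECT collected_at, response_json FROM audience_estimates "
        "WHERE country = ? AND locale = ? ORDER BY collected_at",
        (country, locale),
    ).fetchall()
    return [(ts, json.loads(body)) for ts, body in rows]
```

Since past values cannot be requested again, rows are only ever appended; the time series is rebuilt by reading them back in order of collection time.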
Our script makes requests to the Marketing API to retrieve data on the estimated number of people that satisfy a set of characteristics, as described in the documentation on the Meta for Developers website.Footnote 13 For this study, these are the country of residence and the language of the users, which can be requested by setting the appropriate parameters under the targeting_spec field of the delivery_estimate endpoint.Footnote 14
An example of a query looks like the following:
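As a hedged illustration, such a request could be assembled in Python as shown here. Only the `delivery_estimate` endpoint and the `targeting_spec`, `optimization_goal`, and `locales` fields are taken from the text; the base URL layout, the API version string, the `geo_locations` structure used to select the country, the ad account ID, and the access token are assumptions or placeholders. The sketch builds the request without sending it.

```python
import json
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com"  # assumed base URL
API_VERSION = "v19.0"                     # placeholder; use the version your app targets


def build_delivery_estimate_request(ad_account_id, access_token, country, locale_id):
    """Return (endpoint, params) for an audience estimate by country and language."""
    endpoint = f"{GRAPH_URL}/{API_VERSION}/act_{ad_account_id}/delivery_estimate"
    targeting_spec = {
        "geo_locations": {"countries": [country]},  # assumed country selector
        "locales": [locale_id],                     # language of the user
    }
    params = {
        "optimization_goal": "REACH",
        "targeting_spec": json.dumps(targeting_spec),
        "access_token": access_token,
    }
    return endpoint, params


# Hypothetical account ID and token; 52 is the locale for Ukrainian and IT stands for Italy.
endpoint, params = build_delivery_estimate_request("1234567890", "<ACCESS_TOKEN>", "IT", 52)
request_url = f"{endpoint}?{urlencode(params)}"
print(request_url)
```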
In the above example, we request an audience estimate of users who are based in Italy and speak Ukrainian. By setting the optimization_goal field to REACH, we ensure the ad set is optimized to reach the largest number of unique users each day; in other words, it is set to serve the maximum number of people. The locales field specifies the language of the user. Here, 52 corresponds to the “Ukrainian” language.Footnote 15
The response of the API looks like the following (only the interesting data are shown):
The result refers to the time the query was sent (estimate_dau) and, in the case of the estimate_mau* fields, to the 30 days prior to it. estimate_mau_lower_bound and estimate_mau_upper_bound represent, respectively, the lower and upper bounds of the estimated number of people who have been active on the selected platforms in the past month and satisfy the targeting spec.Footnote 16 By not restricting the data collection to users of one or more specific applications such as Facebook or Instagram, we implicitly select all of Facebook’s platforms and services, thus covering the largest possible number of users that meet our chosen criteria.
Finally, the value of MAUs we use throughout this work is the average of the lower-bound and upper-bound estimates returned by the API.
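The averaging step can be written as a short Python sketch. The field names are those mentioned above; the function name and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def mau_midpoint(estimate: dict) -> float:
    """Average of the lower and upper MAU bounds in one API estimate."""
    lower = estimate["estimate_mau_lower_bound"]
    upper = estimate["estimate_mau_upper_bound"]
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return (lower + upper) / 2


# Hypothetical values, not taken from an actual API response.
example_estimate = {
    "estimate_dau": 1200,
    "estimate_mau_lower_bound": 9000,
    "estimate_mau_upper_bound": 11000,
}
mau = mau_midpoint(example_estimate)
print(mau)
```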