Public debate in the media matters: evidence from the European refugee crisis

In this paper, we take a novel approach to study the empirical relationship between public debate in the media and asylum acceptance rates in Europe from 2002–2016. In theory, an asylum seeker should experience the same likelihood of being granted refugee status from each of the 20 European countries we study. Yet, in practice, acceptance rates vary widely for nearly every asylum country of origin. We address this inconsistency with a data-driven approach by analyzing refugee-related news articles and data on asylum decisions across 20 European countries for more than 100 asylum seekers’ countries of origin. We find that: (i) public debate sentiment in the media is strongly associated with European countries’ diverging asylum practices, much more so than social, cultural or economic factors, and (ii) by combining different measures of public debate we can make out-of-sample predictions within 3% of true acceptance rates (on average). We conclude by discussing the practical implications of our findings for European asylum practices.


Introduction
The European refugee crisis, a term used by the United Nations High Commissioner for Refugees (UNHCR) and widely adopted by the media to describe the influx of asylum seekers into Europe in 2015, has played a prominent role in EU politics. For example, "Take back control" was a phrase used by Brexiteers to express the campaign goal of helping the UK take back control over immigration policies. Chancellor of Germany Angela Merkel was strongly criticized for easing the entry of asylum seekers into Germany. Sweden was noted for its positive attitude toward asylum seekers, accepting four refugees per one thousand citizens (the highest in Europe); Hungary, on the other hand, built a fence on its eastern border to curb the entry of asylum seekers.
In theory, an asylum seeker should experience the same likelihood of being granted refugee status based on international laws. However, as the above examples suggest, the reality is that asylum practices have varied widely across European countries. There has indeed been considerable debate about how Europe has responded and should respond to the increased inflow of asylum seekers. But it remains unclear whether this public debate has actually affected national-level asylum practices in Europe, and whether it can account for the stark variation in actual asylum acceptance rates.
To study this question, we take a novel data-driven approach that incorporates a large database of refugee-related news articles and data on asylum decisions across 20 European countries for more than 100 asylum countries of origin. We draw the news article data from GDELT [1,2], which is one of the world's largest databases for news reporting on a broad spectrum of events. In particular, it includes coverage from a diverse set of newspapers for each European language. The scope and breadth of this database allow us to define various measures that capture different aspects of public debate in the media. The data on asylum decisions is drawn directly from the official EU database [3]. In addition, we enrich our statistical analysis by incorporating socio-economic and political controls that have been known to influence national-level asylum decisions [4,5]. The data cover approximately 55,000 observations of asylum acceptance rates, endowed with a rich set of controls and indexed according to time, European country and asylum country of origin. It should be noted that the distinct country-level perspective taken in this paper offers some advantages: (i) we have access to a considerable amount of data, (ii) we can take a Europe-wide perspective in our analysis, which allows us to compare, for example, the UK and Spain, and (iii) we can study the influence of public debate while controlling for key country-specific socio-economic covariates and important explanatory factors such as refugees' countries of origin. These are novel features of our data-driven approach, and we are the first, to the best of our knowledge, to combine a big dataset of refugee-related news articles with official EU asylum statistics.
We approach our research question from three specific perspectives. First, we study the extent to which public debate can predict European asylum practices. Second, we test whether public debate is a better predictor for political change than other mechanisms. Finally, we analyze the causal structure between public debate and European asylum practices.
Our analysis reveals four key empirical insights. First, we show that public debate in the media on refugee-related topics has indeed varied widely across Europe, and this variation can explain the substantial variance observed in asylum acceptance practices. When we look at different measures, we find that the overall sentiment of public debate is what influences asylum practices: negative/positive media sentiment associates with lower/higher acceptance rates. Other measures, such as the volume of refugee-related news coverage, i.e., media attention, are less important. Second, by combining different measures of public debate, we can make out-of-sample predictions within 3% of true European acceptance rates (on average). Third, we show that public debate in the media is a better predictor of national-level asylum practices than social, cultural and economic factors captured in our dataset. Finally, by looking at the causal structure of public debate and asylum practices, we find that public debate strongly influences asylum acceptance rates while the reverse effect is statistically negligible. Taken together, our findings thus highlight the prominent role that public debate in the media plays in national-level policy practices.
The remainder of the paper is structured as follows. In Sect. 2, we provide further background on European asylum practices and review the relevant literature. In Sect. 3, we discuss the datasets underlying our analyses. In Sect. 4, we test the relationship between public debate in the media and European asylum acceptance rates by making in-sample and out-of-sample predictions. In Sect. 5, we study the causal structure of public debate in the media and asylum acceptance rates via Granger-causality tests. In Sect. 6, we conclude by discussing avenues for future work.

Background and literature review
This section first provides some historical background and information on current asylum practices. We then give an overview of the relevant literature in order to provide context and to situate the contributions of this paper in the broader scholarship on this topic. Readers who are mainly interested in our data analysis may continue to Sect. 3.

Historical background
Following World War II, the international political landscape changed with the creation of intergovernmental organizations such as the United Nations and the International Monetary Fund [6,7]. These organizations, unprecedented at their time, were the first truly international governing bodies tasked with establishing peaceful coexistence of nations and promoting prosperity. Of all the accomplishments that followed, one of the most important was the Universal Declaration of Human Rights, which was adopted by the United Nations in December 1948 and formed the basis of subsequent international treaties, economic agreements, regional human rights, and national constitutions [8][9][10][11]. Article 14, in particular, commenced international refugee law as it established the right of any person to seek asylum from persecution in other countries. This article was later elaborated at the more well-known 1951 Geneva Convention, which further spelled out the rights of refugees and the responsibilities of countries that grant asylum [12,13] (while the original intention was to protect post-WWII European refugees, the 1967 Protocol expanded the Geneva Convention's scope to asylum seekers internationally).
Nearly seventy years later, the 1951 Geneva Convention continues to provide a political mandate for one of the most pressing refugee situations since WWII. Currently, over 20 countries are affected by conflicts with major internal displacement, resulting in a total of 67 million forcibly displaced people, almost one in every 115 humans [14]. Europe has received a particularly large influx of asylum seekers: protracted conflicts in North Africa, the Middle East, and recently Syria have resulted in millions of asylum seekers coming to Europe to seek refugee status (we plot the influx in Fig. 1; see also [15]). The refugee situation has created a divisive political environment, particularly in Europe, where there is large disagreement about how much support can and should be offered to asylum seekers.

Current asylum practices
According to the 1951 Geneva Convention, the general guideline for granting someone asylum is establishing that he or she "is unable or unwilling to return to their country of origin owing to a well-founded fear of being persecuted for reasons of race, religion, nationality, membership of a particular social group, or political opinion," (see, e.g., [16]). This means that, in theory, asylum seekers from a common country of origin should experience no heterogeneity in the likelihood of being granted 'refugee' status regardless of the country to which they apply.
Yet, in practice, asylum acceptance rates have varied widely across Europe. In Fig. 2, we plot the cross-sectional variation in asylum acceptance rates across the 20 European countries in our study. The starkly different asylum practices are readily apparent. For example, acceptance rates for Afghan asylum seekers, who submitted the most asylum applications from 2002-16, ranged from 15 to 75% across Europe. From 2002-2007, the conflict in the Democratic Republic of the Congo prompted more than 60,000 of its citizens to apply for asylum in Europe, and between 10 and 75% of applications were accepted across Europe. More recently, acceptance rates for the massive influx of Syrian citizens requesting asylum ranged from 27 to 70%. In Fig. 3, we show that this variation is also evident if we look at the time-evolution of asylum acceptance rates: for the same asylum country of origin, month-to-month acceptance rates can change drastically across different European countries.
How is it possible that, despite the 1951 Geneva Convention, asylum acceptance rates across Europe differ so significantly? One source of this variation is the legal ambiguity of keywords in the definition of a refugee, such as "well-founded" and "persecuted". These words give countries leeway in deciding whether the merit of an asylum seeker's claim matches the country's interpretation of refugee [5,17], which ultimately gives rise to each country's acceptance policies and practices. However, this only raises a further question: why, then, are countries' legal interpretations of a "refugee" so different across Europe?
These observations are particularly interesting when seen in light of a recent large-scale survey conducted by Bansak et al. [18], who asked 18,000 general citizens across 15 European countries to 'mock' review a total of 180,000 asylum applications. The survey found that if European citizens reviewed asylum applications (rather than civil servants who process asylum applications based on mandates from governmental authorities), the differences in asylum acceptance rates would be insignificant compared to those observed in practice: "despite the major differences between the countries, there is a considerable consensus in terms of not only the types but also the overall number of asylum seekers that should be admitted," ([18], p. 219). The surveys revealed that asylum applicants with high vocation prospects, consistent oppression stories, and who are Christian rather than Muslim received the highest public support.
For all European countries in our study, there thus appears to exist both the same legal mandate and a Europe-wide citizen consensus toward refugees. However, national-level asylum practices have varied widely across Europe. Empirically studying this dichotomy is the main motivation of this paper.

Prior work on the role of public debate in the media
In this paper, we turn to an explanation that has yet to be explored in the research literature: the extent to which public debate in the media shaped national-level asylum practices. The media, sometimes called "The Fourth Branch of Government" [19][20][21], is an institution often viewed as providing a public check on the branches of government. Media is also often viewed as a forum in which public debate on key political issues can take place. Past research has identified different key drivers that shape public debate in the media, such as the media's own political agenda [22], media moguls [23], public opinion [24] and governments engendering public support for policies [25][26][27]. Beyond the idiosyncratic drivers of public debate in the media, past literature has identified various situations in which public debate in the media has taken an active, and often successful, role in shaping presidential agendas [28] and electoral outcomes [29]. However, we are not interested in how public debate in the media influences electoral outcomes per se; we are instead interested in the empirical link with actual policy practices.
It is worth noting that, in the literature, there is an important debate about the conditions under which the media takes an active versus passive role in shaping political outcomes, which are sometimes called media power and media capture, respectively [30]. Because we study 20 European countries, which represent a broad spectrum of different political systems, we cannot take a definitive stance on the active versus passive role of the media because it is likely to differ across countries and for each news media outlet. For the moment, we focus on the media as a mechanism that could play various roles, which is without loss of generality for much of the analysis. We take up this discussion again in more detail when we analyze each European country individually.
Why might we expect public debate in the media to have an impact on national-level asylum practices in Europe? Several reasons from past literature point us in this direction. The first is that the European refugee crisis has dominated public debate in the media for the past several years, featuring diverse and often highly polarized opinions, and empirical research has found that politicians are sensitive to this kind of coverage. At one end of the spectrum, pro-refugee news articles have tried to garner support by getting readers to sympathize with humanitarian tragedies of failed treks across the Mediterranean Sea [31,32]. At the other end, articles opposed to refugees have tended to politicize their presence, particularly that of young men, by depicting them as "problems" and "invaders" [33]. The second reason is that if public debate in the media reflects the preferences of citizens, whose votes might be needed in the next election, or of media moguls, whose financial support might be needed in the next election, then political actors might have an incentive to respond to such debate accordingly. Finally, the third reason is that a country's asylum decisions have public and visible consequences, which can increase a political actor's sensitivity to public discontent expressed in the media [34]. It should be noted that our hypothesis, namely that media coverage matters for the asylum seekers' fates, stands in stark contrast to the principles and goals of the 1951 Geneva Convention (see, e.g., [35] and [36]). A refugee claim is made at the individual-applicant level: ". . . any requirements . . . which the particular individual would have to fulfill for the enjoyment of the right in question, if he were not a refugee, must be fulfilled by him," (1967 Protocol Relating to the Status of Refugees, Article 6).
This claim must also be backed by reasonable evidence that, if this asylum seeker returned to his/her country of origin, then this person would be the target of persecution. However, our hypothesis suggests that asylum decisions are not made vis-à-vis the individual but are influenced by perceptions of refugee-seeking groups as a whole [37]: "the individual asylum seeker who is escaping persecution is undermined by association with this ostensibly threatening collective," ([35], p. 462). In the same vein, [38] found that "threat", "others", "illegality", and "burden" are the four words most closely associated with asylum seekers in newspapers. If our hypothesis is correct, then such words can negatively impact an individual asylum seeker's claim for refugee status.
Perhaps one of the most important means of disseminating public debate in the news is social media, and scholars agree that such platforms have ". . . increasing social, economic and political importance" [39]. A number of studies have shown that individuals experience a sense of self-gratification by sharing news on social media [40][41][42]. Beyond serving those who share news, social media is the key means by which individuals consume news, whether intentionally or incidentally [43][44][45][46]. While the debate on how social media affects news dissemination is still ongoing, one dynamic that has relevance for our study is that social media has been found to polarize the news that does end up in public debate. For example, a recent study in Finland found that social media over-emphasized news articles with crime and threat-oriented themes on refugee issues, while positive articles about refugees were under-emphasized [47].

Data
This study leverages two main data sources: (i) a big dataset of refugee-related articles and (ii) official European asylum statistics. We describe both in detail below.

Media data: GDELT
Data
The first challenge is to represent public debate in the media empirically. To this end, our analysis utilizes a large repository of news articles called the Global Database of Events, Language, and Tone (GDELT; [1,2]). GDELT is a recent database that includes international and national news coverage from nearly every major online news source (we include a list in Additional file 1). GDELT has been shown to be more comprehensive than other well-known and widely used news article databases, such as ICEWS [48]. Furthermore, GDELT offers updated and sophisticated algorithms, including language translations, which are critical for our analysis. Below, we only highlight the aspects of GDELT that are most pertinent to our analysis; other relevant details are relegated to Additional file 1.
GDELT organizes news articles according to news events, which is crucial for enabling us to build measures capturing the characteristics of refugee-related news coverage. To clarify what this means, it is useful to consider an excerpt from a news article that is included in GDELT: "The United Nations will provide nearly 25,000 tons of emergency food aid to refugees fleeing the civil war in Liberia, the World Food Program (WFP) said on Monday." Importantly, GDELT performs what it calls a "principle-actor decomposition analysis". This means that it identifies (1) who is the main actor, (2) what action this actor is taking, and (3) who is the recipient of this action. Concretely, in the above sentence GDELT identified that (1) the United Nations (2) will provide food aid (3) to Liberian refugees. For notation, denote this event as {United Nations, will provide food aid, Liberian refugees}.
In fact, the principle-actor decomposition analysis is able to abstract beyond specific actors representing a certain country (or its government). This makes it uniquely suitable for constructing our country-specific measures of public debate in the media. To showcase this, consider the following excerpt from a 2015 BBC article included in the GDELT database: "David Cameron announced on Monday that the UK will accept up to 20,000 refugees from camps surrounding Syria."
According to the GDELT algorithm, this event is first identified as {David Cameron, will accept, Syrian refugees}.
The GDELT algorithm then checks against a dynamically updating database that codifies the country associated with each event. Given that David Cameron is associated with the UK, the example above is codified as {UK, will accept, refugees}.
The event {UK, will accept, refugees} is then entered into the GDELT database with a date of 08/09/2015 as well as the exact time the article was found.
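The codification steps above can be illustrated with a minimal sketch. It is not GDELT's implementation: the lookup table, the `Event` type, the `codify` function and the rule that generalizes the recipient to "refugees" are all hypothetical names and simplifications introduced here for illustration only.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """An {actor, action, recipient} triple, as in the examples above."""
    actor: str
    action: str
    recipient: str


# Hypothetical lookup table. GDELT's real actor database is far larger
# and is updated dynamically; these two entries are illustrative only.
ACTOR_TO_COUNTRY = {"David Cameron": "UK", "Angela Merkel": "Germany"}


def codify(event: Event) -> Event:
    """Map a named actor to its country and generalize the recipient."""
    country = ACTOR_TO_COUNTRY.get(event.actor, event.actor)
    # Assumed simplification: any "... refugees" recipient becomes "refugees".
    if event.recipient.endswith("refugees"):
        recipient = "refugees"
    else:
        recipient = event.recipient
    return Event(country, event.action, recipient)


raw = Event("David Cameron", "will accept", "Syrian refugees")
coded = codify(raw)
print(coded)
```

An actor that is missing from the lookup table is left unchanged in this sketch, so unknown actors never get assigned to a country by accident.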
As mentioned before, the database is fundamentally structured around such news events. Specifically, GDELT codes all other articles that cover the same event, i.e., {UK, will accept, refugees}, within a fifteen-minute time window as part of the same news event. From this, it then builds two measures for this news event that are of particular interest to us: (1) the number of articles covering this particular news event and (2) the average sentiment of these articles. The average sentiment score is calculated using a standard dictionary-based sentiment algorithm [50,51]. While perhaps simplistic on the surface, this sentiment measure has been shown to be comparable to human-coded sentiment scores, and it performs comparably to many other state-of-the-art sentiment measures [52].
The data provided by GDELT thus enable us to build country-specific measures for public debate in the media based on their coverage of refugee-related events. If we want to find a measure for the attention spent on refugee-related issues in, e.g., France in March 2015, then we can parse GDELT for all events and count the number of times that articles mention refugee-related events. Similarly, we can use the measures for the average sentiment of each news event recorded by GDELT to determine how the sentiment of refugee-related media coverage has changed over time. We expand on both below when we describe the measures used for our main analyses.
In addition to the article-specific analyses, perhaps the greatest advantage of GDELT is its size and scope. It is known that major news outlets target specific news audiences [53]. News outlets also have idiosyncratic factors that influence their content, such as geographical reach, political ideology and profit incentives [29,54]. Our goal is to build empirical variables that capture the debate that results from all of these idiosyncratic features; without the comprehensive nature of GDELT, we could otherwise miss some major news outlets that play an important role. It is also known that both national and international news coverage can influence domestic public debate [55]. Again, GDELT gives us a way to capture both levels of news coverage.
We plot in Fig. 4 the volume of refugee-event related news coverage for Sweden and the United Kingdom resulting from our filter approach described above. As expected, the volume increases during the refugee crisis of 2014-16. However, the absolute volume may be skewed because of changes in the total number of news articles published over time, changes in Google News filters, etc. Therefore, in our analyses we normalize our measure of media attention by always considering coverage of refugee-related events relative to all events covered in GDELT for each country (see below for details).
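To make the event-level aggregation concrete, the following is a rough sketch of how articles could be grouped into news events and summarized. It assumes that the fifteen-minute window is measured from the first article of an event and that sentiment is a plain mean of article tones; both are assumptions of this sketch, and `group_into_events` is a hypothetical helper, not part of GDELT.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # window length taken from the text above


def group_into_events(articles):
    """Group (triple, timestamp, tone) articles into news events.

    Assumption: an article joins an existing event if it has the same
    triple and appears within WINDOW of that event's first article.
    """
    events = []
    latest = {}  # triple -> most recent event opened for that triple
    for triple, ts, tone in sorted(articles, key=lambda a: a[1]):
        ev = latest.get(triple)
        if ev is None or ts - ev["start"] > WINDOW:
            ev = {"triple": triple, "start": ts, "tones": []}
            events.append(ev)
            latest[triple] = ev
        ev["tones"].append(tone)
    return [
        {
            "triple": e["triple"],
            "start": e["start"],
            "n_articles": len(e["tones"]),                 # measure (1)
            "avg_tone": sum(e["tones"]) / len(e["tones"]),  # measure (2)
        }
        for e in events
    ]


# Illustrative, made-up articles and tone scores.
triple = ("UK", "will accept", "refugees")
t0 = datetime(2015, 9, 8, 12, 0)
articles = [
    (triple, t0, -2.0),
    (triple, t0 + timedelta(minutes=10), -4.0),
    (triple, t0 + timedelta(hours=1), 1.0),
]
events = group_into_events(articles)
```

In this toy input the first two articles fall into one event and the third, published an hour later, opens a new one.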
Metrics on public debate in the media
We define three general metrics that, on the one hand, leverage the scope of our data and, on the other hand, are subtle enough to detect relevant changes in public debate about refugees. To formally introduce our measures, let
$$\mathcal{E}_{it} = \{\text{refugee-related news events for country } i \text{ in time period } t\},$$
where a time period $t$ will typically be a month or quarter (both are used in the analyses below). Note that these are events as captured by GDELT. For each event $E_{it} \in \mathcal{E}_{it}$, let (1) $N(E_{it})$ denote the number of articles that mention event $E_{it}$ and (2) $S(E_{it})$ denote the average tone of all articles that mention event $E_{it}$. It is also useful to define
$$\mathcal{C}_{it} = \{\text{refugee and non-refugee related news events for country } i \text{ in time period } t\}$$
as the set of refugee and non-refugee related news events for country $i$ (which means that $\mathcal{E}_{it} \subseteq \mathcal{C}_{it}$). The set $\mathcal{C}_{it}$ allows us to build measures with respect to a baseline.
As our first quantitative measure, we define media attention as the number of times that refugee-related events are mentioned in articles divided by the total number of events covered in GDELT. This measure is built for each country i and time period t. A similar metric is used in [28], who studied whether radio attention influences US presidential agendas (see also [56]). Our second and third quantitative measures focus on the content of the articles covering specific news events. Here we leverage the GDELT sentiment scores described above and define two versions of this measure: the first represents the average sentiment of all refugee-related coverage for country i at time period t, and the second represents the average sentiment of refugee-related coverage relative to the average sentiment of all events (similar metrics can be found in [57]). It is worth noting that our sentiment metrics benefit from the massive volume of articles captured in the GDELT dataset. Certainly, the sentiment measure of a single article is subject to noise and error. But by generating a sentiment measure from tens of thousands of articles, we average over such noise and acquire a robust 'averaged sentiment' metric.
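The three measures can be sketched in code for a single country and time period. The exact formulas are not recoverable from the text here, so this sketch makes explicit assumptions: attention is the share of article mentions that go to refugee-related events, sentiment is weighted by the number of articles per event, and "relative" sentiment is a difference rather than a ratio. The function names and the toy numbers are hypothetical.

```python
def weighted_tone(events):
    """Article-weighted mean tone over events (assumed weighting)."""
    total = sum(e["n"] for e in events)
    return sum(e["n"] * e["tone"] for e in events) / total


def media_metrics(all_events):
    """Return (attention, sentiment, relative_sentiment) for one (i, t).

    all_events plays the role of the baseline set of all events;
    entries flagged refugee=True form the refugee-related subset.
    """
    refugee = [e for e in all_events if e["refugee"]]
    if not refugee:
        return None  # no refugee-related coverage in this period
    n_all = sum(e["n"] for e in all_events)
    n_ref = sum(e["n"] for e in refugee)
    attention = n_ref / n_all                       # assumed form
    sentiment = weighted_tone(refugee)              # assumed form
    relative = sentiment - weighted_tone(all_events)  # assumed: difference
    return attention, sentiment, relative


# Made-up events: n = number of articles, tone = average tone of the event.
sample = [
    {"refugee": True, "n": 10, "tone": -4.0},
    {"refugee": True, "n": 30, "tone": -2.0},
    {"refugee": False, "n": 60, "tone": 1.0},
]
attention, sentiment, relative = media_metrics(sample)
```

Weighting by article counts means that heavily covered events dominate the average; an unweighted mean over events would be an equally defensible reading of the verbal definitions.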
Illustration of public debate in the media
With these metrics, the large variation of public debate in the media on refugees across Europe is evident in our dataset. Figure 5 shows the average media sentiment and attention per year for each country in our study. In 2015, the year with the most asylum applications in Europe in the past 40 years, the refugee crisis dominated media coverage with five times more mentions in news articles than any preceding year; this is in contrast to 2003, when Europe also saw peaking numbers of refugees, but reporting remained relatively limited. One can observe a similar trend in sentiment. In 2015, the sentiment of public debate in the media was clearly negative across Europe (see dark colors in Fig. 5), while sentiment varied greatly across Europe in 2003. Another perhaps striking observation is the different trends across Europe: while media sentiment and attention toward the refugee crisis in the UK remained relatively constant, countries such as Denmark, Estonia and Hungary experienced large shifts from year to year.
These metrics intentionally break down the complexity of what constitutes 'public debate in the media' to provide rather simple indicators. Yet, as we show below, these metrics are sufficient to provide considerable predictive power of European asylum acceptance rates.

Asylum data
Empirical data on European asylum practices is drawn from the official EU database [3] from which we obtain each country's number of incoming, accepted, and rejected asylum applications by country of origin; the data is available monthly from 2002-07 and quarterly from 2008-16. In total, our dataset consists of 20 European countries, more than 100 refugee-sending countries, and 60 time periods (quarters).
All asylum data in our study concern individuals who have formally submitted an application for international protection (or who have been included in such an application as a family member). It should be noted that our statistics do not include or concern refugees who did not submit asylum applications because (i) such statistics are not relevant for predicting or explaining country-specific asylum acceptance rates and (ii) no such data exist.
Asylum acceptance rate
The dependent variable in our analysis is each European country's asylum acceptance rate. Below, we describe how the acceptance rate is defined.
From the EU database, we have information regarding the total number of asylum applications accepted each month or quarter (depending on the year). According to the 1951 Geneva Convention, an asylum application is accepted if and only if the individual is "someone who is unable or unwilling to return to their country of origin owing to a well-founded fear of being persecuted for reasons of race, religion, nationality, membership of a particular social group, or political opinion" (see, e.g., [16]). The EU database also has information regarding the total number of asylum applications rejected in each period. Taken together, we thus know the fraction of asylum applications accepted vs. rejected, which we define as the asylum acceptance rate:
$$\text{AcceptanceRate}_{ijt} = \frac{\text{Accepted}_{ijt}}{\text{Accepted}_{ijt} + \text{Rejected}_{ijt}},$$
where $i$ represents the European receiving country, $j$ represents the asylum country of origin, and $t$ represents the time period. Whenever $\text{Accepted}_{ijt} + \text{Rejected}_{ijt} = 0$, we drop this observation from our data.
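As a small illustration of this definition, the sketch below computes the acceptance rate for a few made-up observations and drops those with no decisions. The record layout and the numbers are hypothetical and are not taken from the EU database.

```python
def acceptance_rate(accepted, rejected):
    """Share of decided applications that were accepted.

    Returns None when there are no decisions, mirroring the rule that
    observations with Accepted + Rejected = 0 are dropped.
    """
    total = accepted + rejected
    if total == 0:
        return None
    return accepted / total


# Made-up observations indexed by receiving country i, origin j, period t.
records = [
    {"i": "SE", "j": "SY", "t": "2015Q3", "accepted": 70, "rejected": 30},
    {"i": "HU", "j": "SY", "t": "2015Q3", "accepted": 27, "rejected": 73},
    {"i": "EE", "j": "SY", "t": "2015Q3", "accepted": 0, "rejected": 0},
]

rates = {}
for r in records:
    rate = acceptance_rate(r["accepted"], r["rejected"])
    if rate is not None:  # drop observations without any decision
        rates[(r["i"], r["j"], r["t"])] = rate
```

Returning `None` instead of zero for empty cells keeps "no decisions" distinct from "all applications rejected".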

Control variables
Refugee inflow
In our analysis, it is important to control for the "floodgate" effect that has been pointed out in the literature [37,58]. This effect amounts to asylum acceptance rates decreasing (or increasing) because of the volume of incoming applicants, owing to reasons such as physical and/or financial constraints of a country. We thus include in our analysis the total number of first-time asylum applicants, that is, asylum seekers who apply for refugee status internationally for the first time (e.g., an asylum seeker who was denied in Germany and re-applied in France is not considered a first-time applicant). This variable is indexed with respect to asylum country of origin, recipient European country, and time period.
Economic indicators
There is empirical evidence that governments may increase or decrease acceptance rates based on labor demand, economic progress and stability [5]. Therefore, we include national GDP, unemployment rate, consumer price index, and government debt as control variables in our analysis (all from the official EU database; see Additional file 1 for more information). As detailed below, we include (i) a between-country version of these variables to account for structural differences in the economic and institutional capacity to accept applicants and (ii) a within-country version to account for the changing economic environments that governments face.
We include two final measures to control for additional relevant dynamics that could confound our results.
Press freedom index
The first accounts for the freedom with which public debate can take place in the media. In general, newspapers are not entirely free when it comes to what they can and cannot publish, especially pertaining to European refugee policies. Any such censorship could bias our results, as it could distort the true public debate that is happening in a country. Therefore, we include the "Press Freedom Index" developed by the organization Reporters Without Borders (https://rsf.org/en/). The measure is based on a questionnaire given to reporters around the world that asks about media independence, transparency, legislative infrastructure, et cetera. In line with what we want to control for, this measure captures journalists' perception of press freedom.
Governmental ideology
The second measure included in our analyses accounts for each European government's ideology. Based on a large dataset collected by the "Manifesto Project" (https://manifesto-project.wzb.eu) we know the number of parliamentary seats held by each political party in each European country in our study from 2002-16. In addition, this dataset includes a measure of left-right ideology for each party based on voting behavior. We combine both pieces of information to build our measure for governmental ideology, which we define as the weighted average of party ideologies in a government with respect to the number of parliamentary seats held by each political party.
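The seat-weighted average described above can be written out as a short sketch. The function name, the input layout and the example scores are hypothetical; in particular, the numbers below are not Manifesto Project values.

```python
def government_ideology(parties):
    """Seat-weighted average of party left-right scores.

    parties: list of (seats, ideology_score) pairs for the parties in
    a government. Raises ValueError if no seats are held at all.
    """
    total_seats = sum(seats for seats, _ in parties)
    if total_seats == 0:
        raise ValueError("no parliamentary seats to weight by")
    return sum(seats * score for seats, score in parties) / total_seats


# Made-up example: three parties with 100, 50 and 50 seats.
example = [(100, -10.0), (50, 20.0), (50, 0.0)]
score = government_ideology(example)
```

With these illustrative numbers the left-leaning and right-leaning contributions cancel, which shows how seat shares, not just party positions, drive the measure.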

The predictive power of public debate in the media
The aim of this paper is to investigate the empirical relationship between public debate in the media and national asylum practices in Europe. If public debate in the media indeed influenced these decisions, we would expect at least three patterns to exist in our data. (i) Public debate in the media should be able to explain a considerable amount of variance in asylum acceptance rates across Europe, much more so than other control variables included in our data. (ii) Public debate should not only explain variance, but also accurately predict asylum acceptance rates. (iii) Finally, we would expect that public debate in the media at time (t - 1) is strongly associated with asylum acceptance rates at time (t), but not the other way around. This third test is fundamentally a causality test and rules out the possibility of confounding factors that would make tests (i) and (ii) true. If this test fails and public debate and acceptance rates are mutually predictive, then we cannot say that public debate influences national asylum practices based on (i) and (ii).
In this section, we focus on the first two hypotheses and then separately test the causal structure between public debate and asylum acceptance rates in the next section. We proceed as follows. First, we describe our model for predicting asylum acceptance rates with our media measures and control variables. Second, we present our findings when we make in-sample predictions, which tests whether public debate can explain Europe-wide asylum acceptance rate variance in our data. Finally, we present our findings when we make out-of-sample predictions, which tests whether public debate in the media has predictive power.

Model setup
We use a mixed-effect regression design to test whether public debate in the media can explain/predict asylum acceptance rates (see, e.g. [59] for an overview of such models). The mixed-effect design allows us to (i) disentangle the time-invariant and time-dependent influence of variables and (ii) mix fixed and random effects in order to maximize statistical efficiency. Both design features are discussed below.

Defining time-invariant and time-dependent versions of all variables
Why is it important to disentangle time-invariant and time-dependent aspects of our variables? Consider media attention in the Czech Republic and Denmark as measured in Fig. 5 (i.e., the size of the bubble). These trends are clearly different: the Czech Republic has roughly the same degree of media attention except for 2015, while Denmark shifts from high to low to high attention from 2002-15. Yet, the time average is nearly the same, i.e., the average size of the bubbles for the Czech Republic and Denmark is roughly equal. When we study the influence of media attention on asylum acceptance rates, we do not want the Czech Republic and Denmark to be treated equally because they have the same time average. Instead, we want to capture both time-invariant features between countries and time-dependent features that capture country-specific trends, allowing us to differentiate the dynamics occurring in the Czech Republic and Denmark.
We do so by making a distinction between a time-invariant average vs. a time-dependent trend. If $S_{it}$ is European country $i$'s public debate sentiment in quarter $t$, then $\bar{S}_i = \frac{1}{|T|}\sum_{t \in T} S_{it}$ is country $i$'s mean sentiment from 2002-16. $\bar{S}_i$ allows us to compare the influence of public debate across European countries (in the literature this is typically called a 'between-effect'). Similarly, $(S_{it} - \bar{S}_i)$ represents the deviation of public debate sentiment in quarter $t$ from this mean. The expression $(S_{it} - \bar{S}_i)$ allows us to understand if shifts from positive to negative sentiment in public debate influence national-level asylum acceptance rates (in the literature this is typically called a 'within-effect'). We include similar time-invariant and time-dependent versions of all variables in our model.
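The between/within split can be illustrated with a short sketch. The country codes and sentiment values below are made up, and the dictionary layout is an assumption chosen for brevity; the point is only that two countries can share a time average while differing in their quarterly deviations.

```python
from collections import defaultdict


def between_within(panel):
    """Split S_it into a country mean (between) and deviations from it (within).

    `panel` maps (country, quarter) -> sentiment. Returns two dicts:
    between[country] = mean over quarters,
    within[(country, quarter)] = S_it minus that country's mean.
    """
    by_country = defaultdict(list)
    for (country, _quarter), value in panel.items():
        by_country[country].append(value)
    between = {c: sum(v) / len(v) for c, v in by_country.items()}
    within = {(c, t): s - between[c] for (c, t), s in panel.items()}
    return between, within


# Made-up values: identical time averages, different dynamics.
panel = {
    ("CZ", 1): 0.4, ("CZ", 2): 0.4, ("CZ", 3): 0.4,
    ("DK", 1): 0.7, ("DK", 2): 0.1, ("DK", 3): 0.4,
}
between, within = between_within(panel)
```

Here both hypothetical countries have the same between term, but only the second has non-zero within terms, so a model that includes both versions can tell them apart.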

Mixed-effect setup
For the sake of notation, let $R_{ijt}$ represent the acceptance rate of European country $i$ of asylum country $j$ during quarter $t$. Let $X_{ijt}$ represent the corresponding vector of control variables (e.g. GDP, unemployment rates, and refugee inflow). Let $S_{it}$ represent public debate sentiment in the media in European country $i$ during quarter $t$. Finally, let $A_{it}$ represent public debate attention in the media in European country $i$ during quarter $t$.
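The full mixed-effect model is given as follows:

$$\operatorname{logit}(R_{ijt}) = \alpha_i + \psi_j + \varphi_t + \beta^{\top}(X_{ijt} - \bar{X}_{ij}) + \bar{\beta}^{\top}\bar{X}_{ij} + \gamma_S (S_{it} - \bar{S}_i) + \bar{\gamma}_S \bar{S}_i + \gamma_A (A_{it} - \bar{A}_i) + \bar{\gamma}_A \bar{A}_i + \varepsilon_{ijt} \qquad (4)$$

We introduced a logit transformation on $(R_{ijt}, S_{it}, A_{it})$ because $R_{ijt}$, $S_{it}$, and $A_{it} \in [0, 1]$ by definition (see Sect. 3); on the right-hand side of (4), $S_{it}$ and $A_{it}$ denote the transformed values. In (4), $(\alpha_i, \psi_j, \varphi_t)$ are dummy variables per European country ($i$), refugee country ($j$), and time period ($t$). $\beta$ is a vector of parameters for our time-dependent control variables, $(X_{ijt} - \bar{X}_{ij})$. $\bar{\beta}$ is a vector of parameters for our time-invariant control variables, $\bar{X}_{ij}$. $(\gamma_S, \bar{\gamma}_S, \gamma_A, \bar{\gamma}_A)$ are parameters governing the influence of public debate on acceptance rates. Finally, $\varepsilon_{ijt}$ is a normally distributed error term with mean 0 and variance $\sigma^2$.

One issue in (4) is statistical efficiency because of the fixed-effects, $(\alpha_i, \psi_j, \varphi_t)$. There are 20 European countries, 100 asylum seekers' countries of origin, and 60 time periods, which amounts to 180 separate parameters to estimate. With so many parameters, we lose statistical efficiency in estimating our model parameters and the ability to study our main variables of interest, $(\gamma_S, \bar{\gamma}_S, \gamma_A, \bar{\gamma}_A)$. We overcome this issue by instead modeling these parameters as random variables. This means that we let

$$\alpha_i \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2), \qquad \psi_j \sim \mathcal{N}(\mu_\psi, \sigma_\psi^2), \qquad \varphi_t \sim \mathcal{N}(\mu_\varphi, \sigma_\varphi^2), \qquad (5)$$

where we now only estimate six parameters, $(\mu_\alpha, \mu_\psi, \mu_\varphi, \sigma_\alpha^2, \sigma_\psi^2, \sigma_\varphi^2)$, rather than 180. The key assumption here is that $(\alpha_i, \psi_j, \varphi_t)$ are normally distributed. We revisit the validity of this assumption when we look at out-of-sample predictions in Sect. 4.3. Note that we estimate (4) and (5) with a maximum likelihood estimator.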

Solving missing data problems: multiple imputations
One problem we must overcome is missing data. If our data are missing for non-random reasons, then this 'non-randomness' can be a source of statistical bias [60]. This is an issue in our data because, for example, reporting refugee-related data between 2002-2007 was not obligatory, hence some countries did not report data at certain time points.
We avoid such bias by utilizing multiple imputations, which has been shown to be more effective and more robust than the common listwise deletion method [61]. The basic idea is to (i) estimate a representative distribution of our data, (ii) generate multiple 'full' versions of the dataset by randomly drawing from this distribution to fill in the missing observations, and then (iii) estimate (4) and (5) utilizing each version of our dataset and combine the results. In doing so, the final results take into account the uncertainty associated with missing data.
Previous studies have suggested that the number of imputations should be approximately equal to the percentage of incomplete observations in the data set (a rule first proposed by [62]). More recently, [61] and [60] proposed that the number of imputations should equal the average percentage of missing data in those columns with any missing data. In our case, the average percentage of missing data is 10.7%; we thus generate 11 imputations.
We combine the estimates of (4) and (5) from each imputation by using the standard 'Rubin combination rule' [63]. This means that: (i) the coefficients are combined using a simple average, (ii) standard errors are combined in a way that accounts for the between- and within-imputation variance of the estimated coefficients, and (iii) p-values are computed using the Barnard-Rubin corrected degrees of freedom [64]. We combine p-values by combining test estimates using Rubin's method and then conducting inferences as normal.
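Steps (i) and (ii) of the combination rule can be sketched as follows. This is a minimal illustration of the standard pooling formulas for a single coefficient, not the estimation code used in this study; the function name and the example numbers are hypothetical, and the degrees-of-freedom correction in step (iii) is omitted.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets.

    Returns the pooled estimate and pooled standard error, where
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)                 # (i) simple average
    within = mean([se ** 2 for se in std_errors])     # average sampling variance
    between = variance(estimates)                     # spread across imputations
    total = within + (1 + 1 / m) * between            # (ii) combined variance
    return pooled_estimate, math.sqrt(total)


# Made-up coefficient estimates from three imputed datasets.
pooled_coef, pooled_se = pool_rubin([0.4, 0.5, 0.6], [0.1, 0.1, 0.1])
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates disagree across imputations, which is how the uncertainty from missing data enters the final results.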

Results: in-sample predictions
With our model at hand, our first goal is to test whether public debate in the media can explain the variance in national-level asylum acceptance rates we observe in Fig. 1(c). We thus estimate (4) and (5) using all of our data (i.e., via in-sample predictions). We are particularly interested in two aspects of our results: (i) the coefficients of our measures of public debate in the media and (ii) the comparison of $R^2$ and AICc values between models that use our measures of public debate versus our controls.
Regarding $R^2$, there exists a large literature on the difficulty of assessing model fitness for mixed-effect models. For our purposes, we utilize two recent metrics proposed in the statistical literature [65,66]: (i) Marginal $R^2$, describing the proportion of variance explained by the fixed factors alone, and (ii) Conditional $R^2$, describing the proportion of variance explained by both fixed and random factors. In order to combine the Marginal and Conditional $R^2$ values for each imputation, we use a method proposed by [67], which involves transforming $R^2$ into a z-score, averaging, and then converting back to an $R^2$ value. When we assess model fitness, we also report AICc and BIC in order to understand the trade-off between explaining variance and adding more parameters in our model (this turns out to be important below). We report our empirical estimates in Table 1.
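The transform-average-back-transform idea can be sketched as below. The specific transformation is an assumption here: the sketch applies Fisher's z-transformation to the square root of each $R^2$ value, which is one common reading of this kind of pooling; the exact steps in [67] may differ in detail. The input values are made up.

```python
import math


def pool_r2(r2_values):
    """Pool R^2 values across imputations via a z-transformation.

    ASSUMPTION: Fisher's z (atanh) is applied to sqrt(R^2), the z-values are
    averaged, and the mean is mapped back with tanh and squared.
    """
    z_values = [math.atanh(math.sqrt(r2)) for r2 in r2_values]
    z_mean = sum(z_values) / len(z_values)
    return math.tanh(z_mean) ** 2


# Made-up conditional R^2 values from three imputed datasets.
pooled_r2 = pool_r2([0.76, 0.78, 0.79])
```

Averaging on the z-scale rather than averaging the raw values keeps the pooled value inside the unit interval and treats values near 1 less asymmetrically.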

Does public debate in the media explain acceptance rate variance?
Our first question is perhaps the most important question: do our country-specific measures of public debate in the media explain the stark differences in national-level asylum acceptance rates across Europe?
There are four measures in Table 1 that suggest that they do. The first two are the Marginal and Conditional $R^2$ values reported toward the bottom of Table 1. Recall that the former represents the variance explained only by fixed-effects while the latter represents the variance explained by fixed-effects plus the random coefficients from (5). As is clear in the table, introducing a normally distributed set of coefficients to represent fixed effects provides an effective means of explaining asylum acceptance rates: Marginal $R^2$ values range from 0.01 to 0.06, while Conditional $R^2$ values range from 0.76 to 0.79.
The Conditional $R^2$ values in particular suggest that a considerable amount of variance can be explained by our media measures. With our measures described above, we can explain up to 76% of the variance of asylum acceptance rates, which is on par with the variance explained by all of our control variables (79%). This means that, rather than having to look at a disparate assortment of economic and political variables to understand national-level asylum trends, we can simply look at trends in public debate in the media to predict asylum acceptance rates.
The final two measures reported in Table 1, namely AICc and BIC, provide further evidence that our public debate measures are effective in explaining asylum acceptance rates. AICc and BIC provide two insights. First, AICc and BIC are defined as measuring the relative amount of information lost/gained about the true underlying process by comparing two models. Lower AICc/BIC values imply that more information is gained, i.e. a better model has been found. Second, AICc and BIC measure the trade-off between explaining data vs. the number of parameters in the model. This means that, if a model has a high $R^2$ value and a high AICc/BIC, then the data has likely been over-fitted.
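The parameter penalty behind both criteria can be made concrete with their textbook definitions. The sketch below uses made-up log-likelihoods and parameter counts; how parameters are counted for a mixed-effect model is a modeling choice that this illustration does not settle.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction, for n observations."""
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood


# Two hypothetical models with the same fit but different sizes.
small_model = (aicc(-100.0, 5, 1000), bic(-100.0, 5, 1000))
large_model = (aicc(-100.0, 15, 1000), bic(-100.0, 15, 1000))
```

With equal log-likelihood, the model with more parameters receives the higher (worse) AICc and BIC, which is why extra variables must improve the fit enough to pay for themselves.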
With this in mind, the AICc and BIC values reported in Table 1 clearly point to the advantages of our public debate measures. The regression models with only media-related variables (including time-invariant and time-dependent measures of media attention and media sentiment) outperform any model that includes our control variables. Our media models suggest that the explanatory variables offer strong predictive power and justify additional parameters in the model, while adding economic and political controls adds explanatory power only by adding more parameters in the model and not from the variables themselves. These findings are reinforced when we do out-of-sample predictions in the next subsection, where the under-performance and over-fitting of models (3)-(6) is even more evident.
Having shown that mass media can explain the variance in our data, we next explore in detail which measures of public debate drive these results.

How does public debate in the media influence acceptance rates?
To understand how public debate in the media influences acceptance rates, we zoom in on the media models reported in columns (1) and (2) of Table 1. What is perhaps surprising is that, for both the time-invariant and time-dependent versions of our variables, public debate attention does not significantly contribute to explaining national-level asylum acceptance rate variance (in Table 1, note that the media attention coefficients are nearly zero as compared to the media sentiment coefficients). The interpretation is that the volume of public debate is not systematically associated with changes and trends in asylum acceptance rates. This supports previous literature, which has found that media attention is generally not sufficient to shape political change [68].
Instead, we find that the strongest and most significant model parameters are associated with public debate sentiment. The emphasis on sentiment versus attention supports previous mass media theory, which suggests that, "[b]ecause politics is the business of problem solving, negative news automatically turns all heads to politics expecting at least some form of policy reaction," [69]. In model (1), we find that a 1-unit increase in the log-odds-ratio in public debate sentiment corresponds to a 0.462 increase in the log-odds-ratio in asylum acceptance rates. Model (5) reports an even higher coefficient of 1.186 (this is explained by the fact that public debate in the media is somewhat correlated with each country's Press Freedom Index, which means that model (1) compensates for this effect with a lower reported coefficient). To gain intuition on the magnitude of this effect, suppose a European country's asylum acceptance rate was 50%, and overall debate sentiment in the media was neutral with $\text{Sentiment}^{\text{refugee}}_{it} = 0\%$. Then based on model (5), we estimate that a 5% decrease in quarterly media sentiment (i.e., a somewhat more negative leaning of public debate in the media) correlates with a 3% decrease in asylum acceptance rates. In 2015, this would have translated to a rejection of ∼30,000 asylum applications across Europe.
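The arithmetic behind this example can be traced in a few lines. The sketch rests on an assumption that is not spelled out in this section: sentiment expressed on a scale from -100% to 100% is rescaled linearly to the unit interval before the logit transformation, so that neutral sentiment (0%) maps to 0.5. The helper names are hypothetical; only the coefficient 1.186 is taken from the text.

```python
import math

GAMMA_S = 1.186  # sentiment coefficient reported for model (5)


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def to_unit_interval(sentiment):
    """ASSUMPTION: sentiment in [-1, 1] is mapped linearly to [0, 1]."""
    return (sentiment + 1) / 2


def predicted_rate(base_rate, sentiment_before, sentiment_after):
    """Acceptance rate after a sentiment shift, all else held fixed."""
    shift = GAMMA_S * (
        logit(to_unit_interval(sentiment_after))
        - logit(to_unit_interval(sentiment_before))
    )
    return expit(logit(base_rate) + shift)


# 50% acceptance rate, sentiment moving from neutral (0%) to -5%.
new_rate = predicted_rate(0.50, 0.00, -0.05)
drop = 0.50 - new_rate
```

Under this rescaling assumption, the drop comes out at roughly three percentage points, in line with the figure quoted above; a different rescaling of sentiment would give a different number.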

Results: out-of-sample predictions
Thus far, we have estimated our empirical model using (4) and (5) and all of our data. By doing so, we were able to understand how much variance could be explained with different model setups. In the end, we found strong evidence that our measures of public debate in the media explain considerable variance of asylum acceptance rates across Europe (>75%) and are better predictors than economic and political variables (based on differences in AICc/BIC values between models 1-2 and 3-6).
One interesting part of Table 1 is the stark change in AICc and BIC when we include economic and political predictors. The results suggest that these data do not provide additional information for the purposes of explaining asylum acceptance rates. However, with the considerable difference between models (1, 2) vs. models (3)-(6), we want to further test whether our media models are indeed the best models as suggested by AICc and BIC. Thus, in this section we test models (1)-(6) by making out-of-sample predictions. Doing so provides a strong indicator of over-fitting vs. under-fitting of models: over- and under-fitted models tend to be poor at making out-of-sample predictions, while models that find a balance between the two perform well.

Table 2 Out-of-sample results. Comparison of models in Table 1 with respect to out-of-sample predictions. Mean square error is estimated via randomly sampling 80% of the data to estimate the model, using the remaining 20% to test the model, and repeating 100 times. SD = standard deviation. The best out-of-sample prediction statistics are reported in bold

We follow a standard procedure called repeated random sub-sample validation. First, we randomly sample 80% of the data to fit our regression model (i.e., this is our training data). Second, we utilize this model to predict the values of the remaining 20% of the data (i.e., this is our testing data). Finally, we compare the predicted values vs. the true values via the mean absolute error (MAE), which is the average distance between the predicted and true values, and the mean square error (MSE), which is the average squared distance between the predicted and true values. We repeat this procedure 100 times in order to report a mean and standard deviation of MAE and MSE for each model. We report values in Table 2.
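The procedure can be sketched as follows. To keep the example self-contained, a one-variable least-squares line stands in for the mixed-effect model, and the data are synthetic; both are assumptions made purely for illustration.

```python
import random
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least squares for y = a + b*x (a stand-in for the mixed-effect model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def repeated_subsample_validation(data, n_repeats=100, train_frac=0.8, seed=0):
    """Return (mean, SD) of MAE and of MSE over repeated random 80/20 splits."""
    rng = random.Random(seed)
    maes, mses = [], []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        errors = [y - (a + b * x) for x, y in test]
        maes.append(mean([abs(e) for e in errors]))
        mses.append(mean([e ** 2 for e in errors]))
    return (mean(maes), stdev(maes)), (mean(mses), stdev(mses))


# Synthetic data: a linear signal plus small Gaussian noise.
noise = random.Random(1)
data = [(x / 100, 0.2 + 0.5 * (x / 100) + noise.gauss(0, 0.03)) for x in range(100)]
(mae_mean, mae_sd), (mse_mean, mse_sd) = repeated_subsample_validation(data)
```

Because each repetition draws a fresh split, the standard deviations indicate how sensitive the error estimates are to which observations happen to be held out.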
The results in Table 2 support those from Table 1. Using our measures of public debate in the media, we are able to make out-of-sample predictions within 3% of true asylum acceptance rates (on average). This is in stark contrast to models based on economic and political predictors, which only manage to make predictions within 35% of true asylum acceptance rates. This supports the AICc and BIC values reported in Table 1 and our hypothesis that public debate in the media can explain and predict national-level asylum acceptance rates across Europe.

Granger-causality test
Our analysis thus far has focused on explaining and predicting asylum acceptance rates based on our measures of public debate in the media. Our results point to a strong association between the former and latter, so much so that we can predict national-level asylum acceptance rates across Europe to within 3% (on average). However, these analyses are fundamentally correlation studies that cannot say whether public debate in the media does or does not influence asylum acceptance rates. There are two reasons why. First, it could be that asylum acceptance rates influence public debate. If so, then the strong association is explained by public debate merely reflecting trends in national-level asylum practices. Second, there could be unobserved variables outside of our analysis that influence both asylum acceptance rates and public debate. If so, then the strong association is explained by a confounding variable problem and there may not exist any causal relation between asylum acceptance rates and public debate in the media. Neither reason poses an issue for explaining variance and making predictions, as in Sect. 4. But both do pose an issue if we want to understand if and how public debate in the media influences national-level asylum acceptance rates.
In this section, we therefore employ Granger-causality tests [70] to study the causal dependencies between public debate in the media and asylum acceptance rates. The test is motivated by [28], who also utilized this test to study the causal relationship between radio media agendas and presidential agendas. Granger-causality tests are powerful insofar as they impose no a priori structural conditions on causal relations; instead, "[w]e merely ask the data to tell us . . . which, if any, parameter restrictions are appropriate" [28, p. 176]. In other words, it is the data that clarifies the causal relationship between variables.
In what follows, we first describe the data and the Granger-causality setup before presenting our empirical findings.

Data
Using the Granger-causality test requires us to utilize data with a short time scale because, to test the causal relationship between two time series, we must exclude as many confounding variables as possible. We thus limit this analysis to the monthly data from GDELT and on asylum acceptance decisions that are available from 2002-07. We proceed assuming that GDP, unemployment rates and other potentially relevant variables affect country policy on somewhat longer time scales than months (this is a reasonable assumption considering that countries report such metrics on a quarterly level). The focus on 2002-07 (rather than 2008-16) is due to data availability: asylum acceptance rates from the official EU database are available at a monthly level during this time period, while all asylum statistics after 2007 are available at the quarterly level. GDELT data is available at the monthly level.

Granger-causality test setup
Months are denoted as $t, \tau = 1, 2, \ldots, T$. The key relationship we are measuring is whether public debate in the media during $\tau \in [t-k, t-1]$ influences asylum acceptance rates at time $t$, where $[t-k, t-1]$ represents the $k$ months before $t$. As in the main text, let $R_{ijt}$ represent European country $i$'s acceptance rate of asylum seekers from country of origin $j$ at time $t$. We then define Model 1 as an autoregressive model of $R_{ijt}$, where the interpretation is that autoregressive terms in the preceding months $[t-k, t-1]$ are used to explain acceptance rates in month $t$.
Because we found strong evidence that public debate sentiment is a strong predictor of asylum acceptance rates in Sect. 4, we restrict the scope of our Granger-causality tests to this case.
As above, $S_{it}$ denotes the relative sentiment of public debate for European country $i$ in month $t$, as spelled out in (2). The key idea is to test whether the autoregressive terms from public debate sentiment contribute significantly to Model 1. We thus define Model 2 as Model 1 extended with the lagged public debate sentiment terms. In the manner of [70] and [28], we say that public debate sentiment helps explain asylum acceptance rates if we find that Model 2 indeed significantly improves Model 1. Significance is determined by a Wald test comparing Model 1 and Model 2.
As in the robust regressions, we use multiple imputations to avoid any biases due to non-random reasons for missing data [60], which would otherwise be a problem (see Sect. 4.1.2 for details). We combine the Wald test statistics from the multiple imputed datasets as a $D_2$ statistic as in [71] (p. 239).
In order to establish Granger-causality, we apply the same test as above but in the reverse direction: specifically, we test whether $R_{ijt}$ autoregressive terms significantly contribute to explaining $(S_{it} - \bar{S}_i)$. The major difference in writing down these equations is that N equations (one for each recipient country) are required while only one new variable was introduced to Model 2. To parallel this structure, we include the average asylum acceptance rate of a country as the additional variable (rather than 10 or 20 terms and their lags).
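The logic of the forward and reverse tests can be illustrated on synthetic data. The sketch below is a simplified single-equation version: it compares a restricted autoregression with one that adds lags of the other series, using an F-type statistic for nested least-squares models as a stand-in for the Wald test and the $D_2$ pooling used in this study. All names and numbers are hypothetical.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares from least squares of y on the columns of X."""
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for row in range(p):
            if row != c:
                f = A[row][c] / A[c][c]
                A[row] = [u - f * v for u, v in zip(A[row], A[c])]
    b = [A[c][p] / A[c][c] for c in range(p)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))


def granger_f(target, driver, lags=1):
    """F-type statistic: do lags of `driver` improve an AR model of `target`?"""
    rows = range(lags, len(target))
    y = [target[t] for t in rows]
    own = [[1.0] + [target[t - k] for k in range(1, lags + 1)] for t in rows]
    full = [o + [driver[t - k] for k in range(1, lags + 1)]
            for o, t in zip(own, rows)]
    rss_restricted, rss_full = ols_rss(own, y), ols_rss(full, y)
    dof = len(y) - len(full[0])
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)


# Synthetic series in which S drives R with a one-month lag, not vice versa.
random.seed(0)
S = [random.gauss(0, 1) for _ in range(200)]
R = [0.0]
for t in range(1, 200):
    R.append(0.5 * R[t - 1] + 0.8 * S[t - 1] + random.gauss(0, 0.1))

forward = granger_f(R, S)   # do lags of S help explain R?
reverse = granger_f(S, R)   # do lags of R help explain S?
```

In this constructed example the forward statistic is very large while the reverse one stays small, which is the one-directional pattern the test is designed to detect.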
We run tests for each European country separately. We report results for the 10 and 20 highest-sending countries of origin in Tables 3 and 4, respectively. As reported, we run the test for two cases: in the first we only include one lag term, or $R_{ij(t-1)}$ and $(S_{i(t-1)} - \bar{S}_i)$, and in the second we include two lag terms. The latter test is more stringent than the former test, because adding more terms to Model 1 renders it more robust to additional explanatory terms, thereby making it more difficult to identify a significant $D_2$ test statistic when comparing Models 1 and 2.

Table 3 Granger-causality test results. Granger-causality test of interactions between asylum acceptance rates and public debate sentiment in the media. The 10 asylum seekers' countries of origin with the most decisions made (both accepted and rejected) are included in the regressions. We report Granger-causal testing for 1-month (1-lag) and 2-month (2-lag) lagged VAR models. Locations with NA result from monthly data not being available from the EU database. The $D_2$ statistic is computed as in [71] and is the result of combining Granger tests from multiply imputed datasets

Main results
Our main findings are surprisingly unambiguous. We find that public debate sentiment significantly contributes to explaining asylum acceptance rates in nearly every European country in our study. Conversely, we find essentially no statistical evidence that the reverse effect exists: in nearly every country in our study (with the exception of the UK), asylum acceptance rates do not help explain public debate sentiment. This trend is systematic in both Tables 3 and 4 (with the exception of the UK). The fact that nearly all countries exhibit a unilateral direction of causality is somewhat noteworthy from a Granger-causality standpoint. The standard interpretation of such a finding is that public debate sentiment Granger-causes asylum acceptance rates.
These results rule out the concerns that were spelled out at the beginning of this section. If there were an unobserved confounding variable influencing both public debate and asylum acceptance rates, then we would not have observed any significant results in Tables 3 and 4. This is because adding new terms to Model 2 would have added no new information to Model 1, as the autoregressive terms would have already contained the information from the driving confounding variable. On the other hand, if public debate were influenced by asylum acceptance rates but not the other way around, then we would have observed the opposite results from Tables 3 and 4. Instead, our results provide clear evidence that public debate in the media influences, and is not influenced by, national-level asylum acceptance rates. Taken together with Sect. 4, we have strong evidence that public debate is indeed being heard at the policy level.

Discussion and conclusion
Our study was motivated by the observation that, despite the 1951 Geneva Convention, several follow-up international conventions, and a Europe-wide consensus on citizen preferences towards migration (as measured in [18]), we observe significant national-level variation in asylum practices across Europe. One might expect that European countries would have converged on asylum practices, giving equal and judicious treatment of asylum applications in a unified way. However, real-world data suggests the opposite. For nearly every asylum-sending country, national-level asylum acceptance rates differ widely across Europe.
In this paper, we test the hypothesis that public debate in the media can explain much of the variation in asylum practices across Europe. Our data-driven analysis draws on a comprehensive collection of refugee-related news articles (GDELT) and data on 20 European countries and 100 asylum countries of origin, all spanning 2002-16. By taking a Europe-wide perspective, we incorporate many different types of media outlets, and thereby many different possible mechanisms that could drive public debate in the media such as the media's own agenda, media moguls, interest groups, citizens' opinions, and governmental efforts to engender public support for policies. While it is beyond the scope of this study to adjudicate between different explanations for the divergence in public debate in the media we observe, our quantitative findings nevertheless suggest that refugee-related public debate in the media is highly indicative of variation in asylum practices.
We find three clear patterns in the data. First, we find that public debate in the media is strongly predictive of European asylum acceptance rates, accounting for nearly 80% of the variation. It turns out that changes in public debate sentiment explain most of this variation, much more so than social, economic and political variables captured in our dataset. Second, by combining different measures of public debate in the media, we can make out-of-sample predictions within 3% of true asylum acceptance rates (on average). Third, going beyond correlation analyses, we study the causal relationship between public debate sentiment and asylum acceptance rates via Granger-causality tests with monthly-level data. This offers two advantages: (i) we can rule out many possible confounding factors, especially those that might induce spurious correlations with our quarterly-level analyses, and (ii) it gives us further evidence that public debate is not only associated with national-level asylum acceptance rates, but also contains distinct signals that seem to set a precedent for European asylum practices (we want to emphasize that this is not causal evidence per se; rather, it is statistical evidence that points in this direction).
How can we understand the strong predictive power of our media-based measurements, especially in light of the weak predictive power of structural socio-economic and political indicators? Past literature offers a few insights. For example, a study analyzing the US found that hostile policy-level debate toward immigrants is most likely to take place when communities undergo sudden demographic changes at the same time that news rhetoric politicizes immigration [56]. In other words, the study suggests that the news media offer some type of trigger that builds on sudden demographic stress in a community. Building on this idea, [72] found that German civil servants who make asylum application decisions are more influenced by regional sentiment toward refugees than by management directives. Taken together, these studies suggest that public debate in the media, influenced by sudden demographic stress and regional sentiment, can influence asylum acceptance rates directly via civil servants. This is one among several mechanisms that could theoretically explain our results; testing each is left for future work.
We pursued a comparative cross-national research design in order to gain a Europe-wide perspective on the empirical relationship between public debate in the media and asylum practices. This advantage, however, comes at the cost of not being able to adjudicate between country-specific mechanisms that drive public debate in the media. This opens up several interesting avenues for future work. Given data on party deliberations and country-level policy processes, for example, it should be possible to test whether the media indeed serves as a conduit for governments to promote specific asylum policies. In addition, it would be particularly interesting to study the role of social media platforms, for example, whether social media polarizes public debate about issues such as asylum practices and, therefore, contributes to shaping the asylum policy-making process. Another avenue is studying the effect of international versus national public debate on refugees, as well as the effect of different media outlets (such as the New York Times versus CNN) [73]. Finally, another interesting avenue would be examining more detailed data on refugee inflow in order to study the extent to which heterogeneity of refugee profiles across Europe has shaped media coverage indirectly. It is likely that wealthy, educated refugees have better prospects of integrating into society more easily, and perhaps less noticeably, than refugees coming from other backgrounds. If public debate in the media is spurred by social tensions (as suggested by past literature), then European countries that received the latter profile of refugees might have exhibited stronger (and more negative) debates during the European refugee crisis than countries that received more refugees of the former profile. As such, heterogeneity in refugee profiles across Europe could also contribute to the patterns we observe in our study.

Additional file 1.
In the additional file, we (i) describe the GDELT dataset in further detail, and (ii) provide additional information on the EU asylum data and control variables used in this study. (PDF 145 kB)