In search of art: rapid estimates of gallery and museum visits using Google Trends

Measuring collective human behaviour has traditionally been a time-consuming and expensive process, impairing the speed at which data can be made available to decision makers in policy. Can data generated through widespread use of online services help provide faster insights? Here, we consider an example relating to policymaking for culture and the arts: publicly funded museums and galleries in the UK. We show that data on Google searches for museums and galleries can be used to generate estimates of their visitor numbers. Crucially, we find that these estimates can be generated faster than traditional measurements, thus offering policymakers early insights into changes in cultural participation supported by public funds. Our findings provide further evidence that data on our use of online services can help generate timely indicators of changes in society, so that decision makers can focus on the present rather than the past.

In contrast to data from surveys, data from many of these newer sources is available with little to no delay [5,29]. A key example is data generated through searches for information online, using search engines such as Google [5,16-20,22-29]. In this paper, we investigate whether search data can provide behavioural insights that are of value for policymakers in the domain of culture and the arts.
In the UK, access to national museums such as the British Museum, the Tate Modern, the Natural History Museum and the V&A has been free since 2001. Instead of relying on entrance fees for their permanent exhibitions, these museums and art galleries receive governmental funding from the Department for Digital, Culture, Media, and Sport (DCMS). The goal of this policy was to boost participation in cultural activities. Ongoing evaluation of whether this policy has been successful requires continuous data, to verify that visitor numbers are still high and to alert policymakers to problems with their investments if not.
In recent years, DCMS has made data on visitor numbers available with at least a month's delay. Key policymakers receive these figures a maximum of twenty-four hours ahead of their official release. Here, we hypothesise that people who wish to visit a museum or gallery may also be likely to search on Google for information about the museum or gallery around the time of their visit. As data on the collective volume of Google searches for a given term or topic is made available publicly with near to no delay, we seek to determine whether this data might allow us to generate much faster indicators of visitor numbers, thus giving policymakers early insights into the performance of their museums and galleries. In our analyses, we aim to exploit the rapid availability of data on online searches, while bearing in mind the problems that previous research has shown can arise if this data is not treated with appropriate caution and consideration [19,20,27].

Data
We retrieve data on the numbers of visitors to museums and galleries sponsored by DCMS. This dataset is released through the official DCMS website and is freely accessible [38]. Until Spring 2019, the dataset was updated monthly on the first Thursday of each month. The dataset contains the monthly number of visitors to the museums and galleries, with data available from April 2004 onwards. Each museum and gallery records the number of visits to their site in a variety of ways, such as using sensors on the doors. The museums then provide these figures to DCMS. The Department releases the visitor numbers as official statistics on a monthly basis with a delay of one month (or more recently, one quarter). For example, figures released at the beginning of February 2019 reported on visits in December 2018. It is important to note that the official figures are occasionally subject to changes, due to further analysis performed by DCMS, or to correct for previous errors. This means that figures on the number of visits in December 2018, first released in February 2019, could be updated in new releases by DCMS even after February 2019.
We also obtain monthly time series reflecting the level of interest in each museum or gallery on Google, using the Google Trends service. Google Trends offers data on the volume of Google searches for a specific search term, such as Tate Modern, or, if preferred, for a related topic. While data on searches for a search term would represent the volume of queries for the exact string Tate Modern, data on searches for the corresponding topic may reflect searches for a variety of terms related to the Tate Modern, such as the names of artists exhibiting their work there. To cover this broader range of searches, here we retrieve search data for the topics relating to the museums and galleries in our analysis.
We restrict our Google Trends request to data on searches made in the United Kingdom. We retrieve data from January 2004 onwards, the earliest date for which Google search data is available. The Google Trends interface provides data for time periods of this length at monthly granularity. We request Google Trends data for each museum or gallery individually. Google Trends data is normalised, such that the highest search volume for each museum topic in the time period specified is represented as 100, and all other data points are scaled to integer values up to 100. A Google Trends value of 100 may therefore represent a different level of search volume for different museums. For each museum, the final Google Trends data we retrieve provides an indication of changes in search volume for the museum over time.
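The normalisation described above can be illustrated with a minimal sketch. The raw volumes below are hypothetical, and the function `normalise_to_index` is an illustrative name rather than part of any Google interface; Google's exact procedure is not specified here, so this only shows the "peak equals 100" scaling.

```python
def normalise_to_index(volumes):
    """Scale raw volumes so that the peak is 100, rounding to integers."""
    peak = max(volumes)
    if peak <= 0:
        raise ValueError("need at least one positive volume")
    return [round(100 * v / peak) for v in volumes]


# Hypothetical raw monthly search volumes for two museums of very
# different popularity (these numbers are invented for illustration).
large_museum = [5000, 10000, 7500]
small_museum = [50, 100, 75]

print(normalise_to_index(large_museum))  # [50, 100, 75]
print(normalise_to_index(small_museum))  # [50, 100, 75]
```

Both series map to the same index values, which is why a value of 100 cannot be compared across museums and only changes over time within one museum are informative.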
We retrieve visitor number and Google Trends data on 16 DCMS-sponsored museum and gallery institutions or groups. For museum groups which have several museums in the UK, such as the Science Museum group, we consider the total number of visitors to the group of museums, and retrieve Google Trends data for the topic relating to the name of the group (here, Science Museum). In the Additional file 1, we show that we find similar results if we change this approach and only consider visitor numbers at the main Science Museum site in South Kensington, London (Fig. S2).
We make one exception to this approach for the Tate Galleries group, which is the largest group in terms of visitors, and for which there is no obvious Google Trends topic. There are, however, topics for each of the galleries that form part of the Tate group (the Tate Britain, the Tate Modern, the Tate Liverpool and the Tate St Ives), and therefore we include these galleries in our analysis as separate entities.
Our analysis does not include a few sites sponsored by DCMS. The Geffrye Museum closed down for refurbishment in January 2018, and severely reduced its visitor numbers during preparations for refurbishment; the Tyne and Wear Museums group does not have a dedicated Google Trends topic; and the Museum of London and Museum of London Docklands stopped being sponsored by DCMS in 2008, such that visitor numbers are no longer collated by the Department. In the first column of Table 1, we provide the complete list of museums and galleries considered in our analysis.
Figure 1 (top row) depicts a comparison between the monthly number of visitors and volume of Google searches between January 2010 and December 2018 for a subset of three well-known museums which are part of our analysis: the Tate Modern, the National Portrait Gallery and the Science Museum, where for the latter we consider visits to the full museum group. An initial correlation analysis suggests that months ranked higher in terms of Google search volume for a museum tend to also be months ranked higher in terms of visitor numbers (Tate Modern: Kendall's τ = 0.429, N = 108, z = 6.442, p < 0.001; National Portrait Gallery: Kendall's τ = 0.540, N = 108, z = 8.130, p < 0.001; Science Museum group: Kendall's τ = 0.231, N = 108, z = 3.482, p < 0.001). We note that for all three museums examined here, the highest search volume index of 100 occurs between January 2004 and December 2009, and so is not depicted in this figure. This may be due to searches for these museums accounting for a larger proportion of all UK Google searches in Google's earlier days. In the Additional file 1 (Table S1), we report similar correlation analyses for all 16 museums and galleries. With the exception of the Wallace Collection and the Royal Armouries, we find that this correlation result holds across our sample.
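To make the rank correlation concrete, the following is a minimal sketch of Kendall's τ computed from pairwise comparisons. It assumes the tie-corrected variant (τ-b); the text does not state which variant was used, and the toy data are invented.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: rank agreement between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1   # both series order this pair the same way
        elif dx * dy < 0:
            discordant += 1   # the series disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau is undefined when one series is constant")
    return (concordant - discordant) / denom


# Toy example: search index and visitor counts for four months (invented).
searches = [1, 2, 3, 4]
visits = [100, 300, 200, 400]
print(round(kendall_tau_b(searches, visits), 3))  # 0.667
```

A positive τ means that months ranked higher on search volume tend also to rank higher on visits, which is the pattern reported for most museums in the sample.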

Methods
We aim to generate estimates of the number of visitors to a given museum or gallery in month t at the beginning of month t + 1. With the release timetable for the museum and gallery official statistics that was in place until Spring 2019, this would anticipate official estimates by a month. For this task, we use the adaptive nowcasting approach, originally introduced by Preis and Moat [19] to improve monitoring of flu cases using Google Trends data. Within the adaptive nowcasting framework, we consider two families of forecasting models suitable for use with time series data: autoregressive integrated moving average (ARIMA) models, as used by Preis and Moat [19], and neural network autoregressive (NNAR) models. In both cases, we test the performance of our models out-of-sample: that is, using data that we did not train the model on.

Figure 1 Rapid estimates of museum and gallery visitor numbers using Google search data. Top row: We investigate whether we can generate rapid estimates of visitor numbers for a range of museums and galleries in the UK using data on how frequently a museum has been searched for on Google. Here, we illustrate our findings using three example museums: the Tate Modern, the National Portrait Gallery and the Science Museum group. We compare visitor numbers provided by the Department for Digital, Culture, Media, and Sport, and data on the volume of searches for each museum on Google between January 2010 and December 2018. We find that more Google searches tend to correspond to more visits to the museum (Tate Modern: Kendall's τ = 0.429, N = 108, z = 6.442, p < 0.001; National Portrait Gallery: Kendall's τ = 0.540, N = 108, z = 8.130, p < 0.001; Science Museum group: Kendall's τ = 0.231, N = 108, z = 3.482, p < 0.001). Second row: For each museum, we build a baseline autoregressive integrated moving average (ARIMA) adaptive nowcasting [19] model using historic visitor numbers. We compare this baseline to an adaptive nowcasting model that includes data from Google Trends as an additional predictor. We generate out-of-sample monthly estimates for a period of nine years between January 2010 and December 2018, re-training the model on a rolling training window of the most recent 60 months for every estimate. We observe that models including Google data tend to exhibit a smaller absolute percentage error compared to models based on historic visitor numbers alone (see also Table 1). Third row: We assess the impact of varying the training window between 30 months and 72 months. We generate monthly estimates for the period between January 2011 and December 2018, for both the baseline model and the model including Google Trends data, for all training window lengths. For each model, training window, and museum, we calculate the mean absolute scaled error (MASE). The definition of the MASE specifies that a naive model using the visitor numbers from 12 months ago as its estimate would score a MASE of 1. Visual inspection reveals that regardless of training window size, the Google Trends model tends to generate better estimates than both the baseline ARIMA and a naive model, as reflected by lower MASE values. Fourth row: We also compare the performance of a baseline adaptive nowcasting neural network autoregressive model (NNAR), and an adaptive nowcasting NNAR enhanced with Google Trends data. Again, visual inspection reveals that the Google Trends model generates better estimates regardless of training window size. Overall, however, the ARIMA models perform better than the NNAR models.
We first investigate the performance of the adaptive nowcasting approach when based on ARIMA models [19]. We use automatic model selection to fit the parameters of the ARIMA models [39], which for example include the number of lagged values of the time series that are used in the estimates (see [40] for further details). The ARIMA models we build are seasonal, so that information on visitor numbers during the month of interest one year earlier can also be used to inform estimates.
We begin by building a baseline model that uses historical visitor numbers alone. This model will provide a benchmark of the quality of visitor number estimates that could be expected without using any additional data from Google Trends [19,20,27]. We train the baseline model on historical visitor numbers for a given museum over the previous 60 months. For instance, if we wish to estimate the number of visitors to the Tate Modern in January 2018, we train the ARIMA model on visitor numbers for the Tate Modern from January 2013 until December 2017. We can then generate an estimate of the number of visitors to the Tate Modern in January 2018. In other words, we build a one-step-ahead forecasting model training on a sliding window of 60 months. We refer to this model as the ARIMA baseline model. We return to investigate the importance of the length of the sliding window in the ARIMA baseline model subsequently.
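The sliding-window procedure can be sketched as follows. This is only an outline of the retraining loop: the forecaster is passed in as a function, and a seasonal naive rule stands in for the automatically selected seasonal ARIMA model, which is not reproduced here. The names `rolling_one_step_forecasts` and `seasonal_naive` are illustrative.

```python
def rolling_one_step_forecasts(series, window, forecaster, first_target):
    """Refit on the `window` observations before each target month and
    return a dict mapping target index -> one-step-ahead estimate."""
    if first_target < window:
        raise ValueError("not enough history before the first target")
    estimates = {}
    for t in range(first_target, len(series)):
        train = series[t - window:t]      # months t-window .. t-1 only
        estimates[t] = forecaster(train)  # model never sees series[t]
    return estimates


def seasonal_naive(train, period=12):
    """Stand-in forecaster: repeat the value observed one year earlier."""
    return train[-period]


# Toy monthly series; index 0 plays the role of January 2005, so index 60
# corresponds to January 2010 with a 60-month training window.
visits = list(range(72))
estimates = rolling_one_step_forecasts(visits, 60, seasonal_naive, 60)
print(estimates[60])  # 48, the value twelve months before index 60
```

Replacing `seasonal_naive` with a function that fits a model to `train` and returns its one-step-ahead prediction gives the same loop with a different forecaster, and changing `window` reproduces the later comparison of training window lengths.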
Following Preis and Moat [19], we also build enhanced ARIMA models in which we add data on the volume of Google searches for the museum as a predictor. We first train the model using both historical visitor numbers and additional monthly data on the volume of search queries for the museum over the previous 60 months. We then draw on search volume data for month t, which is available at the beginning of month t + 1, to help generate a rapid estimate of the number of visitors in month t. For instance, to generate an estimate of the number of visitors to the Tate Modern in January 2018, we first train a model using data on the number of visitors to the Tate Modern and data on the volume of Google searches for the Tate Modern from January 2013 until December 2017. We then generate an estimate which draws on both the visits data for months previous to January 2018 and the volume of Google searches for the Tate Modern in January 2018. We hypothesise that the greater recency of the Google data will allow us to generate more accurate estimates of visitor numbers for that month, in comparison to generating estimates based on historical visitor numbers alone. By retraining our model for each estimate, we are able to take into account that the relationship between how frequently people search for a museum and how frequently people visit the museum may change over time. For example, the arrival of an exhibition at a given gallery may prompt a large surge in Google searches but proportionally fewer visits, or vice versa. The adaptive nowcasting approach allows us to update our understanding of the relationship between search behaviour and visits as soon as new data comes in. In this way, we can avoid the problems that have previously been observed when models of the relationship between search behaviour and offline behaviour are only trained once and gradually become out-of-date [19,20].
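The timing of the data in the enhanced model can be illustrated with a deliberately simplified sketch. A one-variable least-squares line replaces the ARIMA model with an external predictor, so this is not the model used in the analysis; it only shows that the fit uses months t-60 to t-1, while the estimate for month t also uses the search volume for month t. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return mean_y, 0.0  # constant predictor: fall back to the mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def nowcast_with_searches(visits, searches, t, window=60):
    """Estimate visits in month t from the previous `window` months of
    visits and searches, plus the search volume observed in month t."""
    if t < window:
        raise ValueError("not enough history before month t")
    intercept, slope = fit_line(searches[t - window:t], visits[t - window:t])
    return intercept + slope * searches[t]


# Invented data: searches are known for months 0..60, visits for 0..59 only,
# mirroring the situation at the beginning of month t + 1.
searches = list(range(1, 62))
visits = [1000 + 20 * s for s in searches[:60]]
print(round(nowcast_with_searches(visits, searches, t=60)))
```

Note that `visits` contains no entry for month 60: the official figure for that month is exactly what is not yet available when the estimate is made.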
ARIMA models are widely used, but many other approaches to modelling time series exist. We therefore also consider a second family of time series models, neural network autoregressive (NNAR) models [41][42][43][44], which we plug into the adaptive nowcasting framework. In NNAR models, lagged values of the time series are used as inputs to the neural network. We consider neural networks with one hidden layer, where lagged values of the time series are combined in the hidden layer and then modified with a nonlinear transformation before the time series estimate is output via the output layer node. To add online data to an NNAR model, we add an additional input node to the neural network. We describe the NNAR models in more detail in the Additional file 1 (Fig. S1). As with the ARIMA model adaptive nowcasts, we retrain the neural networks at each time step using a sliding training window, before generating each estimate. DCMS data is available from April 2004 onwards. So that we can work with a full calendar year of DCMS data whilst leaving space for a training window of five years (60 months), we start training our models from January 2005 and generate estimates from January 2010. For both the ARIMA and the neural network models, we produce monthly estimates for a nine year period from January 2010 until December 2018.

Table 1 Estimating museum visitor numbers using Google search data. We aim to generate rapid estimates of the monthly number of visitors at a given museum using adaptive nowcasting [19] with minimum delay as soon as each month is over. We build adaptive nowcasting models using autoregressive integrated moving average (ARIMA) models, and neural network autoregression (NNAR) models. For both families of time series models, we construct a baseline model using historical visitor numbers only. We also construct a Google Trends model, in which data on the volume of Google searches for a museum is used as an additional predictor. All models reported here are trained using the previous 60 months of data. We generate monthly estimates for the period between January 2010 and December 2018, and report the mean absolute percentage error (MAPE). For each comparison between a baseline model and an equivalent Google Trends model, the smallest MAPE is highlighted in bold. We observe that nearly all models which include data derived from Google Trends exhibit a smaller MAPE than their counterpart baseline model based on historical visitor numbers alone.

Results
In Table 1, we report the results of our analysis. Across all museums, we find that models including data from Google Trends exhibit a lower mean absolute percentage error (MAPE) than models based on historical visitor numbers alone, with the only exception being the ARIMA with Google Trends model for the National Gallery. We also note that in most cases the performance of ARIMA models is better than that of NNAR models. Figure 1 (second row) depicts how the absolute percentage error varies over time for the three example museums (the Tate Modern, the National Portrait Gallery and the Science Museum group) when using ARIMA models. Again, we observe that including data from Google Trends as a predictor tends to reduce the error in the estimates. In the Additional file 1, we depict the same results for the other 13 museums in our analysis (Figs. S3 to S5).
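For reference, the MAPE used in this comparison is the average of the absolute errors expressed as a percentage of the actual values. A minimal sketch, with invented visitor numbers:

```python
def mape(actual, estimated):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - e) / a) for a, e in zip(actual, estimated))
    return 100 * total / len(actual)


# Invented example: two months of actual visits and two sets of estimates.
actual = [100_000, 200_000]
baseline_estimates = [120_000, 170_000]
enhanced_estimates = [110_000, 190_000]
print(mape(actual, baseline_estimates) > mape(actual, enhanced_estimates))  # True
```

Because each error is divided by the actual value for that month, the MAPE can be compared across museums of different sizes, although it penalises overestimates and underestimates unequally, which is one motivation for the MASE used in the validation that follows.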
While our findings hold across nearly all of the museums considered, the improvement delivered by including Google Trends data does differ between museums. This could be for a variety of reasons, including the underlying volume of search queries for the museum; the extent to which the Google Trends topic truly captures searches relating to the museum or museum group; the extent to which people have reason to search for the museum other than when they visit; the extent to which people visit without searching for the museum, for example because the museum is in a popular tourist area; or measurement and sampling noise in either the official visitor numbers or Google search data.
We perform further validation of our results with the modified Diebold-Mariano test, which compares errors from time series models to check whether two different models exhibit a statistically significant difference in forecast accuracy [45,46]. This test can be used with a range of forecast error measures, but it has been shown that the mean absolute scaled error (MASE) satisfies all the required assumptions of the test, such as the asymptotic normality of the forecast errors, whereas other common measures such as the MAPE may not [47]. The MASE is a scale invariant measure of the accuracy of forecasts [48], making it possible to directly compare forecast errors for museums with very different visitor numbers. The MASE is also symmetric [48], such that it results in an equal penalty for underestimates and overestimates of the number of visitors.
The MASE compares the absolute error of a forecast with the error that would be expected from a naive forecast. For seasonal data, the naive forecast is that each value will be equal to the value observed one season ago. For monthly data with annual seasonality, this is therefore the value of the time series twelve months ago.
The MASE for seasonal time series is hence defined as:

\[
\mathrm{MASE} = \operatorname{mean}\left( \frac{\left| e_t \right|}{\frac{1}{T-m} \sum_{t=m+1}^{T} \left| Y_t - Y_{t-m} \right|} \right)
\]

where $e_t$ is the forecast error, defined as the actual value $Y_t$ minus the value forecast by the model undergoing testing; $m$ is the seasonal period, which is 12 for our analyses of monthly data with annual seasonality; $Y_{t-m}$ is the naive forecast estimate; and $T$ is the number of observations over which the naive forecast error is averaged. By definition, the naive seasonal forecast model would score a MASE of 1. Values of the MASE lower than 1 imply that the model undergoing testing performs better than the naive forecast model.
We generate monthly estimates from January 2011 until December 2018 using the same procedure described above, varying the training window between 30 months and 72 months. Bearing in mind once again that we start training our models with data from January 2005, our analysis here generates estimates from January 2011 onwards, rather than January 2010, to allow us to explore how performance differs when we use a longer training window of six years (72 months). Figure 1 (third and fourth rows) depicts the value of the MASE for the three example museums in our analysis (the Tate Modern, the National Portrait Gallery and the Science Museum group). Again, visual inspection suggests that regardless of training window size, models including data from Google Trends tend to exhibit a lower MASE than baseline models based on historical visitor numbers alone. This holds both for ARIMA and NNAR models. In the Additional file 1, we provide similar illustrations of the results for the other 13 museums analysed here (Figs. S3, S4 and S5). While we find a greater boost to performance from Google Trends data for some museums and much worse performance for others, we tend to observe the same broad pattern across our sample of museums.
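The seasonal MASE described above can be computed along the following lines. This sketch assumes that the scaling term is the mean absolute error of the seasonal naive forecast over a reference series `history`; which observations enter that scaling term in the analysis is not stated here, so treat that choice as an assumption. The data are invented.

```python
def seasonal_mase(actual, forecast, history, m=12):
    """Mean absolute error of `forecast`, scaled by the mean absolute
    error of the seasonal naive forecast over `history`."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must match and be non-empty")
    if len(history) <= m:
        raise ValueError("history must be longer than one seasonal period")
    naive_errors = [abs(history[t] - history[t - m])
                    for t in range(m, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal naive error is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Invented two-year history: every month rises by 2 from year 1 to year 2,
# so the seasonal naive forecast has a mean absolute error of 2.
history = [0] * 12 + [2] * 12
print(seasonal_mase([10, 10], [9, 12], history))  # 0.75
```

Scoring the seasonal naive forecast itself, for example `seasonal_mase(history[12:], history[:12], history)`, returns 1, matching the benchmark interpretation given in the text.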
We then perform the Diebold-Mariano test as follows. For a given training window size, for each model, we calculate the MASE for estimates made across all museums. In our analysis, there are four different models: baseline ARIMA, ARIMA with Google Trends, baseline NNAR, and NNAR with Google Trends. To compare all four models with all other models, we therefore need to carry out six different pairwise comparisons. To correct for multiple hypothesis testing, we adjust the p-values returned by the Diebold-Mariano test using the false discovery rate correction [49]. Across all training windows, we find that models including data from Google Trends have a statistically significantly lower MASE compared to models based on historical visitor numbers alone. We report further details of this analysis in the Additional file 1 (Tables S2, S3 and S4).
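The structure of this step, six pairwise comparisons followed by a multiple-testing adjustment, can be sketched as below. The sketch assumes the Benjamini-Hochberg procedure as the false discovery rate correction, which the text does not name explicitly, and the p-values are invented placeholders rather than results from the analysis.

```python
from itertools import combinations

MODELS = ["ARIMA", "ARIMA + Google Trends", "NNAR", "NNAR + Google Trends"]
PAIRS = list(combinations(MODELS, 2))  # all unordered pairs of models


def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(len(PAIRS))  # 6

# Placeholder p-values, one per pairwise test (not results from the paper).
raw_p = [0.001, 0.2, 0.03, 0.004, 0.04, 0.5]
for (a, b), p_adj in zip(PAIRS, benjamini_hochberg(raw_p)):
    print(f"{a} vs {b}: adjusted p = {p_adj:.3f}")
```

Adjusted p-values are never smaller than the raw values, so a comparison that remains significant after adjustment is less likely to be an artefact of running six tests at once.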
To complement the Diebold-Mariano analysis, as a further check, we build a regression model of the mean absolute scaled errors to investigate whether the type of adaptive nowcasting model used is a key predictor of the size of the error once the museum, month and training window size are all taken into account. We fit a generalised linear model using a gamma distribution, a logarithmic link function and robust standard errors, with the model, museum, month and training window as predictors. With 4 different models, 16 museums, 96 months of data and 43 training window lengths, our regression model is fit on 264 192 observations in total.
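The observation count follows directly from the design of the analysis. As a quick check, assuming the training window is varied in one-month steps from 30 to 72 months inclusive (which gives the 43 window lengths stated above):

```python
n_models = 4                        # ARIMA and NNAR, each with and without Google Trends
n_museums = 16
n_months = 8 * 12                   # January 2011 to December 2018
window_lengths = range(30, 72 + 1)  # assumed one-month steps, 30..72 inclusive

n_observations = n_models * n_museums * n_months * len(window_lengths)
print(len(window_lengths))  # 43
print(n_observations)       # 264192
```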
Each independent variable enters the model as a categorical variable. For the model variable, the four categories correspond to the different models: ARIMA, NNAR, ARIMA with Google Trends, and NNAR with Google Trends. We use the baseline ARIMA model as our reference level in the regression.
We are particularly interested in the coefficients of the model dummy variables, since these indicate whether models using Google Trends data have lower errors than the baseline ARIMA model. The fitted coefficient of the model dummy variable corresponding to the ARIMA with Google Trends model is statistically significant and negative (-0.132, p < 0.001). Similarly, the coefficient for the NNAR with Google Trends model is also statistically significant and negative (-0.078, p < 0.001), whereas the coefficient for the NNAR model with no Google Trends data is statistically significant and positive (0.078, p < 0.01; for all regression results, see sample size information above). Both these results suggest that models which include Google Trends data result in smaller errors than their baseline counterparts, for ARIMA and NNAR models alike. More details of this analysis are presented in the Additional file 1 (Table S5).
Our analysis so far has shown that models which use data from Google Trends perform better than models based on historical visitor numbers alone. However, one question remains: must the Google Trends data relate to the museum or gallery in question, or would any data from Google Trends appear to improve our estimates? If Google Trends data from unrelated topics were to significantly improve our estimates, this would suggest that our findings might be the result of a spurious correlation between search data and visitor numbers data. To address this final question, we repeat our analysis using data from Google Trends for control topics with limited or no relation to museums and galleries.
For our control topics, we choose: England, Travel, Buckingham Palace, Hyde Park, London, United Kingdom, Holiday, and Color. Again, we restrict our Google Trends request to data on searches made in the United Kingdom. We generate estimates for all museums in our analysis for the time period between January 2011 and December 2018, and calculate the MASE. Figure 2 depicts our results for ARIMA models. For simplicity, we present the results averaged across all museums. We again see lower MASEs for estimates generated using data on Google searches for the museums and galleries in comparison to estimates generated using historical visitor numbers alone (Fig. 2(A)). In contrast, data on Google searches for unrelated topics makes near to no difference to estimates of visitor numbers when compared to estimates based on historical visitor numbers alone (Fig. 2(B)). In fact, visual inspection suggests that models that draw on Google Trends data on irrelevant control topics tend to perform a little worse than the baseline overall.

Figure 2 Verifying the value of data on Google searches for museums or galleries. Are estimates of visitor numbers improved only when the Google Trends data relates to the museum or gallery in question, or would any data from Google Trends improve our estimates, suggesting that our findings might reflect a spurious correlation? To address this question, we repeat our analysis using data from Google Trends for control topics with limited or no relation to museums and galleries: England, Travel, Buckingham Palace, Hyde Park, London, United Kingdom, Holiday, and Color. Again, we generate monthly estimates of visitor numbers for all museums using rolling training windows between 30 months and 72 months, for both a baseline ARIMA model and a model enhanced with Google Trends data. (A) For comparison, we first depict the results across all museums when the actual Google Trends topics for each museum are used (for example, the Tate Modern topic for the Tate Modern gallery). We observe that the mean absolute scaled error (MASE) is lower when Google Trends data is included, regardless of training window size. (B) We compare these findings to results when Google Trends data for our eight control topics is used. Here, we find that the Google Trends model does not perform better than the baseline. Visual inspection suggests that adding irrelevant Google Trends data to the model in fact slightly increases the MASE. In the context of our previous findings, this provides further evidence that data on search queries for a specific museum or gallery contains valuable information that can be used to improve rapid estimates of visitor numbers.
To verify that Google Trends data for control topics does not improve estimates of visitor numbers, we investigate the performance of both ARIMA and NNAR models. We again build a regression model of the mean absolute scaled errors, using a gamma distribution, a logarithmic link function and robust standard errors, with the model, museum, month and training window as predictors. We build one such regression model for each control topic. We find that the fitted coefficient of the model dummy variable for the ARIMA with Google Trends model is positive for all control topics, and statistically significantly so in the vast majority of cases. For the NNAR with Google Trends model, the coefficient is statistically significantly larger than the coefficient for the NNAR baseline for two of the control topics (both differences > 0.01, both ps < 0.025), with no significant difference for the other six control topics (all absolute differences < 0.0007, all ps > 0.24). We report on this analysis in further detail in the Additional file 1 (Tables S6-S13).
Overall, the results therefore indicate that the MASE either remains roughly the same or increases when irrelevant Google Trends data is fed into the model. We conclude that trying to improve estimates of visitor numbers with data on Google searches for unrelated control topics does not work, and may make the estimates worse. In the light of our previous results, this provides further evidence that data on search queries for a specific museum or gallery truly does contain valuable information on the number of people visiting those sites.

Discussion
We have shown that publicly available data from Google Trends on our collective online interest in a museum can be used to generate rapid estimates of the number of visitors to that museum before the official visitor numbers are released. These results hold for a range of museums and galleries sponsored by the Department for Digital, Culture, Media, and Sport (DCMS). In particular, we have seen that models which draw on Google search query data outperform models based on historical visitor numbers alone. We note that none of the models in this analysis rely solely upon Google search data. Our findings provide evidence that historic visitor numbers are also valuable in estimating recent visitor numbers, and so we do not suggest that this traditional data should be discarded when available [19,20,27]. Our use of an adaptive nowcasting approach [19], where the model is retrained as new data comes in, should also help reduce the impact of spikes in search queries during which the relationship between online interest in the museum and visitor numbers changes, for example due to an exhibition which receives widespread news coverage.
Our analysis of course has a variety of limitations. Historical data released by DCMS is subject to updates whenever an error is found in the data collection. Such errors in historical visitor numbers would impact our estimates too, although our approach would allow us to integrate revised data as it arrives to improve future estimates. Estimates generated for different museums exhibit varying levels of accuracy, with some museums performing better than others. Use of these estimates for individual museums would therefore require consideration of the level of accuracy required. We also note that the models could almost certainly be improved by drawing on further information sources, such as data on tourist visits to the UK or London, derived either from official data or potentially from other online sources such as online photographs [10][11][12]; or data on the number of visits to the museums' and galleries' own websites.
Our findings provide further evidence that rapidly available data on collective online behaviour can help generate faster insights into changes in society as they occur. Out-of-date data impacts decision making across a wide range of policy areas. Reducing delays in data provision whilst managing the quality of the data provided is a challenge. However, we suggest that appropriately careful analyses of online data can help us work towards this goal, and thereby provide decision makers with a greater understanding of the present state of the world.