Figure 1 | EPJ Data Science

Figure 1

From: In search of art: rapid estimates of gallery and museum visits using Google Trends


Rapid estimates of museum and gallery visitor numbers using Google search data.

Top row: We investigate whether we can generate rapid estimates of visitor numbers for a range of museums and galleries in the UK using data on how frequently a museum has been searched for on Google. Here, we illustrate our findings using three example museums: the Tate Modern, the National Portrait Gallery and the Science Museum group. We compare visitor numbers provided by the Department for Digital, Culture, Media, and Sport with data on the volume of searches for each museum on Google between January 2010 and December 2018. We find that more Google searches tend to correspond to more visits to the museum (Tate Modern: \(\mathit{Kendall's}~\tau = 0.429\), \(N = 108\), \(z = 6.442\), \(p<0.001\); National Portrait Gallery: \(\mathit{Kendall's}~\tau = 0.540\), \(N = 108\), \(z = 8.130\), \(p<0.001\); Science Museum group: \(\mathit{Kendall's}~\tau = 0.231\), \(N = 108\), \(z = 3.482\), \(p<0.001\)).

Second row: For each museum, we build a baseline autoregressive integrated moving average (ARIMA) adaptive nowcasting [19] model using historic visitor numbers. We compare this baseline to an adaptive nowcasting model that includes data from Google Trends as an additional predictor. We generate out-of-sample monthly estimates for a period of nine years between January 2010 and December 2018, re-training the model on a rolling training window of the most recent 60 months for every estimate. We observe that models including Google data tend to exhibit a smaller absolute percentage error than models based on historic visitor numbers alone (see also Table 1).

Third row: We assess the impact of varying the training window between 30 months and 72 months. We generate monthly estimates for the period between January 2011 and December 2018, for both the baseline model and the model including Google Trends data, for all training window lengths. For each model, training window, and museum, we calculate the mean absolute scaled error (MASE). The definition of the MASE specifies that a naive model using the visitor numbers from 12 months ago as its estimate would score a MASE of 1. Visual inspection reveals that, regardless of training window size, the Google Trends model tends to generate better estimates than both the baseline ARIMA and a naive model, as reflected by lower MASE values.

Fourth row: We also compare the performance of a baseline adaptive nowcasting neural network autoregressive model (NNAR) and an adaptive nowcasting NNAR enhanced with Google Trends data. Again, visual inspection reveals that the Google Trends model generates better estimates regardless of training window size. Overall, however, the ARIMA models perform better than the NNAR models.
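
The MASE scaling described above can be illustrated with a minimal Python sketch. It assumes that the scaling term is the mean absolute error of the 12-month seasonal naive estimate over the same evaluation months, which is one way of guaranteeing that the naive model scores exactly 1; the article may compute the scaling term differently. The function names `seasonal_naive`, `mean_absolute_error` and `mase` are hypothetical and are not taken from the article.

```python
from typing import List, Sequence


def seasonal_naive(series: Sequence[float], start: int, season: int = 12) -> List[float]:
    """Naive estimate for each month t >= start: the value observed `season` months earlier."""
    if start < season:
        raise ValueError("start must leave at least one full season of history")
    return [series[t - season] for t in range(start, len(series))]


def mean_absolute_error(actual: Sequence[float], estimate: Sequence[float]) -> float:
    """Average absolute difference between observed and estimated values."""
    if len(actual) != len(estimate) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - e) for a, e in zip(actual, estimate)) / len(actual)


def mase(series: Sequence[float], estimates: Sequence[float],
         start: int, season: int = 12) -> float:
    """MASE of `estimates` for months start..end of `series`.

    Assumption: the error is scaled by the seasonal naive model's mean
    absolute error over the same evaluation months, so the naive model
    itself scores 1 by construction.
    """
    actual = list(series[start:])
    naive_mae = mean_absolute_error(actual, seasonal_naive(series, start, season))
    if naive_mae == 0:
        raise ValueError("naive model has zero error; MASE is undefined")
    return mean_absolute_error(actual, estimates) / naive_mae
```

Under this assumption, passing the output of `seasonal_naive` as `estimates` returns 1, and any model whose estimates are closer to the observed visitor numbers on average returns a value below 1, which is how the lower MASE values in the third and fourth rows should be read.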
