Predicting human mobility through the assimilation of social media traces into mobility models
- Mariano G Beiró^{1}Email author,
- André Panisson^{1},
- Michele Tizzoni^{1} and
- Ciro Cattuto^{1}
DOI: 10.1140/epjds/s13688-016-0092-2
© Beiró et al. 2016
Received: 17 May 2016
Accepted: 13 October 2016
Published: 21 October 2016
Abstract
Predicting human mobility flows at different spatial scales is challenged by the heterogeneity of individual trajectories and the multi-scale nature of transportation networks. As vast amounts of digital traces of human behaviour become available, an opportunity arises to improve mobility models by integrating into them proxy data on mobility collected by a variety of digital platforms and location-aware services. Here we propose a hybrid model of human mobility that integrates a large-scale publicly available dataset from a popular photo-sharing system with the classical gravity model, under a stacked regression procedure. We validate the performance and generalizability of our approach using two ground-truth datasets on air travel and daily commuting in the United States: using two different cross-validation schemes we show that the hybrid model affords enhanced mobility prediction at both spatial scales.
Keywords
human mobility machine learning predictive models geolocalized data1 Introduction
Modelling and understanding human mobility patterns at different spatial scales and aggregation levels - from single individuals to population displacements - is an important research topic because of a vast number of applications, ranging from urban and transportation planning [1, 2] and resource allocation [3, 4] to the prediction of migration flows [5, 6] and epidemic spreading at local, regional, or worldwide level [7–10].
In the last few years, a significant research effort has been made to understand human mobility patterns, both in the laws governing individual human trajectories [11, 12] and collective movements [6, 13, 14]. In the latter case, the most extensively used models are the gravity model [15, 16] and the more recent radiation model [6]. The gravity model assumes that the number of people travelling between two locations is directly proportional to some power of their population size, and decays as some power of the distance between them. Instead, the radiation model considers human movements as diffusion processes that depend on the population distribution over the space, reproducing Stouffer’s theory of intervening opportunities [17]. Both models are static and require some information in order to be adjusted: in the gravity model, parameters are fitted using real mobility data, provided by an independent source; the radiation model, in its original formulation, is parameter-free, but it requires accurate knowledge of the spatial population distribution. Both modelling approaches have been extensively tested, showing advantages and limitations. The gravity model has been successfully used to describe highway flows [18], air-travel [19, 20], commuting [8], and mobile phone calls between cities [21]. However, it has some relevant limitations, as the availability of data for calibration and the lack of a first principle derivation [6, 22]. On the other hand, the radiation model offers very good predictions for commuting patterns between US counties using only population data, but its applicability at different spatial scales has been debated since it does not succeed in capturing commuting inside urban or metropolitan areas [22–24] and it has never been used to model long distance travel patterns either.
The limitations of these models suggest that the quality of their results can be largely improved if they are supported by additional data [24, 25]. In fact, several works have analyzed records from mobile phone companies to study individual [11] and collective mobility [26–28], showing that it is possible to infer these flows from human activity. The mobility flows obtained in this way can be successfully used for the prediction of epidemic spreading [29, 30], as a proxy for the real, often inaccessible, mobility data.
In this context, the large volumes of digital traces left by humans over the Internet allow for a better understanding of mobility processes, with immediate benefits. On the one hand, the growth of the transport infrastructure and the fast evolution of mobility patterns call for models that are informed by real-time data sources. People travel more, and travel patterns may change very fast, with important consequences for epidemic spreading and planning. A timely modelling of mobility processes might then allow for rapid interventions and for the design of emergency policies. On the other hand, though mobility data is usually available from official sources in many developed countries for airline transportation, train trips, or commuting, in resource poor countries this information is scarce or does not exist at all, but it can be measured through proxies such mobile phones [7] or social media [31]. The fact that mobility datasets are aggregated at a particular resolution level also constitutes a limitation for many potential studies.
The scientific community recently recognized that one of the challenges in the modelling of social and epidemic processes is the assimilation of geolocalized data, and the construction of hybrid models combining metapopulation and network models with individual traces [32, 33]. Our approach to human mobility is in line with this perspective, as we analyze the effects of incorporating geolocalized traces from social media into the classical gravity model.
Social media platforms like Flickr (www.flickr.com), Twitter (twitter.com), or Foursquare (foursquare.com) offer the possibility of georeferencing the content shared by users. Thus, they constitute a timely source of disaggregated, high-resolution spatio-temporal data on human mobility. The advantage of social media traces with respect to other sources of digital information is that they can be publicly accessed and at a very low cost. This approach has been taken by recent works in the literature, showing that mobility patterns can be successfully extracted from social media traces. Lenormand et al. used traces from Twitter to study highway and roadway transportation networks in Europe [2]; Noulas et al. used a Foursquare dataset to analyze the link between user activity and place transitions [34]; Hawelka et al. modelled international travel of Twitter users by residence country [14]; Lenormand et al. have also used Twitter traces to model commuting from home to work [35]; Grabowicz et al. studied the relation between human mobility and interactions using traces from different social networks [36]; Llorente et al. [37] analyzed the mobility patterns in Spain using Twitter traces; Barchiesi et al. [38] used Flickr data from 16,000 individuals in the UK to model the flows between its 20 largest cities, comparing their results with travel data obtained from surveys.
In this work we used a set of 18 million timestamped, georeferenced pictures from Flickr, taken by 40,000 users in the US, which are part of the Yahoo Flickr Creative Commons 100M public dataset [39]. We processed the sequences of pictures belonging to each individual user in order to extract user trip paths at different resolution levels. Then, we used these emerging collective flow patterns to feed a learning model based on the gravity law.
Our main contribution is to design a data-driven hybrid model of human mobility, in which social media traces are combined with the classical gravity model under a machine learning approach, by training and cross-validating with real datasets. We evaluate the model for two different human activities and resolution scales: an air-travel network and a daily commuting network. Firstly, we show how individual traces can be adapted to these different resolution scales, by tessellating the space into adequate basins and filtering the correct individual flows. Secondly, we combine these traces with the gravity law and we fit the resulting hybrid gravity model using a subset of the real data. Then, we evaluate the fitted model using the remaining part of the dataset. With a cross-validation procedure, we show that the hybrid gravity model can be fitted using a small portion of the data as training set, to correctly predict the remaining mobility flows. In fact, we observe that the incorporation of Flickr traces into the gravity model improves its performance significantly, measured in terms of the determination coefficient. For further comparison, we also evaluate the effect of combining the Flickr traces with the radiation model [6], and we observe a similar beneficial effect. Our findings show that the Flickr traces are representative of the real human mobility and that they can be assimilated into a more theoretical model such as the gravity model. Moreover, this procedure can be applied in other cross-validation contexts in which there is scarce information on mobility, by combining the available data with digital traces from social media.
2 Results
We processed the traces left by 40,000 Flickr users in the US (about 18 million pictures) in order to obtain mobility flow matrices which represent human flows between pairs of geographical nodes, looking for the collective mobility patterns of two types of human activities at different resolution scales: air travel and daily commuting. We also used two real mobility datasets as a ground truth: the RITA dataset of air travel in the US [40], and the commuting data provided by the US Census Bureau [41]. At each resolution scale, flows were aggregated into geographic basins. For air travel, the geographic basins were defined by the presence of an airport, while for the commuting network each basin corresponded to a US county.
The Flickr flow matrices at the airport and county level were extracted from the users’ traces by adding a connection between two of basins i and j whenever a user took a picture in basin i and the subsequent one in basin j. Each connection is then weighted by the total number of users who travelled between i and j. The details about basin construction for the ground truth mobility datasets and flow extraction from users’ traces are contained in Section 4.
The distance between the locations of two consecutive pictures taken by the same user ranges from a few meters to thousands of kilometers, showing a heavy-tailed behavior with an exponential cutoff (see Additional file 1). Thus, it is clear that users’ traces can provide information about very different types of human displacements, ranging from a short walk within a city to long distance trips or international travel. In each case, and when modelling a particular type of mobility, it is important to consider only the relevant traces for that type of movements, otherwise, the effects of different activities will be mixed. We analyzed the effect of distance on both the daily commuting and the air travel, and we set a maximum trip distance of 100 km for daily commuting in the US and a minimum trip distance of 500 km for US air travel. This analysis is included in Section 4.
After aggregating the trips by basin and filtering by distance, we obtain the Flickr flow matrices denoted as \(\mathbf{F}_{{r}} = ( f_{ij}^{r})\) for the air travel network, and \(\mathbf{F}_{{c}} = ( f_{ij}^{c})\) for the commuting network. Here, \(f_{ij}^{c}\) and \(f_{ij}^{r}\) represent the number of Flickr users travelling from basin i to basin j in each of the networks.
2.1 Model definition and fitting
A model of human mobility should be able to predict real mobility flows. The aim of our work was to predict mobility flows of the air travel network and the US commuting network, denoted as \(\mathbf{Y}_{{r}} = ( r_{ij})\) and \(\mathbf{Y}_{{c}} = ( c_{ij})\) respectively. Here, \(r_{ij}\) represents the number of travellers between two airports i and j, while \(c_{ij}\) represents the daily number of commuters between any two counties i and j. Adopting a machine learning approach, we represent this prediction problem as a regression task in which the model is first trained, and then it is used to estimate the real flows, \(\mathbf{Y}_{{r}}\) or \(\mathbf{Y}_{{c}}\), which are the so called target values.
The classical gravity model (described in Section 4) estimates the target values as directly proportional to some powers of the population of the origin and destination basins and inversely proportional to some increasing function of the distance between them (typically, a power-law or exponential function). We will indicate this model as \(\mathbf{G} ( \alpha, \beta, \gamma; \mathbf{P})\), where α, β, and γ are the three exponents in the gravity law, and \(\mathbf{P} = (p_{i})\) is a vector containing the populations of the basins.
Analogously, travel flows estimated from Flickr traces can be described by a model in which we assume that the value \(y_{ij}\) representing the mobility flow between two basins i and j, is proportional to the number of users’ trips from i to j. We represent this model as \(\mathrm{C}_{\mathrm{F}} \mathbf{F}\), where F represents the Flickr flow matrix at the corresponding resolution level (i.e., \(\mathbf{F}_{{r}}\) or \(\mathbf{F}_{{c}}\)), and \(\mathrm{C}_{\mathrm{F}}\) is a constant.
Our approach is to combine the flows of human mobility estimated from the two models under a stacked regression procedure [42] in which each model is fitted alone, and then a linear regression determines the weight of each of them. In this way we combine the Flickr traces and the gravity model to improve the prediction of the real mobility patterns.
2.2 Model evaluation
The performance of a learning model is measured by its capacity to generalize to flows that were not known during the training step. Then, in order to validate our model we used a cross-validation strategy in which we fitted the model using only part of the target flows Y (training set) and then we tested its performance using the remaining flows (test set). We measured the performance in terms of Pearson correlation coefficient ρ and the determination coefficient \(\mathrm{r}^{2}\) between the target flows and the predicted flows.
Cross-validated model performance (10-fold cross-validation)
Model | Commuting network | Air travel network | ||
---|---|---|---|---|
ρ | \(\mathbf{r}^{\boldsymbol{2}}\) | ρ | \(\mathbf{r}^{\boldsymbol{2}}\) | |
Gravity model | 0.69 | 0.41 | 0.68 | 0.40 |
Radiation model (∗) | 0.78 | 0.60 | 0.47 | −0.21 |
Flickr model | 0.69 | 0.47 | 0.78 | 0.62 |
Hybrid gravity model | 0.79 | 0.62 | 0.84 | 0.72 |
Hybrid radiation model | 0.85 | 0.73 | 0.80 | 0.64 |
In the following subsection, we validated the model under data availability constraints and discuss the effects of assimilating the Flickr traces into the radiation model. Additional file 1 also include a geospatial validation and a test based on the Sørensen-Dice coefficient, which scores the model performance for different distances and populations. The latter test shows that the highest performance of the hybrid gravity model is achieved when considering mobility flows towards largely populated basins.
2.3 Prediction under data availability constraints
We evaluated the performance of our regression model as a function of the training set size in order to see the minimum amount of data required to achieve the largest improvement in the predictive power of both the gravity model and the Flickr model. The analysis was performed via a cross-validation scheme with repeated random subsampling (bootstrapping). The advantage of this procedure, with respect to k-fold cross-validation, is that the training set size can be varied in small steps. The results are shown in Figure 5 (panels C, D). For both the air transportation network (panel C) and the commuting network (panel D), we observe a threshold in the relative training set size above which the Flickr model systematically outperforms the gravity model. Such threshold in the relative training set size is less than 1% for the air network and about 10% for the commuting network. Overall, the gravity model always gets close to its best performance with a relatively small training set (<1%) but its determination coefficient does not grow further as we increase the size of the training set. On the other hand, both the Flickr and the hybrid gravity model achieve higher determination coefficients as the training set grows larger and the difference with respect to the gravity model becomes relevant even under small increases of the training set size. For instance, by increasing the relative training set size from 1% to 3% in the air network, the determination coefficient of the hybrid gravity model grows from \(\mathrm{r}^{2} =0.4\) to \(\mathrm{r}^{2} =0.6\), while the performance of the gravity model is unaltered.
2.4 Assimilation of Flickr traces into the radiation model
To further assess the potentialities of hybrid models, we also evaluated the impact of combining the mobility traces from Flickr into the radiation model of mobility at both scales. As shown in Table 1, the radiation model performs well at the commuting level but performs unsatisfactorily in predicting the air travel flows. This is understandable, as flight destinations are better predicted in terms of city importance or population than by the presence of intervening opportunities, such as job availability, which is the main mechanism underlying the radiation model. However, we show that at both distance scales the assimilation of Flickr data improves the prediction of theoretical models. Indeed, the hybrid radiation model is the best performing one for predicting commuting flows, while the hybrid gravity model is the best one for predicting air transportation flows. Additional comparative figures are available in Additional file 1.
3 Discussion
In recent years, the analysis of large mobile phone datasets revealed the high predictability of daily individual whereabouts [45–47], that is explained by the extreme regularity of human behaviour. To what extent such individual predictability is reflected on the predictability of collective human movements on large spatial scales is not completely understood. Several theoretical models of human mobility have been developed to describe collective mobility flows and have been successfully applied in different contexts [15, 17, 48–50], however their predictive power in absence of calibration data or under significant data availability constraints is limited. Most importantly, the true predictive power of mobility models is usually never tested out-of-sample, thus leading to a poor assessment of their applicability beyond specific spatial and temporal scales. In some cases first-principled approaches can be pursued [51], being the radiation model [6] the most relevant example; however, its generalizability across spatial scales, and beyond the original commuting paradigm, has been questioned [22, 52].
The increasing amount of geolocalized data publicly available through the Internet and social media suggests a different approach for predicting human mobility flows, which is the incorporation of large volumes of digital traces into theoretical models. The use of social media traces as a proxy for human movements has been extensively investigated in the literature [34–38], however, a comprehensive and quantitative assessment of their predictive power at different granularities and under cross-validation when compared out-of-sample against official mobility data was still lacking.
In this work, we showed that geolocalized traces collected from a popular photo-sharing platform can successfully inform a predictive model of collective human mobility. Even though individual mobility trajectories can be noisy, their aggregation is representative of collective mobility patterns at different resolution scales. We showed, both qualitatively and quantitatively, that the predictive power of Flickr data is at least as good as that of an informed gravity model. Then, by assimilating these traces with a stacked regression procedure, we obtained a data-driven model which significantly outperformed the gravity model in predicting mobility flows at different spatial scales. Specifically, the hybrid gravity model was tested on the air travel flows connecting 204 airports in the US, and on the commuting movements between 3,144 US counties, thus extending across regions of variable area and population.
The assimilation of Flickr traces addressed some of the limitations of the gravity model: on the one hand, the flows over large distances, which are consistently underestimated by the gravity model due to the fast decay of the gravity law as a function of distance, can be successfully predicted by a hybrid gravity model. On the other hand, the hybrid gravity model overcomes the potential data availability limitations that affect the gravity model. By following a cross-validation with bootstrapping approach, we showed that a very small sample of data is sufficient to calibrate our hybrid gravity model. Even in the absence of calibration data, a model based only on the Flickr traces can outperform the fully calibrated gravity model. This implies that a hybrid gravity model can be successfully applied even in places where no official mobility data is available.
Although the present study relies on the open Flickr 100M dataset [39], which is available to the public, other social media sources and online social networks can provide relevant signals for modelling human mobility. Our work suggests the possibility of assimilating data from several sources simultaneously, to capture other types of behaviours and improving the mobility predictions. Future research also involves exploring other assimilation techniques different from the simple stacked regression.
In conclusion, our work has exposed the real potential of digital traces to predict collective mobility flows. The assimilation of individual traces into data-driven models can predict spatial mobility patterns at different resolution scales with a high cross-validated performance and under low data requirements. More broadly, our contribution represents an initial guideline on how to effectively incorporate social media traces into predictive models of human mobility that can be further used to accurately describe epidemic outbreaks, manage disaster response, or plan urban services at appropriate spatial scales.
4 Methods
4.1 Data sources
Data from air travel in the US was obtained from the Airline Origin and Destination Survey [40] collected by the US Bureau of Transport Statistics, while commuting data belongs to the US Census Bureau [41]. Both datasets are publicly available at their respective web sites. The Flickr data was obtained from the Yahoo Flickr Creative Commons 100M public dataset [39] and is publicly available at Yahoo Webscope [53].
4.2 Ground-truth flow matrices
The ground-truth flow network for the air transportation in the US was built using the RITA dataset [40], containing a 10% sample of all the domestic itineraries in the US We used the subset correspondent to 2014, which comprises around 14 million air tickets between 466 airports. We define an airport basin as the area covered by an airport (i.e., the set of points for which that airport is the closest one). Thus, the partition of the US territory into basins is built as the Voronoi tessellation given by the airport coordinates. We note that when two airports are at less than 30 km of distance, we consider that they represent a single airport basin (because they serve the same metropolitan area) and we replace them by a single airport before computing the Voronoi tessellation. We also generalized this idea to connected components of airports at less than 30 km.
Each ticket in the RITA dataset contains an itinerary formed by several coupons. Each coupon contains information about the origin and destination airport of the trip and the number of passengers, and it also points out whether the destination was a stopover or either the passengers remained there. Those destinations in which the passengers did other than just a stopover are marked as trip breaks (TB); the first and last airports from an air ticket are always TB’s. By removing the stopovers from the itinerary we manage to clean the flow network from the presence of airport hubs, which do not represent the real passengers destination. For each itinerary, we obtain a list of destinations (\(d_{1}, d_{2},\ldots, d_{n}\)) (\(d_{1}\) is the departing city), and we use it to build the flow network between the airport basins.
The commuting dataset [41] contains data estimated from census for around 140 million commuters during the period 2009-2013 at the county level. It specifies the estimated number of people commuting between a home county and a working county. We consider the list of US counties as nodes for the commuting flow network, and for each pair of counties (\(c_{i}\), \(c_{j}\)) we put a weighted link counting the estimated number of workers that live in one of them and work in the other.
We use the notation \(\mathbf{Y}_{{r}} = ( r_{ij})\) and \(\mathbf{Y}_{{c}} = ( c_{ij})\) for the adjacency matrices which describe the airport and commuting ground-truth flows, respectively. We put the diagonal elements of both matrices to zero, because we shall not consider users who fly within the same airport basin or commuters who work in the same county of residence. The air travel matrix \(\mathbf{Y}_{{r}}\) contains 204 airport basins with 30,472 links, involving 40 million flows. The commuting matrix \(\mathbf{Y}_{{c}}\) covers the 3,144 US counties and has 55,578 links, describing the mobility patterns of about 37 million commuters.
4.3 Flickr-based flow matrices
The traces left by Flickr users when they take and upload pictures are given by the coordinates of their geotagged pictures, ordered by the time in which they were taken. We only consider timestamped, geolocalized pictures taken in the US. For each user we obtain an array of pictures (\(p_{1}, p_{2},\ldots, p_{n}\)) sorted by timestamp. At a particular resolution level (airport or county) we will consider that the user makes a trip when two consecutive pictures have coordinates belonging to different basins. Then, we aggregate all the users’ trips into a Flickr flow matrix.
For building county-level Flickr flow matrix, pictures are assigned to counties by considering the county borders as defined in the MAF/TIGER geographic database of the US Census Bureau. A flow between counties (i, j) is counted whenever two consecutive pictures (\(p_{i}\), \(p_{ i+1}\)) are taken in counties i and j respectively. We do not consider successive pictures in which the user does not change county.
In the airport-level Flickr flow matrix, pictures are assigned their nearest airport basin. A flow between two airport basins (i, j) is registered whenever two successive pictures are taken in basins i and j respectively. We do not consider successive pictures in which the user does not change airport basin.
The adjacency matrices of these networks are denoted as \(\mathbf{F}_{{c}} = ( f_{ij}^{c})\) and \(\mathbf{F}_{{r}} = ( f_{ij}^{r})\). Both of them have zero diagonals. In total, we observed around 350,000 trips between airport basins and 520,000 trips between counties. The flow networks contain around 26,000 and 150,000 nonzero elements, respectively.
4.4 Distance thresholds
The activity of Flickr users involves different modalities of mobility and has to be correctly filtered before comparing it against the ground-truth flows. In Additional file 1 we analyze the effect of a distance threshold in the model performance, and we conclude that the Flickr flows have good agreement with the ground-truth for distances above 500 km for air travel, and below 100 km for commuting.
4.5 Gravity model
4.6 Radiation model
Declarations
Acknowledgements
This work has been partially funded by the EC FET-Proactive Project MULTIPLEX (Grant No. 317532) to MT and CC. The authors also acknowledge support from the ‘Lagrange Project’ of the ISI Foundation funded by the Fondazione CRT and from the ‘S3 Project’ funded by the Compagnia di San Paolo.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
- Roth C, Kang SM, Batty M, Barthèlemy M (2011) Structure of urban movements: polycentric activity and entangled hierarchical flows. PLoS ONE 6(1):e15923 View ArticleGoogle Scholar
- Lenormand M, Tugores A, Colet P, Ramasco JJ (2014) Tweets on the road. PLoS ONE 9(8):e105407 View ArticleGoogle Scholar
- Song L, Kotz D, Jain R, He X (2006) Evaluating next-cell predictors with extensive wi-fi mobility data. IEEE Trans Mob Comput 5:1633-1649 View ArticleGoogle Scholar
- Abou-zeid H, Hassanein HS, Valentin S (2013) Optimal predictive resource allocation: exploiting mobility patterns and radio maps. In: IEEE global communications conference 2013 (GLOBECOM 2013), pp 4877-4882 View ArticleGoogle Scholar
- Ravenstein EG (1885) The laws of migration. J Stat Soc Lond 52(2):167-235 View ArticleGoogle Scholar
- Simini F, González MC, Maritan A, Barabási A-L (2012) A universal model for mobility and migration patterns. Nature 484(7392):96-100 View ArticleGoogle Scholar
- Wesolowski A, Eagle B, Tatem AJ, Smith DL, Noor AM, Snow RW, Buckee CO (2012) Quantifying the impact of human mobility on malaria. Science 338(6104):267-270 View ArticleGoogle Scholar
- Balcan D, Colizza V, Gonçalves B, Hu H, Ramasco JJ, Vespignani A (2009) Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci USA 106(51):21484-21489 View ArticleGoogle Scholar
- Merler S, Ajelli M (2010) The role of population heterogeneity and human mobility in the spread of pandemic influenza. Proc R Soc Lond B 277(1681):557-565. View ArticleGoogle Scholar
- Marguta R, Parisi A (2015) Impact of human mobility on the periodicities and mechanisms underlying measles dynamics. J R Soc Interface 12(104):20141317 View ArticleGoogle Scholar
- Gonzalez MC, Hidalgo CA, Barabási A-L (2008) Understanding individual human mobility patterns. Nature 453(7196):779-782 View ArticleGoogle Scholar
- Rhee I, Shin M, Hong S, Lee K, Kim SJ, Chong S (2011) On the Levy-walk nature of human mobility. IEEE/ACM Trans Netw 19(3):630-643 View ArticleGoogle Scholar
- Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C (2012) A tale of many cities: universal patterns in human urban mobility. PLoS ONE 7(5):e37027 View ArticleGoogle Scholar
- Hawelka B, Sitko I, Beinat E, Sobolevsky S, Kazakopoulos P, Ratti C (2014) Geo-located Twitter as proxy for global mobility patterns. Cartogr Geogr Inf Sci 41(3):260-271 View ArticleGoogle Scholar
- Zipf GK (1946) The \(\mathrm{P}_{1}\) \(\mathrm{P}_{2}/\mathrm{D}\) hypothesis: on the intercity movement of persons. Am Sociol Rev 11(6):677-686 View ArticleGoogle Scholar
- Alonso W (1976) A theory of movements: I, introduction. Working paper No 266, Institute of Urban and Regional Development, University of California Berkeley Google Scholar
- Stouffer SA (1940) Intervening opportunities: a theory relating mobility and distance. Am Sociol Rev 5(6):845-867 View ArticleGoogle Scholar
- Jung W, Wang F, Stanley HE (2008) Gravity model in the Korean highway. Europhys Lett 81(4):48005 View ArticleGoogle Scholar
- Grosche T, Rothlauf F, Heinzl A (2007) Gravity models for airline passenger volume estimation. J Air Transp Manag 13(4):175-183 View ArticleGoogle Scholar
- Liu Y, Sui Z, Kang C, Gao Y (2014) Uncovering patterns of inter-urban trip and spatial interaction from social media check-in data. PLoS ONE 9(1):e86026 View ArticleGoogle Scholar
- Krings G, Calabrese F, Ratti C, Blondel VD (2009) Urban gravity: a model for inter-city telecommunication flows. J Stat Mech Theory Exp 2009(7):L07003 View ArticleGoogle Scholar
- Masucci AP, Serras J, Johansson A, Batty M (2013) Gravity versus radiation models: on the importance of scale and heterogeneity in commuting flows. Phys Rev E 88:022812 View ArticleGoogle Scholar
- Liang X, Zhao J, Dong L, Xu K (2013) Unraveling the origin of exponential law in intra-urban human mobility. Sci Rep 3:2983 Google Scholar
- Yang Y, Herrera C, Eagle N, González MC (2014) Limits of predictability in commuting flows in the absence of data for calibration. Sci Rep 4:5662 Google Scholar
- Truscott J, Ferguson NM (2012) Evaluating the adequacy of gravity models as a description of human mobility for epidemic modelling. PLoS Comput Biol 8(10):e1002699 MathSciNetView ArticleGoogle Scholar
- Calabrese F, Di Lorenzo G, Liu L, Ratti C (2011) Estimating origin-destination flows using mobile phone location data. IEEE Pervasive Comput 10(4):36-44 View ArticleGoogle Scholar
- Palchykov V, Mitrović M, Jo H, Saramaki J, Pan RK (2014) Inferring human mobility using communication patterns. Sci Rep 4:6174 View ArticleGoogle Scholar
- Alexander L, Jiang S, Murga M, González MC (2015) Origin-destination trips by purpose and time of day inferred from mobile phone data. Transp Res, Part C, Emerg Technol 58(B): 240-250 View ArticleGoogle Scholar
- Tizzoni M, Bajardi P, Decuyper A, Kon Kam King G, Schneider CM, Blondel VD, Smoreda Z, González MC, Colizza V (2014) On the use of human mobility proxies for modeling epidemics. PLoS Comput Biol 10(7):e1003716 View ArticleGoogle Scholar
- Bengtsson L, Gaudart J, Lu X, Moore S, Wetter E, Sallah K, Rebaudet S, Piarroux R (2015) Using mobile phone data to predict the spatial spread of cholera. Sci Rep 5:8923 View ArticleGoogle Scholar
- Dredze M, García-Herranz M, Rutherford A, Mann G (2016) Twitter as a source of global mobility patterns for social good. arXiv:1606.06343
- Riley S, Eames K, Isham V, Mollison D, Trapman P (2015) Five challenges for spatial epidemic models. Epidemics 10:68-71 View ArticleGoogle Scholar
- Gonçalves B, Perra N (2015) Social phenomena: from data analysis to models. Springer, Berlin View ArticleGoogle Scholar
- Noulas A, Scellato S, Mascolo C, Pontil M (2011) An empirical study of geographic user activity patterns in Foursquare. In: Proc of the 5th int AAAI conference on weblogs and social media, pp 570-573 Google Scholar
- Lenormand M, Picornell M, Cantú-Ros OG, Tugores A, Louail T, Herranz R, Barthèlemy M, Frías-Martínez E, Ramasco JJ (2014) Cross-checking different sources of mobility information. PLoS ONE 9(8):e105184 View ArticleGoogle Scholar
- Grabowicz PA, Ramasco JJ, Gonçalves B, Eguíluz VM (2014) Entangling mobility and interactions in social media. PLoS ONE 9(3):e92196 View ArticleGoogle Scholar
- Llorente A, Garcia-Herranz M, Cebrian M, Moro E (2015) Social media fingerprints of unemployment. PLoS ONE 10(5):e0128692 View ArticleGoogle Scholar
- Barchiesi D, Preis T, Bishop S, Moat HS (2015) Modelling human mobility patterns using photographic data shared online. R Soc Open Sci 2(8):150046 MathSciNetView ArticleGoogle Scholar
- Thomee B, Shamma DA, Friedland G, Elizalde B, Ni K, Poland D, Borth D, Li L (2016) YFCC100M: the new data in multimedia research. Commun ACM 59(2):64-73 View ArticleGoogle Scholar
- Bureau of Transportation Statistics. Commuting (journey to work). http://www.rita.dot.gov/bts/data_and_statistics/index.html. Accessed 11 Apr 2016
- US Census Bureau. Airline origin and destination survey. http://www.census.gov/hhes/commuting/. Accessed 11 Apr 2016
- Breiman L (1996) Stacked regressions. Mach Learn 24(1):49-64 MathSciNetMATHGoogle Scholar
- Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning View ArticleMATHGoogle Scholar
- Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9(3):90-95 View ArticleGoogle Scholar
- Song C, Qu Z, Blumm N, Barabási A-L (2010) Limits of predictability in human mobility. Science 327(5968):1018-1021 MathSciNetView ArticleMATHGoogle Scholar
- Qin S-M, Verkasalo H, Mohtaschemi M, Hartonen T, Alava M (2012) Patterns, entropy, and predictability of human mobility and life. PLoS ONE 7(12):e51353 View ArticleGoogle Scholar
- Lu X, Wetter E, Bharti N, Tatem AJ, Bengtsson L (2013) Approaching the limit of predictability in human mobility. Sci Rep 3:2923 Google Scholar
- Domencich TA, McFadden D (1975) Urban travel demand: a behavioral analysis. North-Holland, Amsterdam Google Scholar
- Simini F, Maritan A, Néda Z (2013) Human mobility in a continuum approach. PLoS ONE 8(3):e60069 View ArticleGoogle Scholar
- Lenormand M, Huet S, Gargiulo F, Deffuant G (2012) A universal model of commuting networks. PLoS ONE 7(10):e45985 View ArticleGoogle Scholar
- Ren Y, Ercsey-Ravasz M, Wang P, González MC, Toroczkai Z (2014) Predicting commuter flows in spatial networks using a radiation model based on temporal ranges. Nat Commun 5:5347 View ArticleGoogle Scholar
- Yan X-Y, Zhao C, Fan Y, Di Z, Wang W-X (2014) Universal predictability of mobility patterns in cities. J R Soc Interface 11(100):20140834 View ArticleGoogle Scholar
- Yahoo Labs. Yahoo webscope. http://webscope.sandbox.yahoo.com/. Accessed 11 Apr 2016
- GeoNames. GeoNames. http://geonames.org/. Accessed 11 Apr 2016