Characteristics of human mobility patterns revealed by high-frequency cell-phone position data

Human mobility is an important characteristic of human behavior, but since tracking individual positions at high temporal and spatial resolution is difficult, most studies on human mobility patterns rely largely on mathematical models. Seminal models, which assume that frequently visited locations tend to be re-visited, reproduce a wide range of statistical features including collective mobility fluxes and numerous scaling laws. However, these models cannot be verified at a time-scale relevant to our daily travel patterns, as most available data do not provide the necessary temporal resolution. In this work, we re-examine human mobility mechanisms via comprehensive cell-phone position data recorded at a high frequency of up to every second. We find that in many cases the next location visited by a user is not the one they visit most frequently. Instead, individuals exhibit origin-dependent, path-preferential patterns in their short time-scale mobility. These behaviors are prominent when the temporal resolution of the data is high, and are thus overlooked in most previous studies. Incorporating quantities measured from our high-frequency data into conventional human mobility models yields statistical results that contradict the empirical observations. Finally, we show that the individual preferential transition mechanism, characterized by a first-order Markov process, can quantitatively reproduce the observed travel patterns at both individual and population levels at all relevant time-scales.


Introduction
Due to the increasing availability of mobile-phone records, global-positioning-system data and other datasets capturing traces of human movements, numerous statistical patterns in human mobility have been revealed, ranging from the confined radius of gyration at the individual level [1] to the commuting fluxes at the collective level [2]. These empirical observations suggest that human mobility is far from random and instead follows predictable rules [3][4][5][6][7][8][9][10][11][12]. Accordingly, models have been proposed to understand the observed mobility patterns. Following the pioneering model which generates empirical scaling behaviors by introducing two generic mechanisms, exploration and preferential return (EPR) [2], a large number of models for individual human mobility have been developed. Examples include variants of the EPR model which describe user virtual mobility in cyberspace [13][14][15], incorporate a gravity model to simulate the returner-explorer dichotomy [16], introduce a social circle to model the conserved number of locations an individual visits [17], aggregate individual trajectories to generate collective movements [18], and so on.
On the other hand, it has been shown that human mobility patterns differ across spatial scales. On the largest spatial scale, that of international movements, mobility is largely constrained by the entry requirements of individual countries, leading to asymmetric international movements [19,20]. On the spatial scale within a country, models which describe international movements do not explain inter-city movements well. For instance, inter-city human mobility is claimed to be mainly driven by the search for better job opportunities [6,21]. The radiation model assumes that individuals tend to select the nearest locations with large benefits. On the spatial scale within a city, local movements are better predicted by a population-weighted opportunity model, where the potential area of coverage of individuals includes the whole city as a manifestation of the high mobility at the city scale [22]. Although great efforts have been devoted to understanding human mobility at different spatial scales, studies of human mobility at different temporal scales are limited, due to the lack of high-frequency mobility data [23,24]. Understanding spatial-temporal human mobility patterns at different scales would lead to numerous applications, such as suppressing epidemic spreading [25,26], mitigating traffic congestion [27,28], urban planning [29,30] and so on.
To reveal the human mobility pattern at different temporal scales, high frequency position data are required. While most existing empirical studies on human mobility are based on cell phone position data, these data are CDRs (Call Detail Records) where user positions are only recorded when they initiate or receive a call or a text message [31]. These datasets can include position records of up to several million anonymous mobile phone users, but the data has in general a low temporal resolution, as user positions are not recorded most of the time. There is a recent work pointing out that position sampling frequency may significantly alter some statistics of human mobility [32]. The missing position data in some literature are interpolated via specific optimization algorithms or are incorporated from other data sources [33,34]. Difference may exist between the interpolated and the real data. Another usual practice to improve the temporal resolution of the data is to filter out users with long idle periods. For instance, this approach has been applied to extract a sample of user data with sufficient mobility records for inferring the nature of their visited locations such as home and workplace, and their tour trajectories with start and end point at home are investigated accordingly [28]. However, many problems still remain. On one hand, the user filtering procedure may lead to the risk of biased sampling of the original data. Specifically, the filtered data only include users who make frequent phone calls and may be biased to users with specific professions. On the other hand, the temporal resolution of the data after filtering is still insufficient (as frequent as every 10 min in existing literature), leaving many detailed user mobility traces missing from the data. Another possible data source is global-positioning-system (GPS) data [35,36]. 
Their temporal resolution can be very high, but as GPS data are mostly recorded by navigation devices in vehicles, they only capture positions while users are driving. As a result, GPS data are commonly used for analyzing traffic [37].
In this paper, we utilize cell phone 4G communication data from Shijiazhuang, a city in northern China, to identify the location of individual cell phone users at a frequency of up to every second. With this high-frequency position data, we study human mobility patterns at different time-scales. We find that individuals show a low tendency to re-visit the locations they have visited most frequently. Instead, they exhibit origin-dependent, path-preferential patterns in their short time-scale mobility. Finally, we consider a simple model characterized by a first-order Markov process to quantitatively reproduce the observed travel patterns at both the individual and population levels in the high temporal resolution data. Our work reveals the heterogeneity of human mobility mechanisms at different temporal scales, opening up a new dimension for understanding human mobility behaviours.

Data
Our study is based on a full set of 4G communication data recorded over 14 days between cell phones and cell towers in Shijiazhuang, the capital and largest city of North China's Hebei Province. The city has a population of over 10 million, and its total area is 15,848 square kilometers (the urban area is 2206 square kilometers). There are about 12,000 4G cell towers in Shijiazhuang, with 7000 towers in the urban area and 5000 towers in the suburban area. The position of a user is recorded when his/her cell phone connects to the closest cell tower for the 4G communication service [38]. As most applications on cell phones constantly exchange data with back-end servers, the position of a user can be recorded up to every second. Compared with traditional cell phone data (CDRs), where the position of users is only recorded when they make phone calls, our dataset has a much higher temporal resolution for analyzing individual mobility behavior.
Due to the popularity of smart phones, our dataset covers a large proportion of the population in the city. For privacy reasons, the data are anonymous and each user is assigned a unique ID. The original data include records of 5,336,194 users. In order to obtain a dataset describing the mobility patterns of active users with high temporal resolution, we have implemented strict rules to exclude users who do not move at all and those whose data are largely incomplete (i.e. those who have one or more days with less than 20 hours of daily records in the consecutive 14-day period). Finally, we single out and analyze the mobility data of the 55,389 users who satisfy the above criteria. The basic descriptive statistics of these data are shown in Fig. S1 and S2 of the supplementary information (SI, see Additional file 1).

Results
Empirical human mobility pattern at short time-scales. We start our analysis by constructing the mobility network of a typical mobile phone user in Fig. 1a. Each node is a location defined by the area around the geographical position of a cell tower. The network consists only of the locations where the user stayed for more than 3 minutes, with node size proportional to the frequency with which he/she visited the location. Two nodes are connected by a link if the user has traveled at least once between the two locations. To understand the mobility patterns in the high temporal resolution data, we shuffle the trajectory of typical users by randomly reordering the sequence of their visited locations. The frequency with which users visit specific locations is therefore preserved. The mobility pattern constructed from the shuffled trajectory of the typical user in Fig. 1a is illustrated in Fig. 1b. An obvious difference is observed when we compare Fig. 1a and 1b, suggesting that preserving the visitation frequency of locations fails to reproduce the mobility networks obtained with the high-frequency dataset. Similar results for the real and the shuffled trajectories of three other randomly selected users are shown in Fig. S3 of the SI.
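As a minimal sketch of the shuffling test described above, the following Python snippet builds a mobility network from a sequence of visited locations and from its randomly reordered copy. The location labels, the function names `mobility_network` and `shuffle_trajectory`, the treatment of links as undirected, and the skipping of consecutive repeats are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import Counter


def mobility_network(trajectory):
    """Return (node_weights, link_weights) for a sequence of location IDs.

    node_weights counts the visits to each location; link_weights counts how
    often each unordered pair of locations was traveled consecutively.
    Treating links as undirected is an assumption of this sketch.
    """
    nodes = Counter(trajectory)
    links = Counter(
        frozenset((a, b))
        for a, b in zip(trajectory, trajectory[1:])
        if a != b  # assumption: a repeated location is not a transition
    )
    return nodes, links


def shuffle_trajectory(trajectory, seed=None):
    """Randomly reorder the visit sequence; visitation frequencies are kept."""
    rng = random.Random(seed)
    shuffled = list(trajectory)
    rng.shuffle(shuffled)
    return shuffled


# Toy trajectory with hypothetical location labels (not from the dataset).
real = ["home", "cafe", "office", "cafe", "home", "gym", "home", "cafe", "office"]
shuffled = shuffle_trajectory(real, seed=0)

real_nodes, real_links = mobility_network(real)
shuf_nodes, shuf_links = mobility_network(shuffled)

print("same visitation frequencies:", real_nodes == shuf_nodes)
print("links in real network:", len(real_links))
print("links in shuffled network:", len(shuf_links))
```

By construction the node weights of the two networks are identical, so any difference between them comes only from the order of visits.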
In order to quantify the statistical difference between the mobility patterns in real and shuffled trajectories, we consider four metrics to quantify the trajectories of individuals.
The first one is the total number of unique transited location pairs (transited pairs for short), denoted as $n^{\mathrm{pair}}_\alpha$ for user $\alpha$, which is equivalent to the number of links in the mobility network of user $\alpha$. We then compare $n^{\mathrm{pair}}_\alpha$ for all users in the real data and the shuffled data in Fig. 1c. A box in the standard boxplots is marked in green if the line $y = x$ lies between 10% and 90% in each bin, and in red otherwise. One can see that $n^{\mathrm{pair}}_\alpha$ in the shuffled data is significantly larger than that in the real data. This is because each individual has a few locations with a large visitation frequency (e.g. home or office), and in the shuffled data users are attracted back to these locations regardless of the distance from the current location, before visiting other locations. In the real data, however, users do not always return to the frequently visited locations if they are too far away, resulting in a much smaller $n^{\mathrm{pair}}_\alpha$, i.e. far fewer transited pairs than in the shuffled data.

Figure 1 (caption, partially recovered) Each node is a location visited by the user, with node size proportional to its visitation frequency. A link is drawn when the user has traveled at least once between the two locations. The shuffled data is obtained by randomly reordering the sequence of visited locations. In this way, the visitation frequency of each location by the user is preserved while the travel trajectory is randomized.
The second metric we examined is the spread, as measured by the variance $\mathrm{Var}_\alpha$, of the usage frequency of the transited pairs of user $\alpha$ (i.e. the link weights in the mobility network). A large $\mathrm{Var}_\alpha$ indicates that an individual $\alpha$ repeatedly uses a small number of routes and only occasionally travels along other routes. As shown in Fig. 1d, the values of $\mathrm{Var}_\alpha$ are larger in the real data than in the shuffled data, implying that users in the real data travel more frequently between a smaller number of location pairs.
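The third metric we examined is the distance $d^{\mathrm{loop}}_\alpha$ covered by the maximum loop in the trajectory of user $\alpha$. A larger $d^{\mathrm{loop}}_\alpha$ is observed in the shuffled data, as users in the shuffled data always return to the frequently visited locations even if they are far away.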
Finally, the fourth metric, the total traveled distance $d^{\mathrm{total}}_\alpha$, is larger in the shuffled data, as shown in Fig. 1f. As this metric is very sensitive to discrepancies in the predicted trajectory, it is largely ignored in the existing literature. The larger $d^{\mathrm{total}}_\alpha$ in the shuffled data is also due to the fact that users often return to the far away yet frequently visited locations in the shuffled data. In fact, $d^{\mathrm{total}}_\alpha$ is an important metric, capturing the geographic features of human mobility. All the above results suggest that although the shuffled trajectories of individuals preserve the location visitation frequency, the patterns in the shuffled data are significantly different from those in the real data.
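Three of the four individual-level metrics above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the coordinates are made up, links are treated as undirected, distances are great-circle distances between location coordinates, and the variance is the population variance of the link weights. The maximum-loop distance is left out because its precise construction is not spelled out in the text.

```python
import math
from collections import Counter
from statistics import pvariance


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def trajectory_metrics(trajectory, coords):
    """Return (n_pair, var, d_total) for one user's sequence of locations."""
    steps = [(a, b) for a, b in zip(trajectory, trajectory[1:]) if a != b]
    usage = Counter(frozenset(step) for step in steps)  # link weights
    n_pair = len(usage)                                 # unique transited pairs
    var = pvariance(list(usage.values())) if usage else 0.0
    d_total = sum(haversine_km(coords[a], coords[b]) for a, b in steps)
    return n_pair, var, d_total


# Hypothetical coordinates (lat, lon), chosen only for illustration.
coords = {
    "home": (38.04, 114.51),
    "cafe": (38.05, 114.52),
    "office": (38.07, 114.48),
    "gym": (38.03, 114.55),
}
trajectory = ["home", "cafe", "office", "cafe", "home", "gym", "home", "cafe", "office"]

n_pair, var, d_total = trajectory_metrics(trajectory, coords)
print(n_pair, round(var, 3), round(d_total, 2))
```

Applying the same function to a shuffled copy of the trajectory gives the per-user comparison that the scatter plots summarize.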
We further investigate the effect of shuffling on the human mobility patterns at the collective level. For each location, we compute the number of different locations that users travel to from it. This quantity is essentially the number of links that a location $i$ has in the mobility network, denoted as $k_i$. The corresponding distribution is shown in Fig. 1g. We see that both distributions $P(k_i)$ of the real and the shuffled data resemble distributions with a power-law tail, yet their exponents are clearly different, with the tail obtained from the real data being much shorter. The exponents are respectively -1.27 for the real data and -1.05 for the shuffled data, obtained by power-law fitting to the tail of the distributions starting from $k_i = 50$. We see a similar difference when we compare the population flux $F_{ij}$ between each pair of locations $ij$ in the real data and the shuffled data in Fig. 1h. The flux $F_{ij}$ follows a power-law distribution in both the real data and the shuffled data. However, the exponent of the fitted power-law function is larger in the real data, indicating that the distribution $P(F_{ij})$ of the real data has a longer tail and a larger maximum value of $F_{ij}$. The exponents are respectively -1.81 for the real data and -1.91 for the shuffled data, obtained by power-law fitting to the whole distributions $P(F_{ij})$. For both $P(k_i)$ and $P(F_{ij})$, we have also fitted the probability distributions after log-binning, and obtained exponents similar to those presented above (see Fig. S4 in SI).
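To make the notion of a fitted exponent concrete, here is a minimal sketch that estimates a power-law exponent as the least-squares slope of $\log P(x)$ against $\log x$ above a cutoff. The text does not specify the fitting procedure beyond "power-law fitting", so the unweighted least-squares choice, the function name `fit_power_law_exponent` and the synthetic sample are assumptions.

```python
import math
from collections import Counter


def fit_power_law_exponent(values, x_min=1):
    """Least-squares slope of log P(x) versus log x for integer x >= x_min.

    Requires at least two distinct values above the cutoff; otherwise the
    slope is undefined and a ZeroDivisionError is raised.
    """
    counts = Counter(v for v in values if v >= x_min)
    total = sum(counts.values())
    xs = [math.log(x) for x in counts]
    ys = [math.log(c / total) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic sample whose counts are exactly proportional to x**-2.
exact_counts = {1: 3600, 2: 900, 3: 400, 4: 225, 5: 144, 6: 100}
sample = [x for x, c in exact_counts.items() for _ in range(c)]

slope = fit_power_law_exponent(sample)
tail_slope = fit_power_law_exponent(sample, x_min=2)
print(round(slope, 6), round(tail_slope, 6))
```

On real, noisy data an unweighted fit on raw counts is dominated by the sparse tail, which is one reason to also check the fit after log-binning as the text does.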
Other than revealing human mobility patterns in the spatial dimension, our high-frequency data also allow us to reveal the temporal dimension of human mobility activities. To this end, we denote the duration of each stay of a user at a location as $t_{\mathrm{stay}}$, and examine the distribution $P(t_{\mathrm{stay}})$ over all users. As we can see in Fig. 2a, $P(t_{\mathrm{stay}})$ shows a power-law head and an exponential tail. A power-law function with exponent -1.41 has been used to fit the head of the distribution up to $t_{\mathrm{stay}} = 6$ hours. The power-law head suggests that the duration of stays at different locations is heterogeneous, and that there are a large number of locations with relatively short stays. Note that these durations are sufficiently long, i.e. longer than 3 minutes (the typical time for a user to walk out of the several-hundred-meter coverage range of a cell tower), so the corresponding locations are not merely passed by. On the other hand, the small peak at the tail is mostly contributed by the periods when users stay or sleep at home.
As evident from Fig. 2a, many locations visited by users for a short time may be neglected if the dataset does not have a high temporal resolution. Since our 4G cell phone data record user positions up to every second, we can examine data with different temporal resolutions by data pruning. In order to examine how the mobility statistics are affected by the temporal resolution of the datasets, we consider a threshold $T$ and remove all the visited locations with $t_{\mathrm{stay}} < T$, for all users. In Fig. 2b, we show the average number of visited locations as a function of $T$. One can see that the number of visited locations decreases with increasing $T$ in a power-law form with an exponent -0.73, implying that the lower the temporal resolution of the data, the larger the fraction of visited locations that is overlooked in the analyses. Indeed, many hidden mobility patterns at short time-scales may have been neglected in existing studies which are based on mobility datasets with a low temporal resolution.
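The pruning procedure can be sketched as follows. The stay records, the function names, and the choice to count distinct locations per user are assumptions made for illustration; the text does not say whether repeated visits are counted separately.

```python
def prune_stays(stays, threshold_min):
    """Keep only the stays lasting at least threshold_min minutes."""
    return [(loc, dur) for loc, dur in stays if dur >= threshold_min]


def visited_locations_vs_threshold(stays, thresholds):
    """Number of distinct locations that survive pruning, for each threshold T."""
    return {
        T: len({loc for loc, _ in prune_stays(stays, T)})
        for T in thresholds
    }


# Hypothetical stays of one user: (location label, duration in minutes).
stays = [
    ("A", 480), ("B", 4), ("C", 12), ("A", 35),
    ("D", 3), ("E", 65), ("B", 90), ("A", 600),
]

curve = visited_locations_vs_threshold(stays, [3, 10, 60, 150])
print(curve)  # {3: 5, 10: 4, 60: 3, 150: 1}
```

Averaging such curves over all users gives a dependence on $T$ of the kind plotted in Fig. 2b.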
To further examine how the temporal resolution of the dataset affects the mobility statistics, we show in Fig. 2c-2f the difference between the real and the shuffled data in terms of the number of transited pairs $n^{\mathrm{pair}}_\alpha$, the variance $\mathrm{Var}_\alpha$, the distance $d^{\mathrm{loop}}_\alpha$ covered by the maximum loop, and the total traveled distance $d^{\mathrm{total}}_\alpha$, under various data removal thresholds $T$. The difference is measured by the fraction of users whose metric values in the shuffled data are larger than those in the real data, except for $\mathrm{Var}_\alpha$. As data shuffling tends to decrease the spread of the usage frequency of transited pairs, the difference in $\mathrm{Var}_\alpha$ is computed as the fraction of users $\alpha$ whose $\mathrm{Var}_\alpha$ in the shuffled data is smaller than that in the real data. Remarkably, when the temporal resolution is low (i.e. $T$ is large), our results show only a small difference between the real and the shuffled data in terms of these four metrics at the individual level. As a lower temporal resolution corresponds to a smaller number of links in the network, it is important to check whether the observed small difference is an artifact of a shuffling process that cannot alter the structure of such small networks. We show in Fig. S5 in SI that when $T$ is large (e.g. even for $T = 150$), there are multiple nodes and it is still possible for the shuffling process to change the structure of the network. In addition, we generate a random network with the same number of nodes and the same degree sequence as the real network for each threshold $T$. We find that the difference between the network before and after shuffling is almost constant with respect to $T$ when the initial network is randomly generated. In comparison, the differences between the real and shuffled networks are much larger and decrease with $T$, indicating that the difference between the original and shuffled real networks is not an artifact of increasing $T$ (see Fig. S5 and note 3 in SI).
Similar results can be observed when we compare the power-law distributions in Figs. 1g and 1h under different temporal resolutions. Figures 2g and 2h show that the difference between the exponents of the distributions in Figs. 1g and 1h obtained from the real and the shuffled data is large when the threshold $T$ is small, and becomes negligible when $T$ is large. Another important observation in Fig. 2h is that the magnitude of the exponent of the flux distribution increases with $T$, indicating that the maximum flux between locations is lower in cases with a large threshold. In other words, using datasets with a low temporal resolution would underestimate the flux between locations. Additionally, we study motifs in human travel trajectories [39] in Figs. S6 and S7 (see discussion in SI note 4). A detailed comparison of the human travel motifs in the real data and the shuffled data shows that the shuffling process does not significantly alter the motif distribution when $T$ is large, yet the difference between the motif distributions in the real data and the shuffled data is substantial when $T$ is small.
Origin-dependent preference on the next visiting location. In order to understand the reasons underlying the observed difference between the real data and the shuffled cases, we compare their matrices recording the travel frequency of a typical user between each location pair. The matrices are computed with the temporal resolution T = 3 min, and are shown as heatmaps in Figs. 3a and 3b respectively for the real data and the shuffled data. Some large values can be seen in the heatmap of the real data, which suggests that users tend to repeatedly transit between a small number of location pairs. However, this preference of transitions, or equivalently the preference of transited location pairs, cannot be captured in the shuffled data.
We further examine the probability for the selected typical user to visit different locations starting from different origins in Fig. 3c. Different locations are indexed on the horizontal axis, with each blue curve corresponding to the probability to visit other locations from a specific origin; the black dashed curve corresponds to the overall visitation probability distribution. Compared to the black dashed curve, different blue curves peak at different locations, suggesting that the next location that a user visits is not always the most frequently visited one, but instead strongly depends on the present location. Similarly, we show the visitation probability distribution for each starting location in the shuffled data in Fig. 3d, in which the peaks of the blue curves are consistent with those of the black dashed curve. The comparison between Figs. 3c and 3d shows that in the real data, users' preferences for the locations to be visited depend on their current location. A more quantitative analysis can be made by computing the probability that the most frequently visited location $j^*_i$ from location $i$ is consistent with the overall most frequently visited location $j^*$, i.e. $p_{j^*_i = j^*}$. Figure 3e shows the scatter plot and the bin average of $p_{j^*_i = j^*}$ for each user in the real and the shuffled data. Figure 3f shows the distribution of $p_{j^*_i = j^*}$ for all users in the real and the shuffled data. Both figures show that $p_{j^*_i = j^*}$ is smaller in the real data than in the shuffled data, again suggesting an origin-dependent preference for the locations to be visited.

Figure 3 Origin-dependent mobility behavior. Heat maps which show the matrices of the travel frequency of a typical user from one location to another in (a) the real data, and (b) the shuffled data of the selected typical user. The location visitation probability in (c) the real data and (d) the shuffled data for the selected typical user starting from specific locations. As an example, the red curves in (c) and (d) show the visitation probability distribution of the selected user starting from location 1, while the black curves show the visitation probability distribution aggregated over all starting locations. In (e) and (f), we show the probability that the most frequent next location from a given location is the same as the overall most frequently visited location (i.e. $p_{j^*_i = j^*}$). The probability $p_{j^*_i = j^*}$ is calculated for each user in both the real data and the shuffled data. (e) shows the scatter plot of $p_{j^*_i = j^*}$, indicating that in the real data the most likely next location from many locations differs from the overall most frequently visited location. (f) shows the distribution of $p_{j^*_i = j^*}$ in the real data and the shuffled data.
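The quantity $p_{j^*_i = j^*}$ can be illustrated with a short Python sketch. Several details are assumptions here, since the text does not fix them: $j^*$ is taken as the most frequent destination over all transitions, the probability is the fraction of origins whose most frequent next location equals $j^*$, consecutive repeats are ignored, and ties are broken by first occurrence. The trajectory is a toy example.

```python
from collections import Counter, defaultdict


def origin_agreement(trajectory):
    """Fraction of origins whose most frequent next location equals the
    overall most frequent destination (an assumed reading of p_{j*_i = j*})."""
    steps = [(a, b) for a, b in zip(trajectory, trajectory[1:]) if a != b]
    overall = Counter(b for _, b in steps)   # destination counts, all origins
    by_origin = defaultdict(Counter)         # destination counts per origin
    for a, b in steps:
        by_origin[a][b] += 1
    j_star = overall.most_common(1)[0][0]
    hits = sum(
        1 for dests in by_origin.values()
        if dests.most_common(1)[0][0] == j_star
    )
    return hits / len(by_origin)


# Toy trajectory with hypothetical location labels.
trajectory = ["home", "cafe", "office", "cafe", "home", "gym", "home", "cafe", "office"]

p = origin_agreement(trajectory)
print(p)  # 0.5: only "home" and "office" send the user mostly to "cafe"
```

Note that under this reading the origin $j^*$ itself can never agree with $j^*$, so the value is strictly below one whenever $j^*$ has outgoing transitions.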
Data-integrated Models. With the comprehensive cell-phone position dataset and based on our previous findings, we go on to examine the essential mechanisms underlying human mobility patterns. To achieve the goal, we plug various empirical quantities such as the popularity of locations and the frequency of transition between locations into existing human mobility models, and compare the emergent behavior from the models with empirical results.
We first start with the simplest preferential return model, in which the probability for an individual to visit a location is proportional to the frequency with which the location was visited in the past [2]. We can thus write down the transition probability $p_{\alpha:i\to j}(t)$ of an individual $\alpha$ to travel from a location $i$ to a location $j$ at time $t$ as

$$p_{\alpha:i\to j}(t) = \frac{f_{\alpha:j}(t)}{\sum_{k} f_{\alpha:k}(t)}, \tag{1}$$

where $f_{\alpha:j}(t)$ is the empirical frequency with which location $j$ is visited by individual $\alpha$ before time $t$. We call the above the individual preferential return (IPR) mechanism. A simulated trajectory with Eq. (1) as the transition probability is shown in Fig. 4b, to be compared with the real empirical trajectory shown in Fig. 4a. As we can see, many transitions absent in the empirical data are found in the simulated results. Furthermore, we consider the metric $d^{\mathrm{total}}_\alpha$ to examine statistically the validity of this model. We use $d^{\mathrm{total}}_\alpha$ because it is a geography-aware metric which captures even small inaccuracies in the predicted paths of the users' travel trajectories. As shown in the scatter plot of $d^{\mathrm{total}}_\alpha$ in Fig. 4g, beyond this specific individual, many of the simulated trajectories are longer than their counterparts in the empirical data, which may be a result of transitions between more distant locations in the simulations, as in Fig. 4b. These results imply that the IPR mechanism is insufficient to explain human mobility patterns. Since the transition probability in IPR is independent of the origin, one may expect that origin-dependent transitions are indeed crucial in explaining mobility patterns.
As the preferential return model is over-simplified in explaining human movement, we next explore the significance of origin-dependent transitions in explaining mobility patterns. Since the individual frequency of transition between two locations is difficult to model, many existing studies only utilize the average transition frequency over the population. Related models for predicting the average transition frequency over the population include the gravity model [40], the radiation model [6], the population-weighted opportunity model [22] and so on. We call this the population preferential transition (PPT) mechanism, for which the transition probability $p_{\alpha:i\to j}(t)$ is given by

$$p_{\alpha:i\to j}(t) = \frac{f_{i\to j}(t)}{\sum_{k} f_{i\to k}(t)}, \tag{2}$$

where $f_{i\to j}(t)$ is the empirical frequency with which the population travels from location $i$ to $j$ before time $t$. As shown in Fig. 4c, the trajectory of this specific individual is dominated by paths which connect nearby locations, reflecting the average behavior of the population to go to near and attractive locations [6,22,40]. This trajectory in Fig. 4c is significantly different from the real trajectory in Fig. 4a. Consistently, we see in Fig. 4h that the simulation underestimates the real total travel distance $d^{\mathrm{total}}_\alpha$ for most individuals in the empirical data. These results imply that individuals travel to fulfill specific purposes for which short distance is not the main consideration. Although not surprising, the results suggest that the PPT mechanism is insufficient to explain individual mobility patterns.

Figure 4 (caption, partially recovered) Each node is a location visited by the user, with node size proportional to its visitation frequency generated by the corresponding model. (f) Comparison of the distributions of flux between location pairs, $P(F_{ij})$, in the real and the shuffled data as well as the simulated trajectories from different models. The fitted exponents suggest that the IPT model can best reproduce the flux distribution in the real data. The scatter plots of the total traveled distance $d^{\mathrm{total}}_\alpha$ of each user $\alpha$ between the real data and the data simulated by (g) the IPR model, (h) the PPT model, (i) the PIPR model, and (j) the IPT model.
In a recent work [18], a model combining the memory effect and population-induced competition is proposed to simulate human mobility between locations based only on their population. Basically, individual mobility in this model is driven by both preferential return and collective mobility between locations. In order to test whether this model can generate realistic human mobility in high temporal resolution data, we consider a population-weighted individual preferential return (PIPR) model that combines the IPR and PPT mechanisms in its transition probability. This model is actually a simplified version of the model proposed in ref. [18], where the collective mobility between locations as predicted by the popularity distribution is replaced by the population preferential transition probability. As shown in Figs. 4d and 4i, although the trajectory and the total travel distance are more similar to the empirical data than with IPR or PPT alone, they are still different from the real data, as the model substantially underestimates $d^{\mathrm{total}}_\alpha$ in the high temporal resolution human mobility data. Inspired by the empirical observation in Fig. 3 that people tend to repeatedly transit between a small number of location pairs, we consider here another model, based on a first-order Markov process, that might explain the driving mechanism of high temporal resolution human mobility. We call the mechanism the individual preferential transition (IPT). In this case, the transition probability $p_{\alpha:i\to j}(t)$ is given by

$$p_{\alpha:i\to j}(t) = \frac{f_{\alpha:i\to j}(t)}{\sum_{k} f_{\alpha:i\to k}(t)},$$

where $f_{\alpha:i\to j}(t)$ is the empirical frequency with which individual $\alpha$ travels from location $i$ to $j$ before time $t$. As we can see in Fig. 4e, the simulated trajectory resembles the real trajectory shown in Fig. 4a. Beyond this specific individual, we see in Fig. 4j that the simulated $d^{\mathrm{total}}_\alpha$ of each individual shows a more linear relation with its counterpart in the real data, compared to the above three models (see Figs. 4g, 4h and 4i respectively). These results imply that the IPT mechanism outperforms the mechanisms of preferential return and population competition in capturing human mobility trajectories at high temporal resolution.
When simulating the four models (i.e., IPR, PPT, PIPR, IPT, see Table 1), we draw the initial configurations of these models from the real data. Specifically, $f_{\alpha:j}(t)$ in IPR, $f_{i\to j}(t)$ in PPT, and $f_{\alpha:i\to j}(t)$ in IPT are set to the values extracted from the empirical data. The vectors $f_{\alpha:j}(t)$ for each user $\alpha$ in IPR and the matrices $f_{\alpha:i\to j}(t)$ for each user $\alpha$ in IPT are then updated during the simulation. In the IPT model, $f_{\alpha:i\to j}(t)$ increases by 1 if individual $\alpha$ travels from location $i$ to $j$ during the simulation. Similarly, in the IPR model, $f_{\alpha:j}(t)$ increases by 1 if individual $\alpha$ visits location $j$ during the simulation. We stop the simulation for an individual $\alpha$ after he/she finishes the same number of travels as in his/her real data for 14 days.
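The simulation procedure for the IPR and IPT mechanisms can be sketched as below. This is an illustrative reading of the description above, not the authors' code: the function names, the toy counts, and the choice to exclude the current location from the IPR draw (so that every step is an actual move) are assumptions.

```python
import random
from collections import Counter, defaultdict


def weighted_choice(rng, weights):
    """Draw a key of `weights` with probability proportional to its value."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def simulate_ipr(visit_counts, start, n_steps, seed=None):
    """IPR: the next location depends only on the user's visit counts f_j.

    Assumes at least two locations; the current one is excluded from the draw.
    """
    rng = random.Random(seed)
    f = Counter(visit_counts)
    path, here = [start], start
    for _ in range(n_steps):
        candidates = {j: w for j, w in f.items() if j != here}
        here = weighted_choice(rng, candidates)
        f[here] += 1                 # memory update: one more visit to j
        path.append(here)
    return path


def simulate_ipt(transition_counts, start, n_steps, seed=None):
    """IPT: first-order Markov chain on the user's own transition counts f_ij."""
    rng = random.Random(seed)
    f = defaultdict(Counter)
    for (i, j), w in transition_counts.items():
        f[i][j] += w
    path, here = [start], start
    for _ in range(n_steps):
        if not f[here]:
            break                    # no recorded outgoing transition
        nxt = weighted_choice(rng, f[here])
        f[here][nxt] += 1            # memory update: one more use of i -> j
        here = nxt
        path.append(here)
    return path


# Hypothetical initial memory for one user.
visits = {"home": 5, "cafe": 2, "office": 3}
transitions = {("home", "cafe"): 3, ("cafe", "office"): 3, ("office", "home"): 2,
               ("home", "office"): 1}

print(simulate_ipr(visits, "home", 8, seed=42))
print(simulate_ipt(transitions, "home", 8, seed=42))
```

In IPT the candidate set is restricted to destinations already reached from the current origin, which is what makes the simulated paths origin-dependent.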
A remarkable advantage of the state-of-the-art human mobility models is that they can reproduce collective human mobility by aggregating simulated individual mobility trajectories [18]. One important metric that is usually used to examine this feature is the distribution $P(F_{ij})$ of the flux between locations. Figure 4f presents the fitted curves of the power-law flux distributions generated by the IPR, PPT, PIPR and IPT models, respectively (see the original distributions in Fig. S8 in SI). We compare these fits with those of the real data (at high resolution, with stay duration threshold $T = 3$ min) and the shuffled data. The exponents with their errors are: -1.81 ± 0.03 (Real data), -1.91 ± 0.04 (Shuffled data), -2.02 ± 0.04 (IPR), -1.38 ± 0.02 (PPT), -1.89 ± 0.03 (PIPR), -1.81 ± 0.03 (IPT). The errors are the difference between the maximal and minimal exponents obtained by varying the fitting curves within the 95% confidence interval (see Fig. S8 in SI for the visualization of the zone of the 95% confidence intervals). As we can see, the exponent generated by the PIPR model is fairly close to that of the real data. However, the exponent generated by the IPT model coincides with that of the real data, suggesting that IPT can best reproduce the real flux distribution. In addition, the differences between the exponent of the real data and those of the other models are larger than the errors, supporting the conclusion that the distribution generated by IPT is closest to the real data.
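The comparison of exponents can be checked with simple arithmetic. The snippet below uses only the values quoted above; the variable names are illustrative.

```python
# Fitted flux exponents and their errors, as quoted in the text.
real_exponent, real_error = -1.81, 0.03
fits = {
    "Shuffled": (-1.91, 0.04),
    "IPR": (-2.02, 0.04),
    "PPT": (-1.38, 0.02),
    "PIPR": (-1.89, 0.03),
    "IPT": (-1.81, 0.03),
}

# Absolute gap between each fitted exponent and that of the real data.
gaps = {name: round(abs(exp - real_exponent), 2) for name, (exp, _) in fits.items()}
best = min(gaps, key=gaps.get)

for name, gap in gaps.items():
    error = fits[name][1]
    print(f"{name:9s} gap = {gap:.2f}  error = {error:.2f}  gap > error: {gap > error}")
print("closest to real data:", best)
```

The gaps are 0.10 (shuffled), 0.21 (IPR), 0.43 (PPT), 0.08 (PIPR) and 0.00 (IPT), so every model except IPT differs from the real data by more than its quoted error.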
To understand more comprehensively the difference between the IPT and IPR models, we study several additional metrics, with the results summarized in SI note 5. At the individual level, we examine three other metrics, namely the number $n^{\mathrm{pair}}_\alpha$ of transited location pairs, the variance $\mathrm{Var}_\alpha$ of the transited pairs' usage frequency, and the distance $d^{\mathrm{loop}}_\alpha$ of the maximum loop, as presented in Fig. S9. While IPR can reproduce a number of transited location pairs similar to that in the real data, it underestimates $\mathrm{Var}_\alpha$ and overestimates $d^{\mathrm{loop}}_\alpha$. In Fig. S10, we study another metric at the collective level, namely the distribution $P(k_i)$ of the number of different locations that users travel to starting from location $i$. The longer tail generated by the IPR model indicates that IPR overestimates the number of different locations that users travel to from a specific location. IPT outperforms IPR in reproducing these metrics at both the individual and collective levels.
Finally, we simulate the IPR and IPT models separately in a finite space of M locations with no initial memory, in which N = 6 × 10^4 individuals each move s steps (with M and s randomly drawn from [2, 350] and [50, 800], respectively). All f α:i→j (t) in the IPT model and f α:j (t) in the IPR model for individual α are initially set to the same small value (i.e., f α:i→j (t) = 1 and f α:j (t) = 1 for simplicity) and then updated during the process (see details in SI note 5). The results suggest that IPT outperforms IPR in reproducing the mobility patterns observed in the real data, even without initial memory from the empirical data (see Fig. S11). Specifically, the simulated data from IPT have a smaller number of unique paths and a larger variance of the usage frequency of paths than the corresponding shuffled data, indicating that individuals in IPT tend to use a small number of paths repeatedly. Taken together, the IPT model, integrated with quantities extracted from the comprehensive cell-phone position dataset, can reproduce well the human mobility patterns at high temporal resolution that other models fail to capture.
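The memoryless simulation can be sketched for a single individual as follows. This is a toy version under stated assumptions rather than the procedure of SI note 5: the next location is drawn with probability proportional to f α:j (IPR) or to f α:i→j from the current location i (IPT), all weights start at 1, the chosen weight is incremented by 1 after each move, and self-transitions are excluded. The function names and the update rule are assumptions for illustration.

```python
import random
from collections import Counter
from statistics import pvariance


def simulate(model, n_locations, n_steps, rng):
    """Simulate one individual; returns the list of visited locations.

    model: "IPR" (weights depend on the destination only) or
           "IPT" (weights depend on the origin-destination pair).
    """
    if model not in ("IPR", "IPT"):
        raise ValueError("model must be 'IPR' or 'IPT'")
    if n_locations < 2:
        raise ValueError("need at least two locations")
    visit = [1] * n_locations                                 # f_j, all start at 1
    trans = [[1] * n_locations for _ in range(n_locations)]   # f_{i->j}, all start at 1
    current = rng.randrange(n_locations)
    path = [current]
    for _ in range(n_steps):
        weights = list(trans[current] if model == "IPT" else visit)
        weights[current] = 0  # assumption: no self-transitions
        nxt = rng.choices(range(n_locations), weights=weights)[0]
        trans[current][nxt] += 1  # reinforce the path just used
        visit[nxt] += 1           # reinforce the location just visited
        current = nxt
        path.append(current)
    return path


def path_stats(path):
    """Return (number of unique directed paths, variance of their usage counts)."""
    counts = Counter(zip(path, path[1:]))
    return len(counts), float(pvariance(list(counts.values())))
```

Calling `path_stats(simulate("IPT", M, s, random.Random(seed)))` and the same with `"IPR"` over many individuals, with M and s drawn from the ranges above, gives the two quantities compared in the text. Whether this toy version reproduces Fig. S11 depends on details of SI note 5 that are not shown here.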

Discussion
To summarize, we present a comprehensive study of human mobility patterns at different temporal scales, using a large sample of 4G cell-phone data in which the positions of users are recorded every second. We construct mobility networks of mobile-phone users and compare the real mobility networks with randomly shuffled networks. We find that the shuffled networks largely overestimate the total number of transited location pairs and the total traveled distance at short time-scales. Collective statistics such as the population flux between locations are also overestimated. This is because, in high-resolution human mobility data, individuals exhibit a clear preference for particular transitions between locations, which is determined by how frequently the routes have been used before.
Finally, we study a simple model based on the first-order Markov process (called individual preferential transition), in which the preferences of users for paths are accumulated in a matrix and users move according to their preferred paths. The model can quantitatively reproduce the empirical travel patterns at both the individual and population levels, up to the high temporal resolution of our empirical data.
Promising future directions include improving the model by introducing a decay of path preference over time, which would yield a more realistic model in which an individual's frequently used paths evolve. In addition, one can empirically study the path preference matrices of individuals, which may provide clues to various human mobility behaviors, such as the explorers and returners observed at the population level in the literature [16]. Other directions include extending the present work to multiple spatial scales, across cities or even countries [6,18]. The ultimate goal is to obtain a universal model that can explain individual and collective human mobility patterns at different spatial and temporal scales. From the perspective of applications, one can study the overlap of users' preferences in traveling paths in order to understand and suppress traffic congestion. Answering these questions would not only offer a better understanding of the fundamental mechanisms that underpin individual human mobility, but may also substantially improve our ability to predict and control collective traffic flux [41].
Finally, we remark that our findings can be placed in the broader context of complex dynamical systems. The power-law distribution of the flow along edges and, similarly, the stay-time distribution have also been studied in the context of preferential behaviour and scaling in diffusive dynamical systems on networks [42]. The mathematical framework of the discrete-time absorbing Markov chain is also connected to production optimization in economics [43]. We hope that our findings can inspire new observations and new models in these complex dynamical systems.