Downscaling spatial interaction with socioeconomic attributes

Tang, Chengling; Dong, Lei; Guo, Hao; Wang, Xuechen; Chen, Xiao-Jian; Dong, Quanhua; Liu, Yu

doi:10.1140/epjds/s13688-024-00487-w

Published: 05 July 2024

Chengling Tang¹,
Lei Dong ORCID: orcid.org/0000-0002-1615-5424^1,2,
Hao Guo¹,
Xuechen Wang¹,
Xiao-Jian Chen¹,
Quanhua Dong¹ &
Yu Liu^1,2

EPJ Data Science volume 13, Article number: 46 (2024)

Abstract

A variety of complex socioeconomic phenomena, such as migration, commuting, and trade, can be abstracted as spatial interaction networks, where nodes represent geographic locations and weighted edges convey the interaction and its strength. However, obtaining fine-grained spatial interaction data is very challenging in practice due to limitations in collection methods and costs, so spatial interaction data such as transportation data and trade data are often only available at a coarse scale. Here, we propose a gravity downscaling (GD) method based on readily accessible socioeconomic data and the gravity law to infer fine-grained interactions from coarse-grained data. GD assumes that interactions at different spatial scales are governed by a similar gravity law, so that the parameters estimated from coarse-grained regions can be transferred to fine-grained regions. Results show that GD has an average improvement of 24.6% in Root Mean Squared Error over alternative downscaling methods (i.e., the areal-weighted method and machine learning models) across datasets with different spatial scales and in various regions. Using simple assumptions, GD enables accurate downscaling of spatial interactions, making it applicable to a wide range of fields, including human mobility, transportation, and trade.

1 Introduction

Recently, spatial interaction patterns between regions have attracted wide attention in scientific communities [1–3]. Examples can be found in a variety of domains, from the flow of movement in transportation networks [4, 5] to the spread of epidemics in cities [6, 7] and, more generally, human mobility in urban systems [8–10]. However, fine-grained spatial interaction data representing detailed human activities are difficult to access [11–13]. The primary reason is the high cost of collecting detailed spatial interactions. As the spatial resolution increases, the data volume of spatial interaction grows much faster than the number of regions, because the number of origin-destination pairs scales with the square of the number of regions. This limits data collection to a few important regions. For example, traffic flows are typically limited to major roads, and trade flows are collected at major checkpoints. Additionally, companies must comply with regulations that prohibit the disclosure of personal or granular spatial interaction data for privacy reasons. Therefore, even though many companies that provide location-based services can collect fine-grained spatial interaction data, researchers still have limited access to these datasets, which has become a major obstacle to a large number of geography-related applications, such as traffic flow prediction, disease transmission modeling, and tourism planning [14–16].

Although fine-grained interaction data are not easy to obtain, coarse-grained data are relatively easy to acquire, highlighting the importance of finding a feasible method to downscale the spatial interaction to overcome the resolution limitation. Here, spatial interaction downscaling refers to transforming spatial flows from coarse-grained to finer-grained regions.

Researchers have devoted considerable efforts to estimating granular flow data in recent decades. On the one hand, several classic flow interpolation methods have been proposed to estimate flows between two different spatial zoning schemes [17, 18], from census tracts to Traffic Analysis Zones (TAZs) for example. In these methods, each flow between census tracts from A to B is calculated as the weighted sum of flows between all TAZs overlapping with A or B. The weights are determined by the administrative area or built-up area ratio of the census tract’s overlapping parts to the parent TAZ [17]. On the other hand, machine learning methods are employed to generate flows between locations using multiple geographic features, including population, land uses, roads, and distances [19–21]. While flow interpolation methods achieve promising results in interpolating zoning schemes with similar scales, they demonstrate limited accuracy in downscaling tasks [17]. Flow generation models based on machine learning methods may not perform well on downscaling tasks either, as these models strictly depend on training data and may not be geographically transferable [22] (e.g., from coarse-grained regions to fine-grained regions).

In this work, we propose a gravity downscaling (GD) method for spatial interactions based on the key assumption that interactions are governed by a similar gravity law at different spatial scales [23]. This assumption is partially supported by previous research on (spatial or geometrical) networks, which exhibit self-repeating patterns across scales [24–26]. Furthermore, spatial interaction can be well represented by spatial networks [27]. We can, therefore, estimate the parameters of the gravity model with the coarse-grained region attributes and use these parameters at the fine-grained scale. The detailed procedure is shown in Fig. 1.

To validate the proposed method, we use three datasets (two cellphone datasets and one taxi trajectory dataset) with varying scales, regions, and types of spatial interactions. The commonly used areal-weighted flow interpolation method, EXtreme Gradient Boosting (XGBoost) [28], and Deep Neural Network (DNN) are used as benchmarks. Overall, our method achieves improvements of up to 66.9% on the cellphone datasets and 67.3% on the taxi trajectory dataset compared to the benchmark methods. Moreover, GD demonstrates excellent generalization capabilities for downscaling tasks of diverse scales, whether from the city level to the county level or from the county level to the sub-district level. Additionally, we highlight that GD maintains relatively high accuracy even when only limited attributes are available (population alone), rendering it particularly valuable in data-scarce situations.

Based on simple assumptions and easily accessible data, our approach demonstrates high accuracy and transferability in estimating fine-grained interactions from coarse-grained interactions. This approach holds substantial value and applicability across multiple fields, including human mobility analysis, transportation planning, urban accessibility assessment, and trade analysis. Furthermore, comparing the classic gravity model with machine learning methods provides valuable insights for advancing research in geospatial data science and related fields.

2 Methods and data

2.1 Model

The gravity model, one of the most commonly used spatial interaction models [23, 29], assumes that the number of individuals that move between two locations per unit of time is proportional to some power of the population of the source and destination locations, and decays with the distance between them. Compared with other spatial interaction models, the gravity model excels in estimating flows, effectively preserving the structure of the spatial interaction network while accurately fitting the distribution of flow distances [30, 31]. Numerous studies have shown that flows between each pair of regions can be well estimated with the enriched gravity model, by extending the population terms to more socioeconomic attributes [32–34]:

$$ \ln T_{mn}=\ln k+{\alpha _{1}}\ln {M_{m}^{1}} +\cdots+ {\alpha _{b}}\ln {M_{m}^{b}} + {\beta _{1}}\ln {M_{n}^{1}} +\cdots+ {\beta _{b}}\ln {M_{n}^{b}} -{ \gamma}\ln d_{mn} $$

(1)

where $T_{mn}$ denotes the interaction intensity between region m and region n ($m{\neq}n$ since self-interaction is not considered in this study), and $d_{mn}$ is the distance between them. Each region is supposed to have b attributes, such as population and gross domestic product (GDP), and $M_{m}^{x}$ represents the $x^{th}$ attribute of region m. ${\alpha _{1}},\ldots,{\alpha _{b}}$ and ${\beta _{1}},\ldots,{\beta _{b}}$ are the parameters to be estimated. Here, we use a power law distance decay function with an exponent of γ [35, 36]. To eliminate unnecessary variables and establish a more concise model, we apply step-wise regression [37–39] to estimate the parameters ${\alpha _{1}},\ldots,{\alpha _{b}}$, ${\beta _{1}},\ldots,{\beta _{b}}$, γ and k from the coarse-grained data. Then, those parameters are used to infer the fine-grained interactions:

$$ \ln \hat{T}_{ij}=\ln k+{\alpha _{1}}\ln {M_{i}^{1}} +\cdots+ {\alpha _{b}} \ln {M_{i}^{b}} + {\beta _{1}}\ln {M_{j}^{1}} +\cdots+ {\beta _{b}}\ln {M_{j}^{b}} -{\gamma}\ln d_{ij} $$

(2)

where $\hat{T}_{ij}$ is the estimated flow intensity between fine-grained region i and region j. To ensure that the sum of fine-grained flows $\sum _{i \in m} \sum _{j \in n} T_{ij}^{pred}$ is equal to corresponding parent coarse-grained flows $T_{mn}$, the results are calibrated by

$$ T_{ij}^{pred} = \hat{T}_{ij} \frac{T_{mn}}{\sum _{i \in m} \sum _{j \in n} \hat{T}_{ij}}. $$

(3)
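The three steps in Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: it fits Eq. (1) by ordinary least squares on all attributes rather than by step-wise regression, and the function names (`fit_gravity`, `predict_gravity`, `calibrate`) and the `parent` index array are hypothetical. Fine-grained pairs that share a parent are scaled by the diagonal of the coarse flow matrix, which is an assumption because Eq. (3) is only stated for $m{\neq}n$.

```python
import numpy as np

def fit_gravity(M, D, T):
    """Least-squares fit of Eq. (1) on coarse regions (plain OLS, an assumption).

    M: (R, b) positive attributes; D: (R, R) distances; T: (R, R) flows.
    Returns [ln k, alpha_1..alpha_b, beta_1..beta_b, -gamma].
    """
    R = M.shape[0]
    # Keep ordered pairs m != n with positive flow, since Eq. (1) takes ln T.
    m, n = np.nonzero((T > 0) & ~np.eye(R, dtype=bool))
    X = np.column_stack([np.ones(m.size), np.log(M[m]), np.log(M[n]), np.log(D[m, n])])
    coef, *_ = np.linalg.lstsq(X, np.log(T[m, n]), rcond=None)
    return coef

def predict_gravity(coef, M, D):
    """Eq. (2): apply the coarse-scale parameters to fine regions."""
    R, b = M.shape
    log_m = np.log(M)
    origin = log_m @ coef[1:1 + b]
    dest = log_m @ coef[1 + b:1 + 2 * b]
    log_d = np.log(D + np.eye(R))  # zero diagonal becomes ln 1 = 0, overwritten below
    T_hat = np.exp(coef[0] + origin[:, None] + dest[None, :] + coef[-1] * log_d)
    np.fill_diagonal(T_hat, 0.0)   # self-interaction is not modeled
    return T_hat

def calibrate(T_hat, T_coarse, parent):
    """Eq. (3): rescale each block (m, n) so that it sums to T_mn.

    parent[i] is the index of the coarse region that contains fine region i.
    """
    P = np.eye(T_coarse.shape[0])[parent]   # (fine, coarse) one-hot membership
    block_sum = P.T @ T_hat @ P             # sum of T_hat over i in m, j in n
    ratio = np.divide(T_coarse, block_sum,
                      out=np.zeros_like(block_sum), where=block_sum > 0)
    return T_hat * ratio[np.ix_(parent, parent)]
```

A typical call sequence would be `coef = fit_gravity(M_coarse, D_coarse, T_coarse)`, then `T_hat = predict_gravity(coef, M_fine, D_fine)`, then `calibrate(T_hat, T_coarse, parent)`. Because the calibration is multiplicative within each block, it changes the totals but not the relative shares that the gravity law assigns to the fine-grained pairs.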

2.2 Model evaluation

2.2.1 Evaluation metrics

To evaluate the results, we adopt five metrics: three accuracy metrics, defined below, and two network metrics. Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) measure the degree of deviation between estimated values and true values, with smaller values indicating better accuracy. Common Part of Commuters (CPC) is a similarity measure, with values ranging from 0 to 1, where a value closer to 1 indicates higher similarity. The formulas for these three metrics are as follows, where the sums run over pairs of fine-grained regions (i, j) and n in Eqs. (4a) and (4b) is the number of such pairs:

$$\begin{aligned}& \text{RMSE}=\sqrt{\frac{1}{n}\sum _{i,j}({T_{ij}^{pred}}-T_{ij})^{2}} \end{aligned}$$

(4a)

$$\begin{aligned}& \text{MAPE}=\frac{1}{n} \sum _{i,j}\left | \frac{{T_{ij}^{pred}}-T_{ij}}{T_{ij}} \right | \end{aligned}$$

(4b)

$$\begin{aligned}& \text{CPC}= \frac{2\sum _{ij}\mathrm{min}({T_{ij}^{pred}},T_{ij})}{\sum _{ij}{T_{ij}^{pred}}+\sum _{ij}{T_{ij}}} \end{aligned}$$

(4c)
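Eqs. (4a)–(4c) translate directly into NumPy. The sketch below is illustrative only; the function names are hypothetical, and skipping pairs with zero observed flow in MAPE is an assumption made here to avoid division by zero, since the text does not say how such pairs are handled.

```python
import numpy as np

def rmse(pred, true):
    """Eq. (4a): root mean squared error over all region pairs."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def mape(pred, true):
    """Eq. (4b): mean absolute percentage error."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    mask = true > 0  # assumption: pairs with zero observed flow are skipped
    return float(np.mean(np.abs(pred[mask] - true[mask]) / true[mask]))

def cpc(pred, true):
    """Eq. (4c): Common Part of Commuters, between 0 and 1 for non-negative flows."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(2.0 * np.minimum(pred, true).sum() / (pred.sum() + true.sum()))
```

For calibrated predictions the totals of `pred` and `true` agree at the coarse level, so CPC then reduces to the share of total flow that is assigned to the correct pairs.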

Two network metrics are used from the complex network perspective [40]. One is the weighted degree centrality [41], and the other is the weighted clustering coefficient. Weighted degree centrality, considering both the number of connections (degree) and the strength of those connections (weights), indicates nodes’ importance within a network. Weighted degree centrality can be calculated by $\sum _{j} T_{ij} $, where $T_{ij}$ is the flow intensity. The weighted clustering coefficient gauges the tendency of nodes in a network to form clusters [42]. The weighted clustering coefficient can be computed as $\frac{1}{k_{i}(k_{i}-1)}\sum _{kj}{(T^{\prime }_{ik}T^{\prime }_{ij}T^{\prime }_{kj})^{ \frac{1}{3}}}$ [43], with $k_{i}$ representing the number of neighbors of region i and $T^{\prime }_{ij}$ representing the normalized flow intensity of $\frac{T_{ij}}{\text{max}(T_{ij})}$.
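The two network metrics can be computed from a flow matrix as sketched below. This is a hedged illustration: it assumes the flow matrix has been made symmetric (for example by adding the two directions), which the text does not specify, and the function names are hypothetical. The clustering coefficient follows the geometric-mean form given above, with weights normalized by the maximum flow.

```python
import numpy as np

def weighted_degree(T):
    """Weighted degree of each region: sum_j T_ij."""
    return np.asarray(T, float).sum(axis=1)

def weighted_clustering(T):
    """Weighted clustering coefficient for a symmetric flow matrix (assumption)."""
    W = np.array(T, dtype=float)
    np.fill_diagonal(W, 0.0)
    if W.max() > 0:
        W = W / W.max()                  # T'_ij = T_ij / max(T_ij)
    A = np.cbrt(W)                       # cube roots, so triple products give geometric means
    # For node i, sum over j, k of (T'_ij T'_jk T'_ki)^(1/3).
    triangles = np.einsum("ij,jk,ki->i", A, A, A)
    k = (W > 0).sum(axis=1)              # number of neighbors k_i
    denom = k * (k - 1)
    return np.divide(triangles, denom, out=np.zeros(len(W)), where=denom > 0)
```

Nodes with fewer than two neighbors get a coefficient of zero in this sketch, which is a convention chosen here rather than one stated in the text.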

2.2.2 Baselines

Areal-weighted flow interpolation:

The areal-weighted method is a traditional approach for flow interpolation that considers the area of regions as weights to distribute flows [17]. When employed for flow downscaling, the method consists of two steps: first, the weight used for a fine-grained region is computed based on the area ratio of the region to the coarse-grained region that contains it; then, the flow between fine-grained regions is distributed from the corresponding flow between coarse-grained regions according to the weights of its origin and destination regions. To be exact, we can get the flow intensity from $A_{1}$ to $B_{1}$ in Fig. 1 by $T_{AB} \times \frac{S_{A_{1}}}{S_{A}} \frac{S_{B_{1}}}{S_{B}}$, where $S_{A}$ and $S_{B}$ denote the administrative areas of A and B, and $S_{A_{1}}$ and $S_{B_{1}}$ denote those of the fine-grained regions $A_{1}$ and $B_{1}$.
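As a small illustration of the two steps, the following NumPy sketch distributes a coarse flow matrix by area shares. It assumes each coarse region's area equals the sum of its subregions' areas; the function name and the `parent` index array are hypothetical.

```python
import numpy as np

def areal_weighted(T_coarse, area_fine, parent):
    """Distribute coarse flows to fine regions by area shares.

    parent[i] is the index of the coarse region that contains fine region i.
    """
    area_fine = np.asarray(area_fine, float)
    # Step 1: weight of each fine region = its area / area of its parent.
    area_coarse = np.bincount(parent, weights=area_fine, minlength=T_coarse.shape[0])
    w = area_fine / area_coarse[parent]
    # Step 2: T_ij = T_mn * w_i * w_j for i in m, j in n.
    return T_coarse[np.ix_(parent, parent)] * np.outer(w, w)
```

With $T_{AB}=100$, subregion areas of 1 and 3 in A, and 2 and 2 in B, the flow from $A_{1}$ to $B_{1}$ is $100 \times 0.25 \times 0.5 = 12.5$. The weights within each parent sum to one, so the fine-grained flows add up to the coarse flow without a separate calibration step.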

XGBoost:

XGBoost is widely used in machine learning tasks. In this study, we directly concatenate the features of the origin area, the features of the destination area, and the distance between them as the inputs to scale down the flow. The training set consists of all coarse-grained flows, while the test set consists of all fine-grained flows. As machine learning requires sufficient input features to ensure the accuracy and generalization of the model [44, 45], we use multiple variables including population, GDP, GDP in the primary, secondary, and tertiary industries, GDP index, administrative area, and built-up area as the inputs.

DNN:

The DNN used here is similar to Deep Gravity [20], with 15 hidden layers of dimensions 256 (first 6 layers) and 128 (last 9 layers), and the activation function is LeakyReLU (with a parameter of 0.7). This model has the same inputs as the XGBoost and, after training with coarse-grained interaction for 3000 epochs, is used to estimate flow in fine-grained regions.

2.3 Datasets

Cellphone data of Guangdong

The population flow data are extracted from cellphone location data of 5 million individuals in Guangdong Province, China. The dataset spans from November 1, 2020 to November 15, 2020. A threshold of 30 minutes is set to identify the stay points for each anonymous user, with trips being defined as movement between consecutive stay points. Subsequently, we construct an origin-destination (OD) flow using these trips on a $500\text{ m}\times 500\text{ m}$ grid. Finally, we aggregate daily flow at three levels: city level (prefecture-level city), county level, and sub-district level. There are a total of 21 cities, 124 counties, and 2,100 sub-districts in the study area. The socioeconomic data (e.g., population, GDP) of these areas are obtained from various statistical yearbooks in 2021.
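The aggregation from grid-level trips to region-level OD flows can be sketched as follows. This is an assumed data layout, not the authors' pipeline: trips are given as pairs of grid-cell indices, and `cell_to_region` is a hypothetical lookup array from grid cell to region index at the chosen level (city, county, or sub-district).

```python
import numpy as np

def od_matrix(trips, cell_to_region, n_regions):
    """Count trips between regions from (origin_cell, destination_cell) pairs."""
    trips = np.asarray(trips)
    origin = cell_to_region[trips[:, 0]]
    dest = cell_to_region[trips[:, 1]]
    T = np.zeros((n_regions, n_regions))
    np.add.at(T, (origin, dest), 1.0)  # unbuffered add, so repeated pairs accumulate
    np.fill_diagonal(T, 0.0)           # trips within a region are not counted as flow
    return T
```

Calling the function with a different `cell_to_region` array for each administrative level yields mutually consistent flow matrices at all three levels from the same trips.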

Cellphone data of Beijing

Similar to the Guangdong cellphone data, we obtain Beijing flow data at the TAZ level. The dataset spans from May 1, 2019, to May 30, 2019. After selecting 21 weekdays, the commuting data were derived by identifying the locations where each user spent most of their work and rest time, and labeling these locations as home and workplace. There are a total of 331 sub-districts and 1,911 TAZs in Beijing. The sub-district flow is aggregated from TAZ-level data. The population data used to downscale flows are obtained from Worldpop [46].

Taxi trajectory data of Beijing

The taxi flow data are extracted from the taxi trajectory data of Beijing from March 1, 2015, to March 7, 2015. The study area consists of sub-districts located within the Fifth Ring Road of Beijing (5RBJ), as it is the most active region for taxis. The pick-up and drop-off points of each taxi trajectory are selected as the origin and destination. Then, the taxi flow is aggregated at county, sub-district, and TAZ levels. As with the Beijing cellphone data, the population data used for flow downscaling are obtained from Worldpop [46].

Fig. S1 in Additional file 1 visualizes the datasets used in the experiments.

3 Results

3.1 Model performance

We first compare the results of different methods on Guangdong cellphone data and then use the cellphone and taxi data of Beijing to verify the generalization of our method. Here, the baseline models are fed with the same data as GD. As shown in Fig. 2, our model performs far better than the areal-weighted method and outperforms the machine learning methods XGBoost and DNN. Specifically, from the city level to the county level, our gravity downscaling method with multiple variables achieves RMSE improvements of 29.2% and 45.9% when compared to the areal-weighted flow interpolation method and DNN, respectively. To investigate the reasons for this outcome, we present the results of models fitted at the city level (refer to Table S1). Although both the DNN and XGBoost models fit the city-level data well, their performance at the county level is inferior to that of GD, indicating probable overfitting. Note that our approach outperforms the baselines when using multiple socioeconomic variables, but in many cases the availability of socioeconomic data is limited. Therefore, we further investigate the effectiveness of our method by reducing the number of socioeconomic variables. Figure 2 shows that using only population or GDP can yield satisfactory results, outperforming other baseline models by an average decrease of 17.5% in RMSE and an average improvement of 23.6% in CPC, indicating that our method has better practical value.

To gain a deeper understanding of different models, we plot the prediction results and highlight the long-range interactions (results above 100 km with colored dots) in Fig. 3. It can be observed that our method not only achieves the most accurate overall estimation results (gray dots), but also performs better in estimating long-range fluxes (colored dots). In contrast, the areal-weighted method and DNN overestimate long-range flows. The histograms on the right side of each scatter plot investigate the distribution of flows within the predicted results (colored) and ground truth (gray).

For further comparison between the network structure of actual data and the downscaled results, we calculated the weighted degree centrality (Fig. 4 (a)) and weighted clustering coefficient (Fig. 4 (b)) of each region. In terms of $R^{2}$, GD can better approximate the distribution of the ground truth. Its performance may be attributed to the model’s assumption of self-similarity, enabling the preservation of certain properties of the graph across different scales. Areal-weighted estimation overestimates the weighted clustering coefficients of various regions, possibly due to the neglect of long-range inhibitory effects on human mobility.

Figure 5 shows the actual data, the downscaled outcomes, and the percentage of absolute error ($\frac{|{T}_{ij}^{pred}- T_{ij}|}{T_{ij}}$). The results of our method are visually closer to the true distribution, outperforming the areal-weighted method, XGBoost, and DNN. In terms of overall absolute error percentage, our method has smaller errors than the other methods. Additionally, the areal-weighted interpolation method overestimates the flows in the periphery of Guangdong (Fig. 5(c)), where counties are geographically distant from the main cities (i.e., Shenzhen and Guangzhou). This overestimation may be largely due to the fact that the areal-weighted interpolation method only considers the area proportion and does not take into account the distance between regions and population, the two most important factors affecting spatial interaction [47].

3.2 Model generalization

To demonstrate the generalization of the gravity downscaling method, we further conducted tasks using cellphone data and taxi trajectory data of Beijing. Table 1 shows that our method outperforms the areal-weighted flow interpolation method. In the Beijing cellphone dataset, the CPC increased by 73.9%, 10.1%, and 60.2% for the three scales, respectively; in the Beijing taxi trajectory dataset, the CPC for the three scales increased by 22.6%, 13.0%, and 67.3%, respectively.

Table 1 Spatial downscaling result’s evaluation at different scales

Since administrative boundaries are hierarchically organized, we test the downscaling performance between different administrative levels and find that accuracy is relatively high when the source and target scales are adjacent administrative levels. For example, the downscaling results from the county level to the sub-district level show higher performance than those from the city level to the sub-district level. This may be attributed to the availability of county-level flow data, which offers more detailed flow volumes for calibration in Eq. (3). To further explore the specific reasons, we analyze the results of GD without calibration, reported in Table 2. Without calibration, the CPC improvement of county-to-sub-district downscaling over city-to-sub-district downscaling is only 2.5%, compared with 5.6% after calibration. This indicates that having finer-grained data for calibration can improve accuracy.

Table 2 GD without calibration result’s evaluation at different scales

To provide a rough estimation of the applicability of GD, we consider the number of subregions within the parent region (N̅ in Table 1), which can represent the downscaling factor. Based on our results, we recommend using GD for tasks within a downscaling factor of several tens. An excessively large downscaling span (N̅ >100) tends to yield suboptimal results.
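As a quick check before applying GD, the downscaling factor can be computed from the region hierarchy. The sketch below reads N̅ as the mean number of subregions per parent region, which is an assumption about how the value in Table 1 is defined; the function name and the `parent` array are hypothetical.

```python
import numpy as np

def downscaling_factor(parent):
    """Mean number of fine regions per coarse region (assumed meaning of N-bar).

    parent[i] is the index of the coarse region that contains fine region i.
    """
    _, counts = np.unique(parent, return_counts=True)
    return float(counts.mean())
```

Under this reading, a value of a few tens or less falls within the recommended range, while a value above 100 signals a span where results tend to be suboptimal.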

Figure 6 shows the ground truth of spatial interactions and the downscaling results of Beijing cellphone data and taxi data. The process of downscaling from the sub-district level to the TAZ level has yielded better results on both datasets than that from the county level to other levels. The low accuracy of downscaling from the county level may result from the inadequacy of information obtained at this level for predicting local hotspots characterized by low population density but high total flows, such as transportation hubs and tourist attractions.

4 Conclusion and discussion

To overcome the challenge of obtaining fine-grained spatial interaction data, we propose a method to downscale coarse-grained spatial interaction with accessible socioeconomic data. Based on the assumption that interactions are governed by a similar gravity law at different spatial scales, we achieve spatial interaction downscaling by transferring parameters estimated from coarse-grained regions to fine-grained regions. Our method proves effective on several empirical datasets and in simulation experiments, with an average improvement of 24.6% in RMSE over benchmarks across all datasets and scales in the experiments.

The spatial configurations (i.e., population and flow distributions) in the empirical datasets remain relatively stable. To evaluate the impact of different spatial configurations and model parameters on GD, we conduct experiments using simulated data and present the outcomes in Additional file 1 (Simulated experiments). Results from the simulation experiments indicate that although the distance decay effect and spatial autocorrelation could potentially impact GD’s performance, in the vast majority of practical scenarios, GD maintains a relatively high level of accuracy.

In addition to the gravity model, we have examined the generalized radiation model and the universal opportunity model to perform downscaling tasks from the city level to the county level using cellphone data from Guangdong. From the results, we found that the gravity model exhibited superior performance (Table 3).

Table 3 Downscaling results from city to county level in Guangdong based on different models

There are still some limitations in this study. First, we only present population flow results in the main text. It is worth noting that there are various forms of spatial interactions, including cargo, telecommunication, financial networks, and so on. The gravity model could be extended to different types of flows. In Additional file 1 (Supplementary datasets and results), we validate our method by using the Baidu search index as a representation of information flow. Exploring the gravity model’s potential for downscaling other forms of spatial interactions remains a promising direction. Second, our model achieves relatively poor accuracy when the disparity between the source and target scales is substantial, which may be because the distance decay parameters or functions of spatial interactions vary across different scale regimes [35, 36]. One possible approach to address this issue is to calibrate the distance decay parameter γ based on empirical observations. We also explore this parameter calibration method and find that it may not effectively improve accuracy (Additional file 1, Parameters calibration).

In summary, our method can potentially overcome the limitations of accessing fine-grained spatial interaction data, thereby holding substantial value and applicability. Furthermore, the superior performance of our methods, which are primarily based on the classic gravity model rather than machine learning methods, also sheds light on model development in the era of geospatial artificial intelligence (GeoAI) [48].

Data availability

Cellphone data and taxi data are not publicly available to preserve privacy. Aggregated flow data (TAZs, subdistricts, counties, and cities) can be requested from the corresponding author to reproduce the results of this study. The population data used in this study are from the Worldpop [46], which is publicly available at https://www.worldpop.org/. The Baidu Search Index data are available at https://github.com/s3pku/BaiduCityAttr. The code for GD is available at https://github.com/Elvira1021/Gravity_downscaling_method_for_spatial_interaction

References

Hayes MC, Wilson AG (1971) Spatial interaction. Socio-Econ Plan Sci 5(1):73–95. https://doi.org/10.1016/0038-0121(71)90042-5
Tobler W (1975) Spatial interaction patterns. J Environ Syst 6(4):271–301
Ullman EL, Boyce RR, Harris CD (1980) Geography as spatial interaction. University of Washington Press, Seattle
Yan X, Wang W, Gao Z, Lai Y (2017) Universal model of individual and population mobility on diverse spatial scales. Nat Commun 8(1):1639. https://doi.org/10.1038/s41467-017-01892-8
Huang J, Levinson D, Wang J, Jin H (2019) Job-worker spatial dynamics in Beijing: insights from smart card data. Cities 86:83–93. https://doi.org/10.1016/j.cities.2018.11.021
Yuan H-Y, Hossain MP, Tsegaye M, Zhu X, Jia P, Junus A, Wen T-H, Pfeiffer D (2020) Estimating the risk on outbreak spreading of 2019-nCoV in China using transportation data, 2020–02. https://doi.org/10.1101/2020.02.01.20019984. medRxiv
Pollmann TR, Schönert S, Müller J, Pollmann J, Resconi E, Wiesinger C, Haack C, Shtembari L, Turcati A, Neumair B et al. (2021) The impact of digital contact tracing on the SARS-CoV-2 pandemic—a comprehensive modelling study. EPJ Data Sci 10(1):37. https://doi.org/10.1140/epjds/s13688-021-00290-x
Tao H, Wang K, Zhuo L, Li X (2019) Re-examining urban region and inferring regional function based on spatial-temporal interaction. Int J Digit Earth 12(3):293–310. https://doi.org/10.1080/17538947.2018.1425490
Zhu D, Zhang F, Wang S, Wang Y, Cheng X, Huang Z, Liu Y (2020) Understanding place characteristics in geographic contexts through graph convolutional neural networks. Ann Assoc Am Geogr 110(2):408–420. https://doi.org/10.1080/24694452.2019.1694403
Guo H, Zhang W, Du H, Kang C, Liu Y (2022) Understanding China’s urban system evolution from web search index data. EPJ Data Sci 11(1):20. https://doi.org/10.1140/epjds/s13688-022-00332-y
Pedrycz W, Chen S (2014) Information granularity, big data and computational intelligence, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-08254-7
Voigt P, Von Dem Bussche A (2017) The EU general data protection regulation (GDPR). Springer, Cham. https://doi.org/10.1007/978-3-319-57959-7
Liu Y, Gao S, Yuan Y, Zhang F, Kang C, Kang Y, Wang K (2021) Methods of social sensing for urban studies. In: Urban remote sensing: monitoring, synthesis, and modeling in the urban environment, pp 71–89. https://doi.org/10.1002/9781119625865.ch4
Mizzi C, Fabbri A, Rambaldi S, Bertini F, Curti N, Sinigardi S, Luzi R, Venturi G, Davide M, Muratore G et al. (2018) Unraveling pedestrian mobility on a road network using ICTs data during great tourist events. EPJ Data Sci 7(1):44. https://doi.org/10.1140/epjds/s13688-018-0168-2
Ouyang K, Liang Y, Liu Y, Tong Z, Ruan S, Zheng Y, Rosenblum DS (2022) Fine-grained urban flow inference. IEEE Trans Knowl Data Eng 34(6):2755–2770. https://doi.org/10.1109/TKDE.2020.3017104
Cardia M, Luca M, Pappalardo L (2022) Enhancing crowd flow prediction in various spatial and temporal granularities. In: Companion proceedings of the web conference 2022, pp 1251–1259. https://doi.org/10.1145/3487553.3524851
Jang W, Yao X (2011) Interpolating spatial interaction data. Trans GIS 15(4):541–555. https://doi.org/10.1111/j.1467-9671.2011.01273.x
Šimbera J, Aasa A (2019) Areal interpolation of spatial interaction data. In: LBS 2019; adjunct proceedings of the 15th international conference on location-based services/Gartner, Georg; Huang, Haosheng, Wien
Liu Z, Miranda F, Xiong W, Yang J, Wang Q, Silva C (2020) Learning geo-contextual embeddings for commuting flow prediction. Proc AAAI Conf Artif Intell 34(1):808–816. https://doi.org/10.1609/aaai.v34i01.5425
Simini F, Barlacchi G, Luca M, Pappalardo L (2021) A deep gravity model for mobility flows generation. Nat Commun 12(1):6576. https://doi.org/10.1038/s41467-021-26752-4
Mauro G, Luca M, Longa A, Lepri B, Pappalardo L (2022) Generating mobility networks with generative adversarial networks. EPJ Data Sci 11(1):58. https://doi.org/10.1140/epjds/s13688-022-00372-4
Luca M, Barlacchi G, Lepri B, Pappalardo L (2021) A survey on deep learning for human mobility. ACM Comput Surv 55(1):7–1744. https://doi.org/10.1145/3485125
Anderson JE (2011) The gravity model. Annu Rev Econ 3(1):133–160. https://doi.org/10.1146/annurev-economics-111809-125114
Song C, Havlin S, Makse HA (2005) Self-similarity of complex networks. Nature 433(7024):392–395. https://doi.org/10.1038/nature03248
Alessandretti L, Aslak U, Lehmann S (2020) The scales of human mobility. Nature 587(7834):402–407. https://doi.org/10.1038/s41586-020-2909-1
Boguñá M, Bonamassa I, De Domenico M, Havlin S, Krioukov D, Serrano MÁ (2021) Network geometry. Nat Rev Phys 3(2):114–135. https://doi.org/10.1038/s42254-020-00264-4
Barbosa H, Barthelemy M, Ghoshal G, James CR, Lenormand M, Louail T, Menezes R, Ramasco JJ, Simini F, Tomasini M (2018) Human mobility: Models and applications. Phys Rep 734:1–74. https://doi.org/10.1016/j.physrep.2018.01.001
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’16. ACM, New York, pp 785–794. https://doi.org/10.1145/2939672.2939785
Ravenstein EG (1889) The laws of migration. J R Stat Soc 52(2):241–305
Lenormand M, Bassolas A, Ramasco JJ (2016) Systematic comparison of trip distribution laws and models. J Transp Geogr 51:158–169. https://doi.org/10.1016/j.jtrangeo.2015.12.008
Stefanouli M, Polyzos S (2017) Gravity vs radiation model: Two approaches on commuting in Greece. Transp Res Proc 24:65–72. https://doi.org/10.1016/j.trpro.2017.05.069
Gil-Pareja S, Llorca-Vivero R, Martínez-Serrano JA (2007) The impact of embassies and consulates on tourism. Tour Manag 28(2):355–360. https://doi.org/10.1016/j.tourman.2006.04.016
Eryiğit M, Kotil E, Eryiğit R (2010) Factors affecting international tourism flows to Turkey: A gravity model approach. Tour Econ 16(3):585–595. https://doi.org/10.5367/000000010792278374
Shen J (2015) Explaining interregional migration changes in China, 1985–2000, using a decomposition approach. Reg Stud 49(7):1176–1192. https://doi.org/10.1080/00343404.2013.812783
Liu Y, Gong L, Tong Q (2014) Quantifying the distance effect in spatial interactions. Acta Sci Nat Univ Pek 50(3):526–534. https://doi.org/10.13209/j.0479-8023.2014.051
Chen Y (2015) The distance-decay function of geographical gravity model: Power law or exponential law? Chaos Solitons Fractals 77:174–189. https://doi.org/10.1016/j.chaos.2015.05.022
Efroymson MA (1960) Multiple regression analysis. In: Mathematical methods for digital computers, pp 191–203
Halinski RS, Feldt LS (1970) The selection of variables in multiple regression analysis. J Educ Meas 7(3):151–157
Pope PT, Webster JT (1972) The use of an F-statistic in stepwise regression procedures. Technometrics 14(2):327–340
Barthélemy M (2011) Spatial networks. Phys Rep 499(1–3):1–101. https://doi.org/10.1016/j.physrep.2010.11.002
Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: Generalizing degree and shortest paths. Soc Netw 32(3):245–251. https://doi.org/10.1016/j.socnet.2010.03.006
Saramäki J, Kivelä M, Onnela J-P, Kaski K, Kertész J (2007) Generalizations of the clustering coefficient to weighted complex networks. Phys Rev E 75(2):027105. https://doi.org/10.1103/PhysRevE.75.027105
Onnela J-P, Saramäki J, Kertész J, Kaski K (2005) Intensity and coherence of motifs in weighted complex networks. Phys Rev E 71(6):065103. https://doi.org/10.1103/PhysRevE.71.065103
Hall MA (1999) Correlation-based feature selection for machine learning. Thesis, The University of Waikato
Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2017) Feature selection: A data perspective. ACM Comput Surv 50(6):94:1–94:45. https://doi.org/10.1145/3136625
Stevens FR, Gaughan AE, Linard C, Tatem AJ (2015) Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLoS ONE 10(2):e0107042. https://doi.org/10.1371/journal.pone.0107042
Roy JR, Thill J-C (2003) Spatial interaction modelling. Pap Reg Sci 83(1):339–361. https://doi.org/10.1007/s10110-003-0189-4
Janowicz K, Gao S, McKenzie G, Hu Y, Bhaduri B (2020) GeoAI: spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond. Int J Geogr Inf Sci 34(4):625–636. https://doi.org/10.1080/13658816.2019.1684500

Acknowledgements

The authors thank the two anonymous reviewers for their valuable suggestions. The authors also thank Yuanqiao Hou and Tianyou Cheng for their assistance in processing the Guangdong cellphone data.

Funding

This research was supported by grants from the National Natural Science Foundation of China (41830645) and the Fundamental Research Funds for the Central Universities, Peking University.

Author information

Authors and Affiliations

Institute of Remote Sensing and Geographical Information Systems, School of Earth and Space Sciences, Peking University, Beijing, China
Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong & Yu Liu
Ordos Research Institute of Energy, Peking University, Inner Mongolia, China
Lei Dong & Yu Liu

Contributions

CT: Conceptualization, Methodology, Writing – original draft, review & editing. LD: Supervision, Conceptualization, Methodology, Writing – review & editing. HG: Conceptualization, Methodology. XW: Methodology, Writing – review & editing. XC: Writing – review & editing. QD: Writing – review & editing. YL: Supervision, Conceptualization, Writing – review. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lei Dong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

This includes detailed results of simulation experiments, as well as supplementary datasets and results. (PDF 5.1 MB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tang, C., Dong, L., Guo, H. et al. Downscaling spatial interaction with socioeconomic attributes. EPJ Data Sci. 13, 46 (2024). https://doi.org/10.1140/epjds/s13688-024-00487-w

Received: 28 November 2023
Accepted: 24 June 2024
Published: 05 July 2024
DOI: https://doi.org/10.1140/epjds/s13688-024-00487-w
