
Human-network regions as effective geographic units for disease mitigation

Abstract

Susceptibility to infectious diseases such as COVID-19 depends on how those diseases spread. Many studies have examined the decrease in COVID-19 spread due to reduction in travel. However, less is known about how much functional geographic regions, which capture natural movements and social interactions, limit the spread of COVID-19. To determine boundaries between functional regions, we apply community-detection algorithms to large networks of mobility and social-media connections to construct geographic regions that reflect natural human movement and relationships at the county level in the coterminous United States. We measure COVID-19 case counts, case rates, and case-rate variations across adjacent counties and examine how often COVID-19 crosses the boundaries of these functional regions. We find that regions that we construct using GPS-trace networks and especially commute networks have the lowest COVID-19 case rates along the boundaries, so these regions may reflect natural partitions in COVID-19 transmission. Conversely, regions that we construct from geolocated Facebook friendships and Twitter connections yield less effective partitions. Our analysis reveals that regions that are derived from movement flows are more appropriate geographic units than states for making policy decisions about opening areas for activity, assessing vulnerability of populations, and allocating resources. Our insights are also relevant for policy decisions and public messaging in future emergency situations.

1 Introduction

1.1 Motivation

Coronavirus disease 2019 (COVID-19) has caused over 1,100,000 deaths and more than 100 million infections in the United States and over 6.9 million deaths and more than 750 million cases worldwide [17, 61]. Although vaccines help mitigate the harmful effects of the disease, in 2020 and early in 2021, non-pharmaceutical interventions (NPIs) were the primary method to protect individuals from exposure to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19. These interventions included formal shelter-in-place rules and guidelines, facility closures, limited seating in restaurants, reduction of interactions through physical distancing (i.e., ‘social distancing’), and travel-restriction policies to reduce mobility and transmission [20]. Such NPIs were generally effective [57].

In the United States, policies that invoke NPIs are typically administered at the state level. This has resulted in friction in local communities that seek to impose different (and often more stringent) standards than other areas of their state [3, 16, 29, 42]. To avoid spillovers of COVID-19, people were strongly advised (or even required) to stay within a local region for their daily activities [18, 41, 69]. However, when a local region (e.g., a metropolitan area) spans multiple states or experiences different risk levels than the rest of a state, it may be subject to conflicting policies and guidelines. To obtain administrative units for which it is reasonable to apply homogeneous NPI policies, we seek to construct regions that capture core geographies of social and movement behavior. We expect the spread of COVID-19 across these regions to be less pronounced than its spread across states [68, 77].

1.2 Background and related work

The objective of defining ‘functional’ geographic regions that may not follow administrative boundaries is not new [14]. It has deep theoretical roots in regional science, economic geography, and human geography [48, 60, 62]. Defining regions that are based on news markets, vacation trips, telecommunications, commutes, and migration [13, 33, 36, 54] has been a common practice for decades [25, 26, 33, 39]. More recently, trips from mobile phones and Global Positioning System (GPS) traces, flights, and social-media relationships have been used to define regions [12, 35, 43, 47, 49, 51, 60]. Regardless of the data source, such constructed regions have rarely been implemented in practice for policy purposes.

The COVID-19 pandemic has elicited new arguments for the use of functional regions for policy implementation [1, 6, 15, 32] and new computational experiments to delineate such regions and test whether or not their internal populations experience similar COVID-19 case rates over time. Hou et al. [43] divided two Wisconsin counties into regions using the WalkTrap community-detection algorithm on SafeGraph mobility data. These regions yielded effective boundaries for COVID-19 transmission, with about half of the infections occurring within the regions. Using SafeGraph trip data in California, Chang et al. [18] derived effective regions using a method that was based on minimum k-cuts. In another recent paper, adams et al. [1] defined mobility regions in Colorado using movement data. They concluded that their constructed regions often aligned with the regions in Colorado’s county-based ‘jurisdictional zones’ for COVID-19 policy administration, but with misalignments that may be useful to evaluate potential changes to these regions. Buchel et al. [15] derived regions from SafeGraph data (at the level of census block groups in the U.S.) by detecting communities with modularity maximization. They observed that these regions often cross state borders.

Several researchers have observed that functional regions often persist substantially over time. Using Facebook movement patterns in the United Kingdom, Gibbs et al. [32] detected regions using the InfoMap community-detection algorithm. They found that regions evolved with time but did not change significantly after local authorities invoked NPIs. Using the same data set, Schindler et al. [67] derived communities that generally followed administrative regions but were smaller during periods with travel restrictions. In a study of commute-based regions in Austria, Iacus et al. [44] observed similar within-region rates of COVID-19 infections from week to week, including weeks with lockdown events.

Some models to forecast disease incidences in different geographic areas, such as the GLobal Epidemic and Mobility (GLEaM) model [8], incorporate commuting and flights to simulate connectivity between spaces that are not geographically adjacent. In the context of COVID-19, the GLEaM model has been used to estimate retroactive pathways of transmission that occurred before testing strategies were in place [24].

1.3 Capturing geographic disease dynamics

To assess the ability of functional geographic regions to capture cohesive areas with high COVID-19 case rates, it is desirable to know the transmission patterns of SARS-CoV-2. However, modeling the transmission of COVID-19 infections in networks of individuals is complicated by asymptomatic transmissions and other factors [5]. Phylogenetic strains of SARS-CoV-2 indicate that the virus’s subsequent mutations, such as the Delta variant, initially tended to stay within concentrated geographic regions [38]. However, as mutated variants of SARS-CoV-2 propagated, geographic transmission paths became too widespread to pinpoint.

Contact-tracing technologies that record geographic traces of infected individuals [27] have had mixed results because of underdeveloped technologies, uneven participation levels from individuals, and lack of administrative organization and oversight [42]. Despite a lack of information on the precise spatial transmission of SARS-CoV-2, one can assess how different sets of regional boundaries act as informal barriers to disease transmission. We posit that regions that one obtains from human behavior may help explain the spatiotemporal landscape of COVID-19 case rates (as in [43]).

1.4 Our approach

We investigate the extent to which boundaries that are based on five different human-network regions are able to ‘contain’ COVID-19 cases more effectively—with lower COVID-19 case rates and smaller case counts between regions—than state boundaries in the coterminous United States. We construct the human-network regions by detecting communities in five county-level networks (commutes, GPS-based trips, migration, Twitter connections, and Facebook friendships). The state boundaries correspond to the 48 coterminous states and Washington, D.C., yielding 49 total entities. Our results include (1) descriptive statistics of COVID-19 dynamics (cases, mutual case rates, and case-rate differences) between and within different types of regions, (2) a comparison of actual COVID-19 dynamics in our constructed regions and states to those of a random model of geographically-contiguous regions, and (3) an examination of temporal coordination within regions using Granger causality.

We expect to obtain large case rates within our functional geographic regions, with low transmission activity across regional boundaries. We also investigate whether case rates are more homogeneous within regions than between regions. Because cohesive metropolitan areas often straddle borders, we posit that the region boundaries that we construct from human-mobility dynamics will capture natural disease-transmission bottlenecks more effectively than social-media-based regions or administrative boundaries such as states (see Footnote 1).

By determining functional geographic regions for the management of the spatial transmission of a disease, we suggest flexible alternatives to using states as administrative units for policy implementation (as also articulated in [18]). Because these proposed alternatives are based on human behavior, they can help limit disease transmission while permitting some natural activity (such as social visits and travel).

1.5 Outline of our paper

Our paper proceeds as follows. In Sect. 2, we discuss the COVID-19 case data sets that we use in our study, our human-behavior networks, and our methods of analysis. In Sect. 3, we describe our results, which detail the types of regions that have the least COVID-19 spread across boundaries, and obtain a set of consensus regions. In Sect. 4, we summarize our work, discuss the implications of our work in the context of implementing regions for public policy, and describe limitations of our work. In the Appendix, we give more information about the similarities between networks, the similarities between their associated regions, and the results of various community-detection methods. We also show maps of our regions. We provide an online tool to explore consensus regions at https://doi.org/10.6084/m9.figshare.14071439.

2 Data and methods

2.1 Data sets and data preparation

We construct regions from five data sets that encode different types of interactions between people in the 3108 counties (i.e., nodes) of the coterminous United States. From each of these data sets, we construct an associated weighted network. We also use an unweighted county-adjacency network \(G_{a}\) and associate COVID-19 case data with the edges of \(G_{a}\). We treat the five networks that we use to create the regions as independent variables, and we treat the COVID-19 data on the edges of \(G_{a}\) as an outcome variable. We only consider COVID-19 case data across counties that are geographically adjacent.

See Fig. 1 for a schematic illustration of our approach.

Figure 1

Schematic illustration of our approach to obtain human-network regions through network partitioning. Each network-partitioning method has an input of (A) a network of movement flows or social-media connections between U.S. counties. We apply a community-detection algorithm to determine (B) a set of distinct regions. We use (C) a network \(G_{a}\) of county adjacencies and (D) distinguish edges between regions (\(E_{b}\), in yellow) from edges within regions (\(E_{w}\), in black). (E) We then weight all edges by COVID-19 case counts, mutual case rates, and case-rate differences. (F) We measure these values both between regions (in yellow) and within regions

2.1.1 Movement and social-network data

In each of the five human-behavior networks, a node represents a county and an edge signifies some type of mobility or social-media connection between two counties. In Table 1, we summarize basic statistical properties of these networks and the county-adjacency network. In each of the human-behavior networks, we weight the edges by the number of interactions (see Footnote 2) between people in pairs of counties. The edge weights are sums of bidirectional flows (for movement networks) or connections (for social-media networks) between two counties. We allow self-edges, which we weight based on the number of interactions with origins and destinations in the same county. Our Twitter and Facebook networks do not include all counties, as some counties’ populations do not have associated accounts or activity on these networks.

Table 1 Basic statistics of our networks for the coterminous United States

We obtain commute data from the U.S. Census LODES (see Footnote 3) data set of residence–workplace characteristics for the year 2015 [76]. Each flow represents commutes from home to work at the census block level. We obtain migration data from American Community Survey (ACS) estimates of county-to-county migration flows for a 5-year period (2013–2017) [75]. The flow estimates approximate the annual numbers of movers between counties for the 5-year period of the data.

We obtain GPS trace data for January and February 2020 from SafeGraph [66]. The origins of the mobile-phone traces are census block groups (see Footnote 4), and the destinations are points of interest (PoIs) at which travelers end a trip. We track the origin county (i.e., the county that contains the census block group) and the destination county of each trip. (We do not track intermediate counties.) Each trip is associated with a flow from one county to another (or is an internal trip within a county). We use data from 1 January 2020 through 29 February 2020 because they are recent months with business-as-usual (and pre-pandemic) movement landscapes.

To obtain social-media regions, we use data from Facebook and Twitter (which is now called \(\mathbb{X}\)). We use Facebook’s Social-Connectedness Index (SCI), which is the number of Facebook friendships between accounts in two counties divided by the product of the numbers of accounts in those counties [7]. The Twitter data consists of accounts with reciprocal mentions (i.e., ‘co-mentions’) between 1 January 2014 and 31 December 2015. We obtained reciprocal account pairs from geolocated tweets that we collected using the Twitter Streaming API [74]. Although co-mentions do not imply personal ties between Twitter users, reciprocal mentions between two accounts do indicate personal communications and possible interpersonal relationships [49].
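As a compact restatement of the verbal definition of the SCI, one can write it as a formula. The symbols \(F_{ij}\) (the number of Facebook friendships between accounts in counties i and j) and \(A_{i}\) (the number of Facebook accounts in county i) are notation that we introduce here for illustration only. They do not appear in [7] or elsewhere in this paper, and any constant scaling factor that is applied to the published index is omitted.

\[
\mathrm{SCI}_{ij} = \frac{F_{ij}}{A_{i}\,A_{j}}
\]

Under this normalization, two sparsely populated counties with a handful of friendships can have the same index value as two populous counties with many friendships, so the index measures the relative propensity for a friendship rather than the raw number of friendships.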

In Table 6 in the Appendix, we indicate the correlations between the human-behavior networks.

2.1.2 Assigning COVID-19 cases using a county-adjacency network

We obtain COVID-19 case counts from The New York Times COVID-19 API [59]. We use data from the week ending 31 May 2020 through the week ending 1 May 2022. To determine the case rates per county, we obtain 2018 population data by county from the U.S. Centers for Disease Control and Prevention (CDC) [17].

To examine local SARS-CoV-2 transmission, we create a county-adjacency network \(G_{a}\). The nodes of \(G_{a}\) are the individual centroids of the 3108 counties in the coterminous United States. Each undirected edge of \(G_{a}\) connects geographically-adjacent counties (i.e., counties that share a physical boundary). There are 9120 edges in total. We represent COVID-19 cases between adjacent counties by calculating case counts (C), mutual case rates per 1000 individuals (CR), and case-rate differences (CD). We assign these values to each edge of \(G_{a}\) as follows. The case count C of a pair of counties (i.e., nodes) is the sum of their numbers of cases. The mutual case rate CR of two counties is equal to the sum of the case counts of the counties multiplied by 1000 and divided by the sum of their resident populations. The case-rate difference CD between two counties is equal to the difference between the individual case rates of those counties. We place more credence in mutual case rates and case-rate differences than in case counts because (1) case counts depend on population size and (2) our edge-based case counts can overcount cases: placing case-count data on edges counts a county’s COVID-19 cases multiple times when its node participates in multiple edges.
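The three edge quantities can be illustrated with a minimal Python sketch. The function name `edge_case_stats` and the toy county values are hypothetical, and we assume that the case-rate difference is taken as an absolute value (the definition above says only "difference").

```python
def edge_case_stats(cases_a, pop_a, cases_b, pop_b):
    """Return (C, CR, CD) for one edge between two adjacent counties.

    C  : case count (sum of the two counties' cases)
    CR : mutual case rate per 1000 individuals
    CD : case-rate difference (absolute value is an assumption)
    """
    c = cases_a + cases_b
    cr = 1000.0 * c / (pop_a + pop_b)
    rate_a = 1000.0 * cases_a / pop_a  # individual case rate of county a
    rate_b = 1000.0 * cases_b / pop_b  # individual case rate of county b
    cd = abs(rate_a - rate_b)
    return c, cr, cd


# Toy example: a small county with many cases next to a large county with few.
c, cr, cd = edge_case_stats(cases_a=50, pop_a=10_000, cases_b=10, pop_b=40_000)
print(c, cr, cd)
```

In this toy example, the mutual case rate is dominated by the more populous county, whereas the case-rate difference exposes the contrast between the two counties.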

We use all 3108 counties in the coterminous U.S. as nodes when constructing regions. However, when we examine the COVID-19 statistics of these regions, we omit five counties (New York, Queens, Kings, Bronx, and Richmond) that correspond to the five boroughs of New York City, as these counties are not included in The New York Times COVID-19 data set. These nodes participate in only seven total edges, so we omit them in our statistical calculations.

2.2 Constructing regions

2.2.1 Regional delineation using community detection

We detect communities in each of the five human-behavior networks. A ‘community’ of a network is a dense set of nodes that is connected sparsely to other dense sets of nodes [58]. We obtain different numbers of regions and different community assignments of counties for different community-detection methods. We use community detection to obtain hard partitions, so we assign each county (i.e., each node) in a network to exactly one community.

We measure the quality of our network partitions by calculating the modularity [28, 64] of these partitions. The modularity of a partition of a network is \(Q = \sum_{\ell}(e_{\ell\ell} - b_{\ell}^{2})\). The quantity \(e_{\ell m}\) is the fraction of a network’s total edge weight that connects communities \(\ell\) and m, and \(b_{\ell} = \sum_{m} e_{\ell m}\) is the fraction of the total edge weight that is in or attached to community \(\ell\). The maximum value of modularity quantifies the amount of compartmentalization of a network [28, 64]. One expects Q to be large for a network partition with few edges or small total edge weight between its communities. We examine five different community-detection algorithms. We use the Louvain locally greedy method for modularity maximization [11], an older greedy method for modularity maximization [21], InfoMap [65], and WalkTrap [63] in the software package igraph (version 1.3.5) in the R computing environment [22]. (In igraph, the methods have the names cluster_louvain, cluster_fast_greedy, cluster_infomap, and cluster_walktrap, respectively.) We also use the REDCAP algorithm, which partitions a network into communities using a spatial minimum spanning tree [37]. Our main results use communities from the Louvain method, as this method yielded the largest values of maximized modularity \(Q_{{\mathrm{max}}}\). We show these modularity values in Table 2. We summarize our community-detection results for all five approaches in Table 9 in the Appendix.
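To make the modularity formula concrete, the following Python sketch evaluates Q for a given hard partition of a small weighted, undirected network. It is not the igraph code that we used for our calculations. The function name `modularity`, the toy edge list, and the convention of splitting each edge's weight equally between its two ends when accumulating \(b_{\ell}\) are assumptions for this illustration.

```python
from collections import defaultdict


def modularity(weighted_edges, community):
    """Evaluate Q = sum_l (e_ll - b_l^2) for an undirected weighted network.

    weighted_edges : list of (u, v, weight) with each edge listed once
    community      : dict that maps each node to its community label
    """
    total = sum(w for _, _, w in weighted_edges)
    e_within = defaultdict(float)  # e_ll: fraction of weight inside community l
    b = defaultdict(float)         # b_l: fraction of weight in or attached to l
    for u, v, w in weighted_edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_within[cu] += w / total
        # Each end of an edge contributes half of the edge's weight fraction.
        b[cu] += w / (2 * total)
        b[cv] += w / (2 * total)
    return sum(e_within[c] - b[c] ** 2 for c in b)


# Toy network: two triangles that are joined by a single edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

The partition that separates the two triangles has a larger modularity than the trivial partition with a single community, whose modularity is 0.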

Table 2 Basic summary statistics of our constructed regions. We give the number \(n(r)\) of geographic regions, the maximized modularity \(Q_{{\mathrm{max}}}\), the total length d of the internal boundaries, the number \(E_{b}\) of edges between regions, the number \(E_{w}\) of edges within regions, and \(d/E_{b}\)

2.2.2 Geographic random regions

To supplement our comparison of the five human-network regions to states, we construct 1000 sets of geographic random regions. Each set has 44 polygons. The number 44 is the closest integer to 43.83, which is the mean number \(n(r)\) of regions of the human-network regions and the states. See Table 2 for all values of \(n(r)\). To construct these regions, we first select 44 county centroids (i.e., nodes of the county-adjacency network \(G_{a}\)) uniformly at random from the set of counties. We then generate a Voronoi diagram from these 44 county centroids; this diagram covers the coterminous U.S. with 44 Voronoi polygons. We assign county centroids to the same region if they are in the same Voronoi polygon. We repeat this process 1000 times (i.e., for 1000 sets of 44 randomly-generated centroids). This yields 1000 sets of geographic random regions; in each set, each node belongs to one of the 44 regions. We report mean values of our calculations across these 1000 sets of regions.
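The random-region construction can be sketched in Python as follows. A point lies in the Voronoi polygon of the seed that is closest to it, so assigning each county centroid to its nearest seed gives the same region membership as building the polygons explicitly. The function name `random_regions`, the toy grid of centroids, and the use of planar Euclidean distance are assumptions for this illustration; they are not a description of our GIS workflow.

```python
import random


def random_regions(centroids, k, rng):
    """Assign each centroid to the Voronoi cell of one of k random seeds.

    centroids : list of (x, y) coordinates (planar coordinates are assumed)
    k         : number of regions (44 in the text above)
    rng       : random.Random instance, for reproducibility
    """
    # Choose k distinct centroids uniformly at random as Voronoi seeds.
    seeds = rng.sample(range(len(centroids)), k)
    labels = []
    for x, y in centroids:
        # The nearest seed determines the Voronoi cell of this centroid.
        nearest = min(
            range(k),
            key=lambda s: (x - centroids[seeds[s]][0]) ** 2
            + (y - centroids[seeds[s]][1]) ** 2,
        )
        labels.append(nearest)
    return labels


# Toy stand-in for county centroids: a 10-by-10 grid of points.
toy_centroids = [(float(i), float(j)) for i in range(10) for j in range(10)]
toy_labels = random_regions(toy_centroids, k=4, rng=random.Random(0))
print(sorted(set(toy_labels)))
```

Repeating the call with different random seeds yields the different sets of random regions over which one then averages.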

2.3 Methods for statistical analysis

2.3.1 Statistics and permutation tests for COVID-19 cases

We report statistics for case counts C, case rates CR, and case-rate differences CD for the five human-network regions, the states, and the geographic random regions. We then perform permutation tests in which we shuffle the edge labels (i.e., whether they are within-region edges or between-region edges) uniformly at random. For each permutation and for the real data, we then sum the case values (either C, CR, or CD) over the within-region edges. We run the permutation test 1000 times and thereby produce a distribution of sums for within-region edges. We compare this distribution to the actual sum of case values for within-region edges. We perform a separate permutation test for each of the three types of case values and for each region type.
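The permutation procedure can be sketched in Python as follows. The function name `permutation_test`, the toy edge values, and the one-sided summary `p_upper` (the fraction of permuted sums that are at least as large as the observed sum) are assumptions for this illustration; the summary that we report in our tables may differ.

```python
import random


def permutation_test(values, is_within, n_perm=1000, seed=0):
    """Shuffle within/between edge labels and sum values on within-region edges.

    values    : one case value (C, CR, or CD) per adjacency edge
    is_within : True for within-region edges and False for between-region edges
    """
    rng = random.Random(seed)
    observed = sum(v for v, w in zip(values, is_within) if w)
    labels = list(is_within)
    permuted_sums = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of within-region labels fixed
        permuted_sums.append(sum(v for v, w in zip(values, labels) if w))
    # Fraction of permutations with a within-region sum >= the observed sum.
    p_upper = sum(s >= observed for s in permuted_sums) / n_perm
    return observed, permuted_sums, p_upper


# Toy example: the four within-region edges carry the four largest values.
toy_values = [10, 9, 8, 7, 1, 1, 1, 1]
toy_within = [True, True, True, True, False, False, False, False]
observed, permuted_sums, p_upper = permutation_test(toy_values, toy_within)
print(observed, p_upper)
```

Because shuffling preserves the numbers of within-region and between-region edges, the resulting distribution reflects only how the labels are placed on edges.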

2.3.2 Granger-causality tests for case rates

We examine Granger causality to assess whether or not the time series of COVID-19 case rates of a county helps infer the time series of COVID-19 case rates of adjacent counties. A Granger-causality test produces a p-value for the null hypothesis that the COVID-19 case rate of a county does not improve inference of the COVID-19 case rate of an adjacent county using lagged values of the case rates. Because many public tracking services of COVID-19 data employ 7-day moving averages (e.g., the Georgia Department of Public Health [31] and The New York Times [59]) and the CDC reports case data and related data in weekly intervals [17], we use a lag of one week.

Disease transmission can occur in either direction (or in both directions) between adjacent counties, so we calculate Granger causality twice for each pair of counties by switching the dependent-variable and independent-variable roles of the two time series in a test.
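The idea behind a lag-one Granger-causality test can be sketched in pure Python. The sketch compares a restricted autoregression of one series on its own lagged values with an unrestricted regression that also includes the lagged values of the other series, and it returns the resulting F-statistic. It stops short of a p-value, which would require the cumulative distribution function of an F-distribution from a statistics library. The function names `ols_rss` and `granger_f_lag1` and the synthetic series are assumptions for this illustration; they are not the R routines that we used.

```python
import random


def ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit of y on the columns of rows."""
    k = len(rows[0])
    # Augmented normal equations: (X^T X) beta = X^T y.
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))


def granger_f_lag1(cause, effect):
    """F-statistic for 'lagged cause improves inference of effect' (lag of one)."""
    target = effect[1:]
    restricted = [[1.0, effect[t]] for t in range(len(effect) - 1)]
    unrestricted = [[1.0, effect[t], cause[t]] for t in range(len(effect) - 1)]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - 3  # observations minus parameters of the full model
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic weekly series: y depends on the previous week's value of x.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(60)]
y = [0.0]
for t in range(1, 60):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 0.1))

f_xy = granger_f_lag1(x, y)  # does x help infer y?
f_yx = granger_f_lag1(y, x)  # roles switched: does y help infer x?
print(f_xy, f_yx)
```

Switching the roles of the two series, as in the last two calls, corresponds to running the test in both directions for a pair of adjacent counties.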

We perform our analysis in Esri ArcGIS and the R statistical computing environment.

3 Results

3.1 Constructed regions

We use the Louvain method [11] of maximizing the modularity objective function to detect communities and create regions in our five human-behavior networks. Of these five networks, the commute network yields the most regions (with \(n(r) = 75\) regions), and the Twitter and migration networks yield the fewest (with 26 and 28 regions, respectively). See Table 2 for basic statistics of our constructed regions, Fig. 2 for visualizations of state and human-region boundaries, and Fig. 1 for an illustration of our pipeline to examine case counts, case rates, and case-rate differences between and within regions. The commute network and GPS-trip network result in the largest values of maximized modularity \(Q_{{\mathrm{max}}}\). We also detect communities in the networks from the geographic random model. In the geographic random model, there are 1000 different sets of regions, with 44 distinct regions in each set. For this model, we report mean values of the numbers of edges between and within regions.

Figure 2

State boundaries and five human-region boundaries in the coterminous United States. We algorithmically detect the human-region boundaries from human-behavior networks using the Louvain method [11] of modularity maximization. We show the numbers of regions in parentheses

We use the county-adjacency network \(G_{a}\) to track when pairs of adjacent counties are assigned to the same region and when they are assigned to different regions. We denote the total number of edges that cross between two regions by \(E_{b}\), and we denote the total number of edges that remain within a region by \(E_{w}\). (The sum of \(E_{b}\) and \(E_{w}\) is 9120.) Because the geometry (specifically, the area and shape) of the regions and the numbers \(n(r)\) of regions are different in each network, some sets of regions provide more opportunities for crossings. The number \(n(r)\) of regions correlates both with the length d of the internal boundaries and with the number \(E_{b}\) of between-region crossings. The Pearson product-moment correlation coefficients are \(f(E_{b},d) \approx 0.986\),  \(f(E_{b}, n(r)) \approx 0.999\), and \(f(d, n(r)) \approx 0.997\). The ratio \(d/E_{b}\) is the length (in kilometers) of the internal boundaries per between-region crossing. We calculate that \(d/E_{b}\) is roughly 30 kilometers (see Table 2).
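The classification of adjacency edges into between-region and within-region edges can be sketched in Python as follows. The function name `split_edges`, the county labels, and the region labels are hypothetical.

```python
def split_edges(adjacency_edges, region_of):
    """Split county-adjacency edges into between-region and within-region edges.

    adjacency_edges : list of (county_u, county_v) pairs of adjacent counties
    region_of       : dict that maps each county to its region label
    """
    between = [(u, v) for u, v in adjacency_edges if region_of[u] != region_of[v]]
    within = [(u, v) for u, v in adjacency_edges if region_of[u] == region_of[v]]
    return between, within


# Toy example: four counties in a row that are split into two regions.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_regions = {"A": 1, "B": 1, "C": 2, "D": 2}
between, within = split_edges(toy_edges, toy_regions)
e_b, e_w = len(between), len(within)
print(e_b, e_w)
```

Every adjacency edge falls into exactly one of the two lists, so the two counts always sum to the total number of edges.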

3.2 COVID-19 cases between and within regions

We discuss mutual case rates (which we denote by \(\mathrm{CR}_{b}\) for between-region edges and by \(\mathrm{CR}_{w}\) for within-region edges) and case-rate differences (which we denote by \(\mathrm{CD}_{b}\) for between-region edges and by \(\mathrm{CD}_{w}\) for within-region edges) on edges. We report case rates as cases per 1000 individuals.

3.2.1 Region-type variation in case counts, case rates, and case-rate differences

We first measure the COVID-19 case counts between regions (\(\mathrm{C}_{b}\)) and within regions (\(\mathrm{C}_{w}\)). We expect to obtain larger case counts for region types (e.g., migration regions) with larger regions. The commute regions, Twitter regions, and migration regions have the largest differences between within-region case counts and between-region case counts (see Table 3), suggesting that these types of partitions effectively demarcate locations with large case counts. The commute regions have the largest within-region case counts, followed by the Twitter regions and then the migration regions. The case rates between regions (\(\mathrm{CR}_{b}\)) are lowest for the commute and trip regions (indicating a low penetration of cases per capita across the boundaries) and are highest for the Facebook and migration regions. The case rates within regions (\(\mathrm{CR}_{w}\)) are highest for commute and trip regions, and they are lowest for the Facebook regions.

Table 3 Mean values of COVID-19 case counts (\(\mathrm{C}_{b}\) and \(\mathrm{C}_{w}\)), case rates (\(\mathrm{CR}_{b}\) and \(\mathrm{CR}_{w}\)), and case-rate differences (\(\mathrm{CD}_{b}\) and \(\mathrm{CD}_{w}\)) between and within regions, along with the differences (ΔC, ΔCR, and ΔCD) in these values. The difference is positive when a between-region value is larger, and it is negative when a within-region value is larger. The rightmost column is an odds ratio. The case data spans the week ending 31 May 2020 through the week ending 1 May 2022. The values of the COVID-19 case data are means of the weekly values. It is desirable for case rates (respectively, case-rate differences) to be large (respectively, small) within regions and to be small (respectively, large) between regions

The case-rate differences within regions (\(\mathrm{CD}_{w}\)) are smallest for commutes and second smallest for states (see Table 3), indicating that counties in the same region for these two types of networks have similar case rates. For case-rate differences between regions (\(\mathrm{CD}_{b}\)), where larger values indicate more case-rate heterogeneity, we find that the migration regions and states (followed by the commute regions) are the most effective demarcators. The Facebook and trip regions are the least effective human-network partitions with respect to \(\mathrm{CD}_{b}\). The large differences in case rates across states seemingly suggest that states are more effective partitions than we posited initially. The geographic random model has the least pronounced differences in COVID-19 case counts, case rates, and case-rate differences between versus within regions, indicating that the regions in the geographic random model do not effectively demarcate different regions of COVID-19 cases.

3.2.2 Odds ratios for case counts

The COVID-19 case count on an edge (which we denote by \(\mathrm{C}_{b}\) for between-region edges and by \(\mathrm{C}_{w}\) for within-region edges) is sensitive to the number of potential case crossings between regions. To account for this, we calculate the odds ratio \(\frac{(\mathrm{C}_{b}/\mathrm{C}_{w})}{(E_{b}/E_{w})}\) to estimate the ratio of the case count between regions to the case count within regions. The odds ratio conveys the likelihood that cases cross regions. This ratio is largest for the Facebook regions, second largest for the geographic random model’s regions, and third largest for the states. By contrast, commute and trip regions have the smallest ratios (see Table 3). These results illustrate that human-movement regions are the most effective of the examined regions. Moreover, the regions that we create using migration data or even Twitter co-mentions are more successful than states at delineating areas with large COVID-19 case counts.
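As a worked illustration of the odds ratio, consider the following Python sketch. The function name `odds_ratio` and all input numbers are hypothetical (they are not values from Table 3); we choose the edge counts only so that they sum to 9120.

```python
def odds_ratio(c_b, c_w, e_b, e_w):
    """Return (C_b / C_w) / (E_b / E_w).

    c_b, c_w : mean case counts on between-region and within-region edges
    e_b, e_w : numbers of between-region and within-region edges
    """
    return (c_b / c_w) / (e_b / e_w)


# Hypothetical inputs: between-region case counts are 40% of within-region
# case counts, and roughly 22% of the adjacency edges cross a boundary.
ratio = odds_ratio(c_b=40.0, c_w=100.0, e_b=2000, e_w=7120)
print(round(ratio, 3))
```

Dividing by \(E_{b}/E_{w}\) adjusts the case-count ratio for the number of opportunities for boundary crossings, so sets of regions with different numbers of between-region edges can be compared.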

3.2.3 Statistical tests

We now test for statistical significance in COVID-19 case counts, mutual case rates, and case-rate differences. Our permutation tests indicate that almost all sets of regions have larger case counts within regions (\(\mathrm{C}_{w}\)) and smaller case-rate differences within regions (\(\mathrm{CD}_{w}\)) than one would expect if we had assigned the labels ‘within region’ and ‘between region’ to edges without considering geography (see Table 4). The values of \(\mathrm{C}_{w}\) are largest within commute regions, second largest within Twitter regions, and third largest within migration regions. The values of the within-region case rates \(\mathrm{CR}_{w}\) are most significantly different from the distribution from the permutation test for commute regions and then states, Twitter regions, and trip regions. For the regions in the other networks, we do not observe a significant deviation from distributions from the permutation tests. The geographic random regions have the largest within-region case-rate differences \(\mathrm{CD}_{w}\). This is unsurprising, as we created these regions randomly instead of from human-behavior data. Of the human-network regions, the Twitter and migration networks yield the smallest within-region case-rate differences. Therefore, for these regions, adjacent counties in the same region tend to have similar case rates.

Table 4 Results of permutation tests across region types for expected and actual COVID-19 case counts, case rates, and case-rate differences

Our results illustrate that states may be somewhat effective at delineating regions based on COVID-19 case rates. Our tests of statistical significance also illustrate that commute regions effectively delineate regions and that states and Twitter regions perform better than we expected.

We now describe the results of our two Granger-causality tests [71] for each pair of counties. In these tests, we consider only case rates, as we want to capture population-normalized waves of COVID-19. Whenever both tests are significant for a pair of adjacent counties, we conclude that there is evidence of Granger causality of potential disease transmission between them. Effective regions have few statistically significant Granger causalities for between-region (\(\mathrm{CR}_{b}\)) pairs and many statistically significant Granger causalities for within-region (\(\mathrm{CR}_{w}\)) pairs.

In Table 5, we show the percentages of county pairs with a Granger-causality p-value of at most 0.001 for both within-region pairs and between-region pairs. At the 0.001 significance level, 30–50% of the between-region pairs are significantly coordinated temporally (i.e., they are Granger causal in at least one direction) and about 45% of the within-region pairs are significantly coordinated temporally (see Table 5). All types of regions have a similar number of pairs of counties that are coordinated temporally.

Table 5 Results of our Granger-causality and Kolmogorov–Smirnov (KS) tests

We use a Kolmogorov–Smirnov (KS) test [55] to produce a D-statistic, which we use to evaluate whether or not differences are significant. We find that pairs of counties in the commute regions and Twitter regions are significantly coordinated temporally.
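For readers who are unfamiliar with the D-statistic, the following Python sketch computes the two-sample version, which is the largest vertical gap between the empirical cumulative distribution functions of two samples. The function name `ks_statistic` and the toy samples are hypothetical, and the sketch makes no claim about which two samples are compared in Table 5.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap can change only at observed values, so check each of them.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


d_disjoint = ks_statistic([1, 2, 3], [4, 5, 6])       # non-overlapping samples
d_same = ks_statistic([1, 2, 3], [1, 2, 3])           # identical samples
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # partially overlapping
print(d_disjoint, d_same, d_shifted)
```

A value of D near 0 indicates that the two samples have similar distributions, and a value near 1 indicates that they barely overlap.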

3.3 Consensus regions

To develop policy, it is useful to have a single set of regions to enable the implementation of stay-at-home orders and other mobility-related NPIs that are consistent with the severity of local outbreaks. Our method to obtain consensus regions (see Sect. 2) results in 31 regions and a maximized modularity of \(Q_{\mathrm{max}} \approx 0.92\) (see Fig. 3). In the depicted consensus regions, the state boundaries are often preserved; this is convenient administratively.

Figure 3

We construct consensus regions in the U.S. using an unweighted combination of the states and the regions that we obtain from four human-behavior networks. We do not include the Facebook regions in the consensus regions because they are not effective at demarcating COVID-19 cases. These consensus regions indicate areas of strong within-region connectivity and weak between-region connectivity. (We computed the depicted regions using Louvain modularity maximization in the software package Gephi (version 0.10.0) [9])

To allow policy makers to explore multiple scenarios for their communities, we have developed an online tool (see Note 5) that creates on-the-fly regions for state, commute, migration, and trip networks (because these networks produce the most effective COVID-19 regions in our study). Users can change the relative weightings of these input networks to customize regions. They can also download images of the resultant regions and export data (which indicates the region assignments of all counties).

4 Conclusions and discussion

We used human-mobility networks and social-media networks to construct functional geographic regions, which capture natural movements and social interactions. We then evaluated how effectively state boundaries and these regions capture natural boundaries in the geographic spread of COVID-19 infections. We found that states, which were the predominant regions for administering policies for COVID-19 mitigation, yield less effective boundaries than the regions that we constructed from a commute network. We also found that states are more effective than the regions that we constructed from social-media networks and more effective than a random model of geographically-contiguous regions.

It is reasonable that the regions from the commute network are effective. Human-mobility regions are anchored by metropolitan areas. This yields strong connections in urban centers and suburbs, with weaker connections in exurban areas. Consequently, mobility-based functional regions tend to have many COVID-19 infections within regions and relatively few cases between regions. This conclusion reflects well-known regional-science principles that commuters and movers tend to follow an urban hierarchy with anchor cities and peripheries [34, 39, 45]. A regional approach is helpful for examining the spread of diseases (such as COVID-19) that have scant geographic transmission statistics. Based on our findings, we suggest that it is important to explore consensus regions that are derived from human-behavior networks as ad hoc administrative areas for making policy decisions for COVID-19 and other infectious diseases.

Applying policies and messaging to county-based regions instead of states poses an administrative burden that requires coordination and cooperative legislating. Nevertheless, during the COVID-19 pandemic, U.S. governors created multi-state regions [53, 70] and local authorities in the United Kingdom enacted specialized policies at local levels, rather than at the national level [32]. There are also county-level coalitions in economic development (e.g., the longstanding 420-county Appalachian Regional Commission [4]), and the U.S. federal government issues severe weather warnings (e.g., tornado, fire, storm, hurricane, and wind advisories) at the county level. Local-level operations have also yielded improvements in a variety of health systems. For instance, several years ago, the U.S. Organ Procurement and Transplantation Network implemented county-level liver-transplant regions that are based on supply-and-demand optimization as an improvement over state-level regions [30]. Functional regions may also be useful for examining the practicality of proposed inter-county alliances. In our work, for example, we did not find any regions that resemble the proposed region of Greater Idaho [19]. Instead, our regions illustrate that counties in Oregon have few existing connections to counties in Idaho.

When implementing regions in health-related situations, it is important to consider local variations. Administrative and household-level responses to COVID-19 varied across U.S. states. For example, testing rates for SARS-CoV-2 infections were different in different states [72]. There were also stark differences in vaccination rates across areas for both political and accessibility reasons [78]. Notably, vaccine-uptake rates were lower for socially vulnerable populations (as defined by low socio-economic status, household composition, a lack of access to healthcare, and disability status) [10]. Mobility behavior during lockdowns also depends on factors such as socioeconomic status [52].

Our work has a variety of limitations, and it is important to highlight several of them. A key shortcoming is that our human-behavior data are not up to date. Our mobility and social-media data predate the pandemic, so they may be misaligned with actual movement and information exchanges between counties during the pandemic. For instance, the migration data are from the period 2013–2017, the Twitter data are from 2014–2015, and the mobility data are from January and February 2020. Another shortcoming is that conducting our research at the county level entails a mismatch in granularity across the United States. Some counties have millions of residents and encompass large geographic areas, and other counties have few residents. Additionally, because it is difficult to detect the spatial transmission of SARS-CoV-2, we used rates in adjacent counties as a proxy for geographic transmission. However, we lack evidence of actual contagion events across these areas. Inevitably, one can also emphasize methodological limitations, such as in the choices of community-detection methods and other computations. For example, we made subjective choices of descriptive and inferential statistics, and one can certainly calculate other statistics to attempt to capture variations within and across regional boundaries.

In future work, we hope to account for heterogeneities in COVID-19 responses and NPI administration. We also plan to incorporate the temporal dynamics of spreading processes that arise from local and seasonal events—such as spring breaks from school, holidays, and large festivals [23, 56]—that we did not capture in our analysis. Events such as the lifting of lockdown policies are also important. Directly after a lockdown, increased human movement often is not associated with an increased spread of infections [2]. Indeed, functional geographic regions that one derives using data during lockdown periods have smaller areas than regions that one derives from data that one captures after lockdowns are lifted [67]. Extensions of our analysis can incorporate localized spikes in movements and differences across time to capture seasonal changes in regions. As suggested in [32], using data with finely-grained time resolution (such as real-time data) may help capture the flexibility and elasticity of boundaries.

It is also important to consider the spatial resolution of social-media data and other ‘non-traditional’ sources of disease-spread data [50]. We performed our analysis at the county level, but a similar analysis at other scales (such as the neighborhood scale) likely would yield different results. Constructing functional geographic regions on different scales may reveal how regions change, agglomerate, shrink, and expand with time.

Availability of data and materials

We have posted the input data for the commute and migration networks and the data of each county’s community assignments for each combination of human-behavior network and community-detection method. Our R and Python code is also available online. (We cannot share the data from SafeGraph, Twitter, or Facebook, as they are proprietary.) These materials are collectively called ‘Replication Data and Code for: Human-Network Regions as Effective Geographic Units for Disease Mitigation’ and are available on Figshare at https://doi.org/10.6084/m9.figshare.14071439.

Notes

  1. Commutes and GPS traces directly indicate movement, whereas social-media networks encode proclivities to spread information. However, because social-media relationships are often correlated with networks of movement [73], data from them may still provide a heuristic indication of appropriate boundaries.

  2. For all data except for Facebook friendships, the edge weights are positive integers. For the Facebook friendship network, the edge weight between two counties is the Social-Connectedness Index (SCI) between those counties. The SCI is the number of Facebook friendships between the accounts in two counties divided by the product of the numbers of accounts in those counties [7].

  3. The acronym LODES stands for LEHD Origin–Destination Employment Statistics, and the acronym LEHD stands for Longitudinal Employer–Household Dynamics.

  4. A census block group is an areal unit that the U.S. Census uses for demographic data.

  5. It is available at https://doi.org/10.6084/m9.figshare.14071439.

Abbreviations

ACS: American Community Survey
API: application programming interface
CDC: Centers for Disease Control and Prevention
COVID-19: coronavirus disease 2019
GLEaM: GLobal Epidemic and Mobility
GPS: Global Positioning System
KS: Kolmogorov–Smirnov
LEHD: Longitudinal Employer–Household Dynamics
LODES: LEHD Origin–Destination Employment Statistics
NPI: non-pharmaceutical intervention
REDCAP: regionalization with dynamically constrained agglomerative clustering and partitioning
SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
SCI: Social-Connectedness Index

References

  1. adams j, Bayham J, Santos T, Ghosh D, Samet J (2020) Comparing the boundaries between mobility-identified communities and potential administrative definitions for COVID-19 “protect our neighbors” criteria. Coauthored with the Colorado COVID-19 Modeling Group. Available at https://coloradosph.cuanschutz.edu/docs/librariesprovider151/default-document-library/mobility_admin_boundary_comparison.pdf?sfvrsn=de9cc7b9_0 (accessed 11 February 2023)

  2. Alessandretti L (2022) What human mobility data tell us about COVID-19 spread. Nat Rev Phys 4(1):12–13

  3. Althouse BM, Wallace B, Case B, Scarpino SV, Allard A, Berdahl AM, White ER, Hébert-Dufresne L (2020) The unintended consequences of inconsistent pandemic control policies. MedRxiv. Available at https://doi.org/10.1101/2020.08.21.20179473

  4. Appalachian Regional Commission (no date) About the Appalachian Regional Commission. Available at https://www.arc.gov/about-the-appalachian-regional-commission/ (accessed 12 July 2021)

  5. Arino J (2022) Describing, modelling and forecasting the spatial and temporal spread of COVID-19: A short review. In: Murty VK, Wu J (eds) Mathematics of Public Health: Proceedings of the Seminar on the Mathematical Modelling of COVID-19. Springer, Cham, pp 25–51

  6. Baghersad M, Emadikhiav M, Huang CD, Behara RS (2023) Modularity maximization to design contiguous policy zones for pandemic response. Eur J Oper Res 304(1):99–112

  7. Bailey M, Cao R, Kuchler T, Stroebel J, Wong A (2018) Social connectedness: Measurement, determinants, and effects. J Econ Perspect 32(3):259–280

  8. Balcan D, Gonçalves B, Hu H, Ramasco JJ, Colizza V, Vespignani A (2010) Modeling the spatial spread of infectious diseases: The GLobal Epidemic and Mobility computational model. J Comput Sci 1(3):132–145

  9. Bastian M, Heymann S, Jacomy M (2009) Gephi: An open source software for exploring and manipulating networks. In: Proceedings of the International AAAI Conference on Weblogs and Social Media, vol 3, no 1. Association for the Advancement of Artificial Intelligence, Washington, DC, pp 361–362

  10. Bilal U, Mullachery PH, Schnake-Mahl A, Rollins H, McCulley E, Kolker J, Barber S, Diez Roux AV (2022) Heterogeneity in spatial inequities in COVID-19 vaccination across 16 large US cities. Am J Epidemiol 191(9):1546–1556

  11. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008

  12. Brelsford C, Thakur G, Arthur R, Williams H (2019) Using digital trace data to identify regions and cities. In: ARIC ’19: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances on Resilient and Intelligent Cities, Assoc. Comput. Mach., New York, pp 5–8

  13. Brooker-Gross SR (1983) News and metropolitan hinterland and hierarchy. Urban Geogr 4(2):138–155

  14. Brown LA, Holmes J (1971) The delimitation of functional regions, nodal regions, and hierarchies by functional distance approaches. Ekistics 32(192):387–391

  15. Buchel O, Ninkov A, Cathel D, Bar-Yam Y, Hedayatifar L (2021) Strategizing COVID-19 lockdowns using mobility patterns. R Soc Open Sci 8(12):210865

  16. Capano G, Howlett M, Jarvis DSL, Ramesh M, Goyal N (2020) Mobilizing policy (in)capacity to fight COVID-19: Understanding variations in state responses. Policy Soc 39(3):285–308

  17. Centers for Disease Control and Prevention (CDC) (no date) COVID Data Tracker. Available at https://covid.cdc.gov/covid-data-tracker/ (accessed 11 February 2023)

  18. Chang S, Vrabac D, Leskovec J, Ugander J (2023) Estimating geographic spillover effects of COVID-19 policies from large-scale mobility networks. In: AAAI ’23/IAAI ’23/EAAI ’23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence. Association for the Advancement of Artificial Intelligence, Washington, DC, pp 14161–14169

  19. Chappell B (2021) Oregone? 7 Oregon counties vote to back seceding, so citizens can vote GOP in Idaho. National Public Radio. Available at https://www.npr.org/2021/05/20/998660102/oregone-7-oregon-counties-vote-to-back-seceding-so-citizens-can-vote-gop-in-idah

  20. Chiu WA, Fischer R, Ndeffo-Mbah ML (2020) State-level needs for social distancing and contact tracing to contain COVID-19 in the United States. Nat Hum Behav 4(10):1080–1090

  21. Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111

  22. Csárdi G, Nepusz T, Traag V, Horvát S, Zanini F, Noom D, Müller K (2023) igraph: Network Analysis and Visualization in R. R package version 1.5.1.9000

  23. Dave D, McNichols D, Sabia JJ (2021) The contagion externality of a superspreading event: The Sturgis Motorcycle Rally and COVID-19. South Econ J 87(3):769–807

  24. Davis JT, Chinazzi M, Perra N, Mu K, Pastore y Piontti A, Ajelli M, Dean NE, Gioannini C, Litvinova M, Merler S et al. (2021) Cryptic transmission of SARS-CoV-2 and the first COVID-19 wave. Nature 600(7887):127–132

  25. Ducruet C, Beauguitte L (2014) Spatial science and network science: Review and outcomes of a complex relationship. Netw Spat Econ 14(3–4):297–316

  26. Farmer CJ, Fotheringham AS (2011) Network-based functional regions. Environ Plan A 43(11):2723–2741

  27. Ferretti L, Wymant C, Kendall M, Zhao L, Nurtay A, Abeler-Dörner L, Parker M, Bonsall D, Fraser C (2020) Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 368(6491):eabb6936

  28. Fortunato S, Hric D (2016) Community detection in networks: A user guide. Phys Rep 659:1–44

  29. Foster S (2020) As COVID-19 proliferates mayors take response lead, sometimes in conflict with their governors. Georgetown Law SALPAL. Available at https://www.law.georgetown.edu/salpal/as-covid-19-proliferates-mayors-take-response-lead-sometimes-in-conflicts-with-their-governors/

  30. Gentry S, Chow E, Massie A, Segev D (2015) Gerrymandering for justice: Redistricting US liver allocation. Interfaces 45(5):462–480

  31. Georgia Department of Public Health (no date) COVID-19 Status Report. Available at https://dph.georgia.gov/covid-19-status-report (accessed 20 July 2023)

  32. Gibbs H, Nightingale E, Liu Y, Cheshire J, Danon L, Smeeth L, Pearson CA, Grundy C (2021) Detecting behavioural changes in human movement to inform the spatial scale of interventions against COVID-19. PLoS Comput Biol 17(7):e1009162

  33. Green HL (1955) Hinterland boundaries of New York City and Boston in Southern New England. Econ Geogr 31(4):283–300

  34. Greenwood MJ (1985) Human migration: Theory, models, and empirical studies. J Reg Sci 25(4):521–544

  35. Guimerà R, Mossa S, Turtschi A, Amaral LAN (2005) The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proc Natl Acad Sci USA 102(22):7794–7799

  36. Guldmann J-M (2004) Spatial interaction models of international telecommunication flows. In: Goodchild MF, Janelle DG (eds) Best Practices in Spatially Integrated Social Science. Oxford University Press, Oxford, pp 400–442

  37. Guo D (2008) Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP). Int J Geogr Inf Sci 22(7):801–823

  38. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA (2018) Nextstrain: Real-time tracking of pathogen evolution. Bioinformatics 34(23):4121–4123

  39. Haggett P, Chorley RJ (1969) Network Analysis in Geography. Edward Arnold Publishers Ltd., London

  40. Haselsberger B (2014) Decoding borders. Appreciating border impacts on space and people. Plan Theory Pract 15(4):505–526

  41. Hazarie S, Soriano-Panos D, Arenas A, Gómez-Gardeñes J, Ghoshal G (2021) Interplay between intra-urban population density and mobility in determining the spread of epidemics. Commun Phys 4:191

  42. Holtz D, Zhao M, Benzell SG, Cao CY, Rahimian MA, Yang J, Allen J, Collis A, Moehring A, Sowrirajan T et al. (2020) Interdependence and the cost of uncoordinated responses to COVID-19. Proc Natl Acad Sci USA 117(33):19837–19843

  43. Hou X, Gao S, Li Q, Kang Y, Chen N, Chen K, Rao J, Ellenberg JS, Patz JA (2021) Intracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race. Proc Natl Acad Sci USA 118(24):e2020524118

  44. Iacus SM, Santamaria C, Sermi F, Spyratos S, Tarchi D, Vespe M (2022) Mobility functional areas and COVID-19 spread. Transportation 49:1999–2025

  45. Isard W (1956) Regional science, the concept of region, and regional structure. Pap Reg Sci 2(1):13–26

  46. Jin M, Gong L, Cao Y, Zhang P, Gong Y, Liu Y (2021) Identifying borders of activity spaces and quantifying border effects on intra-urban travel through spatial interaction network. Comput Environ Urban Syst 87:101625

  47. Kashyap R (2021) Has demography witnessed a data revolution? Promises and pitfalls of a changing data ecosystem. Pop Stud-J Demog 75(sup1):47–75

  48. Kohn CF (1970) Regions and regionalizing. J Geogr 69(3):134–140

  49. Koylu C (2018) Discovering multi-scale community structures from the interpersonal communication network on Twitter. In: Perez L, Kim E-K, Sengupta R (eds) Agent-Based Models and Complexity Science in the Age of Geospatial Big Data. Springer, Cham, pp 87–102

  50. Lee EC, Arab A, Colizza V, Bansal S (2022) Spatial aggregation choice in the era of digital and administrative surveillance data. PLOS Digit Health 1(6):e0000039

  51. Liu X, Hollister R, Andris C (2018) Wealthy hubs and poor chains: Constellations in the U.S. urban migration system. In: Agent-Based Models and Complexity Science in the Age of Geospatial Big Data. Springer, Cham, pp 73–86

  52. Lucchini L, Langle-Chimal O, Candeago L, Melito L, Chunet A, Aleister Montfort BL, Lozano-Gracia N, Fraiberger SP (2023) Socioeconomic disparities in mobility behavior during the COVID-19 pandemic in developing countries. arXiv:2305.06888

  53. Luna T (2020) California, Oregon and Washington to work together on plan to lift coronavirus restrictions. Available at https://www.latimes.com/california/story/2020-04-13/coronavirus-restrictions-gavin-newsom-california-washington-oregon-western-state-pact

  54. Masser I, Scheurwater J (1980) Functional regionalisation of spatial interaction data: An evaluation of some suggested strategies. Environ Plan A 12(12):1357–1382

  55. Massey Jr FJ (1951) The Kolmogorov–Smirnov test for goodness of fit. J Am Stat Assoc 46(253):68–78

  56. Mehta SH, Clipman SJ, Wesolowski A, Solomon SS (2021) Holiday gatherings, mobility and SARS-CoV-2 transmission: Results from 10 US states following Thanksgiving. Sci Rep 11:17328

  57. Miller AC, Foti NJ, Lewnard JA, Jewell NP, Guestrin C, Fox EB (2020) Mobility trends provide a leading indicator of changes in SARS-CoV-2 transmission. medRxiv. Available at https://doi.org/10.1101/2020.05.07.20094441

  58. Newman MEJ (2018) Networks, 2nd edn. Oxford University Press, Oxford

  59. The New York Times (2023) Coronavirus in the U.S.: Latest Map and Case Count. COVID Data Tracker. Available at https://www.nytimes.com/interactive/2021/us/covid-cases.html (accessed 12 November 2023)

  60. Noronha VT, Goodchild MF (1992) Modeling interregional interaction: Implications for defining functional regions. Ann Assoc Am Geogr 82(1):86–102

  61. Our World in Data (no date) COVID-19 Data Explorer. Available at https://ourworldindata.org/explorers/coronavirus-data-explorer (accessed 11 February 2023)

  62. Philbrick AK (1957) Principles of areal functional organization in regional human geography. Econ Geogr 33(4):299–336

  63. Pons P, Latapy M (2005) Computing communities in large networks using random walks. In: Yolum P, Güngör T, Gürgen F, Özturan C (eds) ISCIS 2005: International Symposium on Computer and Information Sciences. Springer, Heidelberg, pp 284–293

  64. Porter MA, Onnela J-P, Mucha PJ (2009) Communities in networks. Not Am Math Soc 56(9):1082–1097, 1164–1166

  65. Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105(4):1118–1123

  66. SafeGraph (2021) COVID-19 Data Consortium. Available at https://web.archive.org/web/20210421220026/https://www.safegraph.com/academics (accessed 5 May 2021)

  67. Schindler DJ, Clarke J, Barahona M (2023) Multiscale mobility patterns and the restriction of human movement. R Soc Open Sci 10(10):230405

  68. Schlosser F, Maier BF, Jack O, Hinrichs D, Zachariae A, Brockmann D (2020) COVID-19 lockdown induces disease-mitigating structural changes in mobility networks. Proc Natl Acad Sci USA 117(52):32883–32890

  69. Seto CH, Graif C, Khademi A, Honavar VG, Kelling CE (2022) Connected in health: Place-to-place commuting networks and COVID-19 spillovers. Health Place 77:102891

  70. Sgueglia K, Kelly C (2020) 7 Midwestern governors announce their states will coordinate on reopening. Available at https://www.cnn.com/2020/04/16/politics/midwest-governors-reopening-pact/index.html

  71. Shojaie A, Fox EB (2022) Granger causality: A review and recent advances. Annu Rev Stat Appl 9(1):289–319

  72. Souch JM, Cossman JS (2021) A commentary on rural–urban disparities in COVID-19 testing rates per 100,000 and risk factors. J Rural Health 37(1):188

  73. Takhteyev Y, Gruzd A, Wellman B (2012) Geography of Twitter networks. Soc Netw 34(1):73–81

  74. Twitter, Inc. (2021) Twitter Streaming API. Available at https://developer.twitter.com/en/products/twitter-api (accessed 15 December 2020)

  75. U.S. Census Bureau (no date) 2013–2017 American Community Survey Migration/Geographic Mobility Data. Available at https://www.census.gov/topics/population/migration/data/tables/acs.2017.html (accessed 15 April 2021)

  76. U.S. Census Bureau (no date) Longitudinal Employer–Household Dynamics LEHD–LODES Residence–Workplace Characteristics. Available at https://lehd.ces.census.gov/data (accessed 15 December 2020)

  77. Xiong C, Hu S, Yang M, Luo W, Zhang L (2020) Mobile device data reveal the dynamics in a positive relationship between human mobility and COVID-19 infections. Proc Natl Acad Sci USA 117(44):27087–27089

  78. Yuan Y, Jahani E, Zhao S, Ahn Y-Y, Pentland AS (2023) Implications of COVID-19 vaccination heterogeneity in mobility networks. Commun Phys 6:206

Acknowledgements

We thank Geng Tian for developing our Web application, and we thank SafeGraph and Facebook for providing data. We thank Grant D. Brown and William Drummond for helpful comments. CA thanks Shaowen Wang and the Geospatial Fellows Program at the University of Illinois Urbana-Champaign.

Funding

MAP acknowledges support from the National Science Foundation (grant number DMS-2027438) through the RAPID program. CA acknowledges support from the National Science Foundation (grant number SBE-2045271).

Author information

Contributions

CA, CK, and MAP designed the research; CA, CK, and MAP performed the research; CA and CK contributed new analytical tools; CA and CK analyzed data; and CA, CK, and MAP wrote the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Clio Andris.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A.1 Measuring similarities between the input networks and between the resultant regions

A.1.1 Similarities between the human-behavior networks

We assess the similarities between the five human-behavior networks by calculating the Pearson correlation coefficients between these networks (see Table 6). To calculate these coefficients, we represent each network as a sequence of edge weights (including the 0 weights), where each element of the sequence corresponds to a distinct edge (i.e., a unique pair of counties or a self-edge) of an undirected network of counties. We then calculate the correlation coefficient for each pair of edge-weight sequences. We find that the Facebook network has little overlap with the other four human-behavior networks, so its edge weights differ from typical flows of inter-county commutes, trips, and residential home-location movements (i.e., migration). However, the Twitter network is strongly correlated with our human-mobility networks. Therefore, some social-media data do appear to correlate with movement patterns.
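The calculation that we describe above can be sketched in a few lines of Python. The sketch lists the weight of every unordered county pair (including self-edges and pairs with weight 0) in a fixed order and then computes the Pearson correlation coefficient of two such sequences. The function names, the dictionary-based network representation, and the toy weights are assumptions for illustration and are not taken from our code.

```python
from itertools import combinations_with_replacement
from math import sqrt


def edge_weight_sequence(weights, counties):
    """List the weight of every unordered county pair, including self-edges.

    weights  -- dict that maps a sorted pair (a, b) with a <= b to a weight
    counties -- iterable of county identifiers
    Pairs that are absent from `weights` receive weight 0.
    """
    pairs = combinations_with_replacement(sorted(counties), 2)
    return [float(weights.get(pair, 0.0)) for pair in pairs]


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy networks on three hypothetical counties; the second has doubled weights.
counties = ["A", "B", "C"]
commute_toy = {("A", "A"): 20, ("A", "B"): 10, ("B", "C"): 4}
trips_toy = {pair: 2 * w for pair, w in commute_toy.items()}
r_toy = pearson_correlation(
    edge_weight_sequence(commute_toy, counties),
    edge_weight_sequence(trips_toy, counties),
)
```

Because the second toy network is a rescaled copy of the first, their correlation coefficient equals 1 up to floating-point error.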

Table 6 Pearson correlation coefficients between the human-behavior networks

A.1.2 Similarities between regions

We assess the similarities between the different assignments of counties to regions by calculating Jaccard indices and z-scores of Rand coefficients. As one can see in Tables 7 and 8, there are substantial but imperfect similarities.
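One common way to compare two assignments of counties to regions is to count pairs of counties. The following Python sketch computes a pair-counting Jaccard index and the (unnormalized) Rand coefficient in this way. It is a sketch under stated assumptions: we assume the pair-counting form of the Jaccard index, the function names and toy partitions are hypothetical, and the sketch does not compute the z-score of the Rand coefficient.

```python
from itertools import combinations


def pair_counts(partition_a, partition_b):
    """Count county pairs by whether each partition places them together.

    Each partition is a dict that maps a county to its region label.
    Both dicts must have the same set of counties as keys.
    """
    both = only_a = only_b = neither = 0
    for u, v in combinations(sorted(partition_a), 2):
        same_a = partition_a[u] == partition_a[v]
        same_b = partition_b[u] == partition_b[v]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def jaccard_index(partition_a, partition_b):
    """Pairs together in both partitions / pairs together in at least one."""
    both, only_a, only_b, _ = pair_counts(partition_a, partition_b)
    # Undefined (division by zero) if no pair is together in either partition.
    return both / (both + only_a + only_b)


def rand_coefficient(partition_a, partition_b):
    """Fraction of pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(partition_a, partition_b)
    return (both + neither) / (both + only_a + only_b + neither)


# Toy partitions of four hypothetical counties.
regions_a = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
regions_b = {"c1": 0, "c2": 0, "c3": 0, "c4": 1}
jaccard_toy = jaccard_index(regions_a, regions_b)
rand_toy = rand_coefficient(regions_a, regions_b)
```

Both quantities equal 1 for identical partitions.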

Table 7 Similarities between the sets of regions using the Jaccard similarity index
Table 8 Similarities between the sets of regions using the z-score \(z_{R}\) of the Rand coefficient

A.2 More information about our community-detection results

In Table 9, we show the values of the maximized modularity \(Q_{{\mathrm{max}}}\) and the numbers of resultant regions for each input network and each community-detection algorithm.
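For readers who want to see how a modularity value arises from a given assignment of counties to regions, the following Python sketch evaluates the standard modularity of a weighted, undirected network. It only evaluates modularity for a fixed assignment and does not maximize it. The function name, the dictionary-based inputs, and the convention that a self-edge contributes twice its weight to a node's strength are assumptions of the sketch.

```python
from collections import defaultdict


def modularity(edge_weights, community):
    """Modularity Q of a weighted, undirected network for a fixed partition.

    edge_weights -- dict that maps each unordered pair (u, v) to a weight;
                    each pair appears once
    community    -- dict that maps each node to its community label
    """
    total = sum(edge_weights.values())  # total edge weight m
    internal = defaultdict(float)       # edge weight inside each community
    strength = defaultdict(float)       # summed node strengths per community
    for (u, v), w in edge_weights.items():
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    # Q = sum over communities of (internal / m) - (strength / (2 m))^2
    return sum(internal[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)


# Toy network: two disconnected edges, a-b and c-d.
toy_edges = {("a", "b"): 1.0, ("c", "d"): 1.0}
q_split = modularity(toy_edges, {"a": 0, "b": 0, "c": 1, "d": 1})
q_merged = modularity(toy_edges, {"a": 0, "b": 0, "c": 0, "d": 0})
```

In the toy network, placing each edge in its own community gives a larger modularity than placing all four nodes in one community.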

Table 9 Summary of the results of the community-detection algorithms

A.3 More information about resultant regions and consensus regions

A.3.1 Resultant regions for each input network

After performing community detection, we construct geographic regions by assigning each county to a single community. In Figs. 4–8, we show maps of the regions that we obtain using community detection on our five human-behavior networks: commutes, Facebook friendships, migration, trips, and Twitter co-mentions. Many of these networks tend to be correlated with state boundaries; this is not by design, but instead occurs naturally in the data. When the human-network regions do not match the regions from state boundaries, natural features such as mountain ranges (e.g., the Appalachian range in the trip regions, as one can see in Fig. 7) and infrastructure such as highways (e.g., connections in Southern New Mexico and West Texas in the commute regions, as one can see in Fig. 4) can join regions across states or divide regions within states. Time zones may also play a role, as we see for the Facebook regions in a division between the Central and Eastern time zones (see Fig. 5).

Figure 4

Regions that we construct using a network of commutes from U.S. Census LODES data [76] that we aggregate from census blocks to counties. This yields 75 regions, which is the most regions of any of the examined networks

Figure 5

Regions that we construct from Facebook friendships using the Facebook Social-Connectedness Index (SCI) [7], which is the number of Facebook friendships between two counties divided by the product of the numbers of Facebook accounts in those two counties. There are a total of 33 regions, including one very large region that includes the entire U.S. West Coast

Figure 6

Regions that we construct using a migration network from the American Community Survey (ACS) [75]. There are a total of 28 regions. In the migration regions, outliers often belong to a distant region. (For example, because of the nature of inter-metropolitan migration, a county in Michigan can be part of a region that is based in Florida)

Figure 7

Regions that we construct using GPS traces (i.e., trips) from SafeGraph [66] in January–February 2020. We aggregate the data from the census-block-group level to the county level. There are a total of 52 regions

Figure 8

Regions that we construct from a network of Twitter co-mentions [74]. There are a total of 26 regions. Some counties are in regions that do not match those of their surrounding communities. This likely occurs because of small populations and accordingly few Twitter accounts

For the Facebook, migration, and Twitter regions that we show in Fig. 2, we reassign outlier counties, which look like ‘holes’ in a map and consist of clusters of up to three counties. These counties occur mostly in the Great Plains and likely are not part of the surrounding communities because they have few connections in these networks. The small numbers of connections result from their small populations or small numbers of social-media accounts. We reassign each of these counties to the neighboring region that includes the largest number of its geographically adjacent counties, with a preference for neighbors in the same state. (There are no instances in which a county shares a border with an equal number of counties from two or more different regions.)
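The reassignment rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the paper: the function name `reassign_outliers`, the dictionary-based inputs, and the toy counties are all hypothetical, and the same-state preference is implemented here as a secondary ranking criterion, which is only one possible reading of the rule.

```python
from collections import Counter

def reassign_outliers(region, state, adjacency, outliers):
    """Return a copy of `region` in which each outlier county joins a neighboring region.

    region:    dict mapping county -> region label (hypothetical input format)
    state:     dict mapping county -> state label
    adjacency: dict mapping county -> set of geographically adjacent counties
    outliers:  set of counties to reassign
    """
    updated = dict(region)
    for county in outliers:
        # Ignore adjacent counties that are themselves outliers (clusters of 'holes').
        neighbors = [n for n in adjacency[county] if n not in outliers]
        counts = Counter(region[n] for n in neighbors)
        if not counts:
            continue  # no usable neighbors; leave the county unchanged
        same_state = Counter(region[n] for n in neighbors if state[n] == state[county])
        # Assumption: rank regions by number of adjacent counties,
        # then by number of adjacent counties in the same state.
        updated[county] = max(counts, key=lambda r: (counts[r], same_state[r]))
    return updated

# Toy example: county "D" is a 'hole' surrounded by two "r1" counties and one "r2" county.
region = {"A": "r1", "B": "r1", "C": "r2", "D": "r9"}
state = {"A": "KS", "B": "KS", "C": "NE", "D": "KS"}
adjacency = {"A": {"D"}, "B": {"D"}, "C": {"D"}, "D": {"A", "B", "C"}}
new_region = reassign_outliers(region, state, adjacency, outliers={"D"})
```

In this toy example, county "D" joins region "r1" because two of its three adjacent counties belong to that region.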

A.3.2 Creating consensus regions

We combine the state partition and four sets of human-network regions (commute, migration, trip, and Twitter) to create a single set of consensus regions. In this calculation, we do not use the Facebook regions because they are not effective at demarcating COVID-19 cases. We weight each pair of adjacent counties in the county-adjacency network \(G_{a}\) by the number of times that both counties appear in the same community (or state, for the state network), so edge weights range between 0 (i.e., never in the same community or state) and 5 (i.e., always in the same community or state). We create consensus regions by applying the Louvain modularity-maximization algorithm to this agreement network using the software package Gephi (version 0.10.0) [9]. By visualizing these edge weights on a map (see Fig. 9, which also incorporates the Facebook regions, yielding a maximum agreement count of 6), we see the locations of strong and weak agreement.
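As a rough illustration of the agreement weighting, the following Python sketch counts, for each pair of adjacent counties, the number of partitions that place both counties in the same group. The function name `agreement_weights`, the input format, and the three toy counties are hypothetical; the sketch covers only the edge-weight calculation, not the Louvain step that the paper performs in Gephi.

```python
def agreement_weights(adjacent_pairs, partitions):
    """Weight each adjacent county pair by how many partitions group the pair together.

    adjacent_pairs: iterable of (county_u, county_v) edges of the adjacency network
    partitions:     list of dicts, each mapping county -> group label
                    (e.g., states, commute, migration, trip, and Twitter regions)
    """
    weights = {}
    for u, v in adjacent_pairs:
        # Each partition contributes 1 if it places both counties in the same group.
        weights[(u, v)] = sum(1 for p in partitions if p[u] == p[v])
    return weights

# Toy data with three counties and five partitions (all labels are made up).
states    = {"a": "S1", "b": "S1", "c": "S2"}
commute   = {"a": 1, "b": 1, "c": 1}
migration = {"a": 1, "b": 2, "c": 2}
trips     = {"a": 1, "b": 1, "c": 2}
twitter   = {"a": 1, "b": 1, "c": 2}

pairs = [("a", "b"), ("b", "c")]
weights = agreement_weights(pairs, [states, commute, migration, trips, twitter])
```

With five partitions, each weight lies between 0 and 5, which matches the range that is described above. One could pass the weighted edges to any Louvain implementation (for example, `louvain_communities` in NetworkX with `weight="weight"`), although the resulting communities may differ from those that Gephi produces because Louvain is a stochastic heuristic.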

Figure 9

The agreement count between adjacent counties (i.e., counties that share a boundary). We weight the edges of this network by the number of times that we assign their two attached nodes to the same region or state, with six possible opportunities to agree. This visualization illustrates natural divisions between U.S. states and between regions

In our consensus regions, we use the concept of border ‘thickness’ [40]. Thick borders indicate that there are relatively few crossings in geographic space [46]. Therefore, the borders are thickest for edge weights of 0 (i.e., two neighboring counties are never in the same region) and they are thinnest for edge weights of 5.
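The relationship between agreement counts and border thickness can be expressed as a simple inverse mapping. The sketch below assumes a linear mapping (thickness equals the maximum agreement count minus the edge weight); the text fixes only the two endpoints (thickest at 0, thinnest at 5), so the linear form, the function name `border_thickness`, and the toy weights are illustrative assumptions.

```python
def border_thickness(weights, max_agreement=5):
    """Map agreement counts to border thickness (assumed linear inverse mapping).

    weights:       dict mapping (county_u, county_v) -> agreement count
    max_agreement: largest possible agreement count (5 for the consensus regions)
    """
    # Weight 0 (never grouped together) gives the thickest border;
    # weight max_agreement (always grouped together) gives thickness 0.
    return {pair: max_agreement - w for pair, w in weights.items()}

# Toy agreement counts for three pairs of adjacent counties.
toy_weights = {("a", "b"): 5, ("b", "c"): 2, ("c", "d"): 0}
thickness = border_thickness(toy_weights)
```

Under this assumed mapping, the pair that is never grouped together has thickness 5 and the pair that is always grouped together has thickness 0.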

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Andris, C., Koylu, C. & Porter, M.A. Human-network regions as effective geographic units for disease mitigation. EPJ Data Sci. 12, 60 (2023). https://doi.org/10.1140/epjds/s13688-023-00426-1

  • DOI: https://doi.org/10.1140/epjds/s13688-023-00426-1

Keywords