Our main scientific goal is to study socioeconomic segregation and biases in population mixing in cities by observing correlation patterns between the SES of people and places they visit. Using the collected data, this objective can be addressed by building a network of individuals visiting places. We define a stratified bipartite network \(G=(U,P,E)\), where individual u is a node in set U, and place p belongs to set P (with \(U\cap P=\emptyset \)). People and places are connected by edges \(e_{u,p}\in E\) with weights \(w_{u,p}\) coding the number of times person u visited place p (see Fig. 1). Further, we stratify U into a set of socioeconomic classes indexed by values from \(C_{U}\) thus assigning a class membership \(c_{u}=i\in C_{U}\) to each individual.
In the same way we define \(c_{p}=j\in C_{P}\) classes for places. This network representation captures all information about the socioeconomically stratified visiting patterns of people to venues, coding their possible encounters and giving an aggregated description of the potential mixing patterns of people of different socioeconomic classes.
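As a minimal sketch of the network construction described above, the edge weights \(w_{u,p}\) can be obtained by counting repeated (person, place) records. The function name `build_weights` and the toy visit log are hypothetical and serve only as an illustration.

```python
from collections import Counter


def build_weights(visit_log):
    """Aggregate raw (person, place) visit records into edge weights w[(u, p)]."""
    return Counter(visit_log)


# Hypothetical visit log: person "u1" visited place "p1" twice.
log = [("u1", "p1"), ("u1", "p1"), ("u1", "p2"), ("u2", "p2")]
weights = build_weights(log)
```

Class labels \(c_{u}\) and \(c_{p}\) can then be kept in separate dictionaries keyed by node.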
3.1 Matrix measures
Based on the bipartite network representation we can measure the frequency at which people of a given class visit places in different classes. To summarise these visiting patterns we use stratification matrices [23]. An empirical stratification matrix gives the probability that a person \(u\in U\) from a given socioeconomic class \(c_{u}=i\in C_{U}\) visits a place \(p\in P\) belonging to a class \(c_{p}=j\in C_{P}\). More formally:
$$ M_{i,j}= \frac{\sum_{U,c_{u}=i}\sum_{P,c_{p}=j} w_{u,p}}{\sum_{j\in C_{P}}\sum_{U,c_{u}=i}\sum_{P,c_{p}=j} w_{u,p}}, $$
(1)
where the numerator counts the number of times people from class i visit places of class j, and the denominator normalises this frequency matrix column-wise to obtain a visiting probability distribution for each individual class \(i\in C_{U}\). Such matrices are shown in Fig. 2 for selected cities (Houston, New York and San Diego). The dominant diagonal elements for Houston and San Diego indicate strongly stratified visiting patterns in these cities. People prefer to visit places of their own or similar socioeconomic class, rather than places from remote classes. Interestingly, for New York this pattern is less evident suggesting weaker socioeconomic preferences in visiting venues.
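Eq. (1) can be illustrated with a short sketch. It assumes the network is given as a list of `(u, p, w)` triples and that classes are 0-based integer indices; both are choices made here for illustration, not details taken from the analysis pipeline.

```python
def stratification_matrix(edges, user_class, place_class, n_user_classes, n_place_classes):
    """Empirical stratification matrix M[i][j] in the spirit of Eq. (1).

    edges: iterable of (u, p, w), with w the number of visits of u to p.
    user_class, place_class: dicts mapping a node to its class index.
    """
    counts = [[0.0] * n_place_classes for _ in range(n_user_classes)]
    for u, p, w in edges:
        counts[user_class[u]][place_class[p]] += w
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a class without any recorded visit gets an all-zero row.
        matrix.append([c / total if total > 0 else 0.0 for c in row])
    return matrix
```

Each row of the result sums to one, i.e. it is the visiting probability distribution of one individual class over the place classes.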
To decide whether these patterns arise from population statistics or other confounding effects, we compare the matrix \(M_{i,j}\) to a reference matrix, which measures stratification in a system where visits occur uniformly at random under certain constraints. This randomised stratification matrix is defined through a random rewiring of the bipartite network that preserves the total number and frequency of visits of each individual (i.e. their activity and link weights) and the classes of individuals and places, but otherwise fully randomises the links between individuals and visited places. The randomisation is performed by selecting at random, for each link of an individual u, a place to visit from the set of places ever visited by members of their respective socioeconomic class \(c_{u}\), while keeping the link weight intact. This in-class randomisation allows us to compare an individual’s behaviour to that of similar others while distinguishing between socioeconomic classes, which are potentially characterised by very different visiting patterns.
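The in-class randomisation can be sketched as follows. The sketch assumes that replacement places are drawn uniformly and with replacement from the pool of the class; the text only states that places are selected randomly, so this sampling scheme is an assumption. The function name is hypothetical.

```python
import random


def in_class_randomise(edges, user_class, rng=None):
    """Return one randomised realisation of a list of (u, p, w) edges.

    Each link keeps its individual u and weight w, but its place is redrawn
    from the places ever visited by anyone in the class of u.
    """
    rng = rng or random.Random()
    pool = {}
    for u, p, _ in edges:
        pool.setdefault(user_class[u], set()).add(p)
    # Sorting makes the draw reproducible for a seeded generator.
    pool = {c: sorted(places) for c, places in pool.items()}
    # Assumption: uniform draw with replacement, so one individual may
    # end up with several links to the same place.
    return [(u, rng.choice(pool[user_class[u]]), w) for u, _, w in edges]
```

Calling the function repeatedly with different seeds gives independent realisations of the null model.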
After generating 100 independent realisations of the randomised bipartite network, we compute a similarly column-wise normalised stratification matrix \(R_{i,j}\), representing the probability of people from class \(c_{u}=i\) randomly visiting places of class \(c_{p}=j\). To determine whether the empirical mobility patterns appear more frequently than by chance, we compare the empirical and null model matrices. We obtain a normalised stratification matrix \(N_{i,j}\) by taking the element-wise ratio of the empirical and randomised stratification matrices as:
$$ N_{i,j}=\frac{M_{i,j}}{R_{i,j}}. $$
(2)
Each element with \(N_{i,j}> 1\) (red bins in Fig. 2(d)–(f)) indicates that visits made by individuals from class \(i\in C_{U}\) to places of class \(j\in C_{P}\) appeared with higher probability in the empirical observations than expected from the random null model.
Conversely, the blue bins with \(N_{i,j}< 1\) show that the corresponding visits appeared with a smaller probability than expected by chance. When red bins dominate the diagonal of the normalised matrix \(N_{i,j}\), this indicates socioeconomic stratification, where people prefer to visit places of a socioeconomic status similar to their own rather than places that are richer or poorer. This is the case for Houston and San Diego (see Fig. 2(d) and (f) respectively) and many other cities listed in Additional file 1, Section B. However, this pattern is less evident for New York (see Fig. 2(e)), where, despite known strong residential segregation, the city fabric appears to support a more homogeneous mixing of people.
These normalised stratification matrices also reveal possible biases of people in choosing places to visit outside their own class. If people in a city exhibit upward visiting biases, i.e. they tend to choose more expensive places when they step out of their own class, the elements of \(N_{i,j}\) above the diagonal would appear dominantly red. If the opposite is true, the elements below the diagonal would reflect similar but downward visiting biases. To quantify these patterns in a simple way, we compute the average value of the elements of the normalised stratification matrix \(N_{i,j}\) of each city above, on, and below the diagonal. From Fig. 2(g) it is clear that, in all cities, visiting probabilities concentrate dominantly on the diagonal elements. However, in terms of off-diagonal averages, in most cities (as in Houston in Fig. 2(d)) the above-diagonal average is larger than the below-diagonal average, indicating upward visiting biases in these metropolitan areas. Meanwhile, in some cases the contrary is true (as in San Diego in Fig. 2(f)), and in some cities these averages are very similar, indicating no dominant upward or downward visiting bias, as in the case of New York (see Fig. 2(e) and Fig. 2(g)).
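The ratio in Eq. (2) and the three diagonal averages can be sketched as below. The sketch assumes matrices stored as nested lists with row index i (class of individuals) and column index j (class of places), so that "above" means j > i, i.e. places of a higher class than the visitor. Leaving cells with \(R_{i,j}=0\) undefined is an assumption made here.

```python
def normalised_matrix(M, R):
    """Element-wise ratio N[i][j] = M[i][j] / R[i][j] as in Eq. (2)."""
    # Assumption: cells where the null model has zero probability are left
    # undefined (None) and skipped in the averages below.
    return [[m / r if r > 0 else None for m, r in zip(m_row, r_row)]
            for m_row, r_row in zip(M, R)]


def diagonal_averages(N):
    """Average of the entries above (j > i), on, and below (j < i) the diagonal."""
    groups = {"above": [], "diagonal": [], "below": []}
    for i, row in enumerate(N):
        for j, value in enumerate(row):
            if value is None:
                continue
            key = "diagonal" if i == j else ("above" if j > i else "below")
            groups[key].append(value)
    return {k: sum(v) / len(v) if v else float("nan") for k, v in groups.items()}
```

An above-diagonal average larger than the below-diagonal one corresponds to the upward visiting bias discussed in the text.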
3.2 Individual bias
The matrix measures presented in Fig. 2 reflect the coexisting socioeconomic configurations derived from visit trajectories. Firstly, the empirical stratification matrices \(M_{i,j}\) give an initial indication of homophilic mixing, seen in the dominant frequency of visits within one’s own class. Secondly, these results reveal an underlying inclination to visit places of higher SES, as shown by the larger above-diagonal elements in the normalised stratification matrices \(N_{i,j}\) in most of the cities. Taking these two configurations into account, it can be inferred that while individual mobility is dictated by socioeconomic class membership most of the time, a motivation to visit upper-class places is still present.
We take a further technical step to adequately quantify this visiting bias, which indicates deviations in mixing from the respective socioeconomic class \(c_{u}\) of an individual. We compute a single empirical individual bias score for each individual \(u\in U\) as
$$ B_{u}= \langle c_{p}\rangle _{u} - c_{u}, $$
(3)
where \(\langle c_{p}\rangle _{u}= \frac{\sum_{p\in P} w_{u,p}\times c_{p}}{n_{p}^{u}}\) is the average socioeconomic status of the places individual u visited, defined as the ratio of the visit-weighted sum \(\sum_{p\in P} w_{u,p}\times c_{p}\) of the socioeconomic status of places in the trajectory of individual u to the total number of visits \(n_{p}^{u}=\sum_{p\in P} w_{u,p}\) made by individual u to any place. An individual has an upward visiting bias if her individual score \(B_{u}\) is positive, meaning that she tends to visit places located in more affluent areas than where she lives. Conversely, an individual with a negative score has a downward visiting bias, since the places she usually visits are in a lower socioeconomic class than her own. Otherwise, an individual shows no indication of bias (\(B_{u}=0\)) if she visits places within her own socioeconomic rank. A reference model for this measure can follow the same logic as the in-class randomisation used for the network reference models explained before. Given the individual trajectory resulting from the random visit generating process, we calculate a randomised individual bias score using the same formula as in Eq. (3). Note that boundary effects may appear in this measure, as people from the poorest class cannot exhibit downward bias and, similarly, people from the highest class cannot be upward biased. Individual bias scores can be fairly compared to null models that retain these boundary effects. In-class randomisation fulfils this requirement, providing an average randomised bias score \(\langle B^{\mathrm{rand}} \rangle _{u}\) for each individual separately. Note that the randomised individual bias scores take non-trivial values, different from zero, due to the individual variance of the visiting frequencies of individuals to different places. These are represented by the weights \(w_{u,p}\) in the bipartite network, which are preserved during the randomisation process.
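A short worked sketch of Eq. (3), assuming the trajectory of one individual is stored as a dictionary from place to visit count (a hypothetical data layout):

```python
def individual_bias(visits, place_class, own_class):
    """Individual bias B_u = <c_p>_u - c_u as in Eq. (3).

    visits: dict mapping each visited place p to the weight w_{u,p}.
    """
    n_visits = sum(visits.values())
    mean_place_class = sum(w * place_class[p] for p, w in visits.items()) / n_visits
    return mean_place_class - own_class
```

For example, an individual of class 2 who makes three visits to a class-2 place and one visit to a class-6 place has \(\langle c_{p}\rangle _{u} = (3\times 2 + 1\times 6)/4 = 3\) and therefore \(B_{u}=1\), a mild upward bias driven by a single occasional visit.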
The comparison of the empirical and in-class normalised individual bias scores can be best quantified by an individual bias z-score as
$$ z_{u}^{B_{u}}= \frac{B_{u}-\langle B^{\mathrm{rand}} \rangle _{u}}{\sigma _{u}^{B^{\mathrm{rand}}}}, $$
(4)
where \(\langle B^{\mathrm{rand}} \rangle _{u}\) is the mean and \(\sigma _{u}^{B^{\mathrm{rand}}}\) is the standard deviation of the randomised individual bias scores across 100 independent realisations of the null model. The value of \(z_{u}^{B_{u}}\) reflects how much the individual bias deviates from the bias expected for an individual who visits places with the same frequency as before but selects them from the set of places visited by others within the same socioeconomic class.
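Eq. (4) reduces to a standard z-score over the null-model realisations. The sketch below assumes the population standard deviation; the text does not specify whether the population or sample estimator was used.

```python
import statistics


def bias_z_score(empirical_bias, randomised_biases):
    """z-score of the empirical bias against randomised bias scores (Eq. (4))."""
    mean = statistics.mean(randomised_biases)
    # Assumption: population standard deviation over the realisations.
    sd = statistics.pstdev(randomised_biases)
    if sd == 0:
        return float("nan")  # undefined when all realisations coincide
    return (empirical_bias - mean) / sd
```

In practice `randomised_biases` would hold one value of \(B^{\mathrm{rand}}\) per realisation for the same individual.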
The class distributions of individual z-scores together with their median values are shown in Fig. 3, where the unbiased level is marked by a flat red line. These distributions appear broad for each class, indicating that people from any class exhibit upward or downward biases in their visiting patterns to other socioeconomic classes. Interestingly, the median z-scores show an increasing trend in all three depicted cities. People from lower classes appear with a slightly negative bias z-score, meaning they have a slightly weaker bias to visit places of different socioeconomic classes than expected from their random visiting patterns. In contrast, the middle and upper classes are evidently more strongly biased than expected. This increasing trend of the median individual bias z-score with socioeconomic class surprisingly characterises all the investigated cities, as shown in Section C and in Fig. 8 of Additional file 1.
3.3 Class-level bias
The individual bias score \(B_{u}\) compares the average class of the places an individual visited to their own socioeconomic rank, inferred from their home location. Meanwhile, its z-score \(z_{u}^{B_{u}}\) indicates whether this individual bias is weaker or stronger than expected from random behaviour. However, this measure uses the class label of the individual as the reference of comparison, and it says less about whether an individual visits higher or lower class places compared to the random expected behaviour characterising other individuals in their own class. To directly measure this effect we introduce a class-level z-score measure
$$ z_{u}^{c_{u}}= \frac{\langle c_{p} \rangle _{u} - \langle c_{p} \rangle _{c_{u}}^{\mathrm{rand}}}{\sigma _{c_{u}}^{\mathrm{rand}}}, $$
(5)
where \(\langle c_{p} \rangle _{u}\) is the average socioeconomic status of the places individual u visited, and \(\langle c_{p} \rangle _{c_{u}}^{\mathrm{rand}}\) and \(\sigma _{c_{u}}^{\mathrm{rand}}\) are the average and standard deviation (respectively) of the class of places that others from class \(c_{u}\) would visit if they behaved randomly. This reference measure, just like before, is generated by in-class shuffling to obtain null models over 100 realisations. The value of \(z_{u}^{c_{u}}\) directly reflects how much the individual behaviour deviates from the expected level if the individual chose places to visit randomly from the set of places visited by others from the same socioeconomic class.
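One possible reading of Eq. (5) is sketched below. It assumes that the class-level reference is obtained by pooling the randomised average place classes of all members of a class over all realisations, and that the population standard deviation is used. The text does not spell out either choice, so both are assumptions, and the function name is hypothetical.

```python
import statistics


def class_level_z_scores(empirical_mean, randomised_means, user_class):
    """Class-level z-score of each individual in the spirit of Eq. (5).

    empirical_mean: dict u -> empirical <c_p>_u.
    randomised_means: dict u -> list of randomised <c_p>_u, one per realisation.
    """
    # Assumption: pool the randomised values of all members of a class.
    pooled = {}
    for u, values in randomised_means.items():
        pooled.setdefault(user_class[u], []).extend(values)
    z = {}
    for u, value in empirical_mean.items():
        reference = pooled[user_class[u]]
        sd = statistics.pstdev(reference)
        z[u] = (value - statistics.mean(reference)) / sd if sd > 0 else float("nan")
    return z
```

Unlike the individual bias z-score, the reference here is shared by all members of a class, so two individuals of the same class are compared against the same mean and spread.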
Results in Fig. 4 show a different behaviour compared to the individual bias scores. In the case of New York (see Fig. 4(b)), the distributions of the class-level bias z-scores indicate that, although the variation is large in each socioeconomic class, the medians of these distributions are all slightly positive and independent of the socioeconomic class. This signals a weak upward bias in people’s visiting patterns in New York compared to the class behaviour, which appears for each class. In other cities, we find several other bias patterns (see Additional file 1, Section C, Fig. 10). In the case of San Diego (Fig. 4(c)), class-level biases are all positive and evidently increase with socioeconomic class. This suggests that richer people in San Diego may visit even more affluent places than one would expect from their random class behaviour. Somewhat the opposite trend can be observed for Houston (Fig. 4(a)), where the class-level bias z-score is always positive and indicates upward bias for each class, but seems to follow an overall decreasing trend.
Visiting patterns measured by the class-level bias scores suggest that an upward socioeconomic bias characterises each city we study. Although these measures incorporate the visiting frequency distribution of individuals, they do not show clearly whether upward biases typically appear due to repeated visits to places with higher class scores, or due to several occasional visits to places outside one’s socioeconomic class. To answer this question, we recompute the median bias scores, excluding places that an individual visited fewer times than a given threshold. Results are depicted in Fig. 5 for each city. As expected, the median class-level bias score appears as a decreasing function of the frequency threshold in each city. This suggests that people more frequently visit places that are closer to their own class in terms of socioeconomic status, while they visit more affluent places only occasionally, which in turn causes the upward bias patterns characterising their class. Beyond this general decreasing character, this function shows large variance between cities. For example, in the case of San Diego (red line in Fig. 5), this curve starts from a high z-score value when all visits are considered but decreases rapidly as repeated visits are taken into account. In the case of New York this function starts from a relatively small z-score value and decreases linearly for larger threshold values. This suggests a different visiting behaviour, where people typically visit places more than once, but closer to their own socioeconomic class.
We prefer this particular measure over the one on individual bias, as our objective here is to reveal the source of upward bias: whether it is driven by repeated visits to places with higher class scores, or by several occasional visits to places outside one’s socioeconomic class. The class-level bias z-score serves this purpose as it already incorporates the visiting frequency distribution of individuals compared to their own class and gives positive z-score values.
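The frequency-threshold analysis described above can be illustrated for a single individual. The sketch assumes that a place is kept when its visit count is at least the threshold, and that an individual with no remaining places is simply dropped; the function name and the data layout are hypothetical.

```python
def mean_place_class(visits, place_class, threshold=1):
    """Average class of visited places, keeping only places with w >= threshold.

    visits: dict mapping each visited place p to the weight w_{u,p}.
    Returns None if no place passes the threshold (assumption: such an
    individual is excluded from the median at that threshold).
    """
    kept = {p: w for p, w in visits.items() if w >= threshold}
    total = sum(kept.values())
    if total == 0:
        return None
    return sum(w * place_class[p] for p, w in kept.items()) / total
```

With five visits to a class-3 place and one visit to a class-7 place, the average is about 3.67 at threshold 1 and drops to 3 at threshold 2, which is the mechanism by which occasional visits to affluent places inflate the upward bias.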
3.4 Mobility mixing and segregated residences
While there is an expected relation between mobility mixing patterns and residential segregation in a city, the combined investigation of these phenomena has not received much attention so far. Their relation is important, however, for several reasons. For example, due to the multitude of correlated socioeconomic factors, it is likely that ethnicity, which strongly correlates with income status in US metropolitan areas, also correlates with residential segregation, as has been shown in several studies [4, 45]. On the other hand, the daily mobility of people and their visiting patterns to different places are also constrained by these socioeconomic factors, and are thus likely to show similar segregation patterns. To investigate these correlations, we focus on different ethnic groups and the likelihood of their mixing in cities, which exhibit different levels of mobility segregation.
To quantify the level of segregation in mobility mixing, we analyse the previously introduced normalised stratification matrix \(N_{i,j}\) for each city. As we have discussed, signatures of segregation can be associated with strong diagonal elements in these matrices, indicating that people of a given SES are the most likely to visit places associated with the same or similar SES, compared to random visiting patterns. To quantify the strength of the diagonal concentration of visiting probabilities, we measure the diagonality index of the normalised stratification matrices [46], which is similar to the assortativity coefficient used by others [5, 47]. It is defined as the Pearson correlation coefficient of matrix entries as
$$ r_{N} = \frac{\sum_{i,j} i j N_{i,j} - \sum_{i,j} i N_{i,j}\sum_{i,j} j N_{i,j}}{\sqrt{\sum_{i,j} i^{2} N_{i,j} - (\sum_{i,j} i N_{i,j} )^{2}}\,\sqrt{\sum_{i,j} j^{2} N_{i,j} - (\sum_{i,j} j N_{i,j} )^{2}}}. $$
(6)
Here \(i\in C_{U}\) indicates the socioeconomic class of individuals and \(j\in C_{P}\) is the same for places. The diagonality index takes values between −1 and 1. A value of 1 indicates perfect assortative mixing, corresponding to a fully stratified matrix with non-zero elements on its diagonal and zeros everywhere else. Cities with large \(r_{N}\) values are characterised by visiting patterns of people who are strictly bound to places associated with their own socioeconomic class. On the contrary, if \(r_{N}\) takes values smaller than zero (in the extreme case \(r_{N}=-1\)), it indicates dis-assortative connections between people and places of different socioeconomic status. This corresponds to mobility mixing patterns where people prefer to visit places of different SES rather than places from their own class. If \(r_{N}=0\), the normalised stratification matrix is flat, indicating that people have no preference to visit places of a particular SES.
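The diagonality index of Eq. (6) can be sketched as a weighted Pearson correlation between the row and column indices. The sketch assumes that the matrix entries are first rescaled to sum to one so that they act as weights; this normalisation is an assumption made here and is not stated in the text.

```python
import math


def diagonality_index(N):
    """Pearson correlation of class indices (i, j) weighted by N[i][j], Eq. (6)."""
    total = sum(sum(row) for row in N)
    e_i = e_j = e_ij = e_ii = e_jj = 0.0
    for i, row in enumerate(N, start=1):
        for j, value in enumerate(row, start=1):
            weight = value / total  # assumption: entries rescaled to sum to 1
            e_i += i * weight
            e_j += j * weight
            e_ij += i * j * weight
            e_ii += i * i * weight
            e_jj += j * j * weight
    return (e_ij - e_i * e_j) / (
        math.sqrt(e_ii - e_i ** 2) * math.sqrt(e_jj - e_j ** 2)
    )
```

A purely diagonal matrix gives 1, a purely anti-diagonal matrix gives −1, and a flat matrix gives 0, matching the three limiting cases described above.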
The mixing patterns in a city may be determined not only by the socioeconomic status of people but also by residential segregation. Residential segregation is strongly correlated with the ethnicity of people [45, 48, 49], which in turn, according to Wang and others [29], is an even stronger predictor of mobility mixing than socioeconomic status when it comes to black, Hispanic, and white poor and non-poor populations. That study finds that minority groups, regardless of their socioeconomic status, have lower exposure to richer or white neighbourhoods compared to poor white groups. The fact that they travel similar distances and with similar frequency to many places does not change the persistent pattern of their isolation and segregation. Therefore, racial segregation emerges at a higher-order level, not limited to residential neighbourhoods but extending to mobility and potential contact.
To address the effects of residential segregation on mobility mixing, we took a similar path to others [29, 50] and considered the ethnic group distribution in a city as a proxy. Residential segregation is indicated by the tendency of individuals from the same ethnic group to cluster in housing. This can be formally quantified by the so-called distance decay isolation [51], which measures the probability that a member of a racial minority group interacts with members of their own group, taking into account the distance from the housing area of the minority group. This is measured as:
$$ Dp_{xx*} =\sum_{i=1}^{n} \Biggl( \frac{x_{i}}{X}\sum_{j=1}^{n} \frac{k_{ij}x_{j}}{t_{j}} \Biggr), $$
(7)
where \(x_{i}\) and \(x_{j}\) are the population sizes of a minority group in census tracts i and j (respectively), \(X=\sum_{i} x_{i}\) is the total population of the minority group, and \(t_{j}\) is the total population of census tract j. The distance decay dimension is reflected by \(k_{ij} = \frac{t_{j}^{-d_{ij}} }{\sum_{j=1}^{n}t_{j}^{-d_{ij}}}\), where \(d_{ij}\) is the distance between the centroids of census tracts i and j. Hence, a higher index suggests a higher probability of interaction with people from the same group, implying isolation from the rest of the population. In our case, we use probabilistic individual profiling to identify the most likely socioeconomic profile of an individual, based on the ethnic group with the highest proportion in the respective census tract where they live. For instance, if an individual u lives in census tract i where the racial composition is 60% white, 15% Hispanic, 10% black, and 5% Asian, this individual is considered white. We tested several thresholds and found that assigning a tract to an ethnic group only if that group makes up at least 30% of the tract population is the optimal cut-off, as it is the highest threshold that yields the lowest number of census tracts with an unidentified ethnicity profile.
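Eq. (7) and the tract labelling rule can be sketched as follows. To stay neutral about the exact decay kernel, the sketch takes a precomputed matrix of unnormalised proximity weights (for instance the numerators of \(k_{ij}\) as defined above) and only performs the row normalisation. The function names and the data layout are hypothetical.

```python
def distance_decay_isolation(x, t, kernel):
    """Distance decay isolation following the structure of Eq. (7).

    x[i]: minority population of tract i; t[i]: total population of tract i.
    kernel[i][j]: unnormalised proximity weight between tracts i and j
    (assumption: supplied by the caller); rows are normalised here to give k_ij.
    """
    X = sum(x)
    n = len(x)
    index = 0.0
    for i in range(n):
        norm = sum(kernel[i])
        inner = sum(kernel[i][j] / norm * x[j] / t[j] for j in range(n))
        index += x[i] / X * inner
    return index


def dominant_group(shares, threshold=0.30):
    """Label a tract by its largest ethnic group if its share is >= threshold."""
    group, share = max(shares.items(), key=lambda item: item[1])
    return group if share >= threshold else None
```

With an identity kernel and a minority concentrated in a single tract that it fully occupies, the index equals 1 (complete isolation), while an evenly spread minority with a uniform kernel reproduces its overall population share.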
Recalling the diagonality index and individual bias introduced above, we take the average z-score of each of these bias measures at the level of ethnic groups in every urban area and correlate them with the distance decay isolation value computed for the same ethnic group in the same city. By considering four ethnic groups (White, Hispanic, Black and Asian) we obtain four data points for each city, as shown in Fig. 6. Although the total number of analysed individuals is not proportional to the total population of each city, the in-city fractions of different ethnic groups are similar to the census distributions.
There is a notable correlation between the diagonality index (quantifying the assortative mobility mixing of each ethnic group) and the distance decay isolation (measuring the isolation of different ethnic groups) with \(R=0.35\) (\(p=0.0\)). Notably, almost all diagonality index measures appear with positive values, suggesting assortative mixing for most ethnic groups, with a few exceptions. Further, the overall correlation suggests the intuitive picture that the stronger the stratification of mobility mixing in a city (i.e. the larger its diagonality index), the stronger the isolation patterns that emerge between its ethnic groups. In turn, it indicates that residential segregation (and thus physical proximity) plays an important role in determining the visiting and mixing patterns of people in a city. More interestingly, the de-coupled ethnic groups for each city show an emerging clustering, which signals the importance of racial differences in mobility segregation. From Fig. 6(a) it appears that people belonging to the white ethnic group (shown as orange points in Fig. 6(a)) are the most isolated from the rest of the population (with the largest values of \(Dp_{xx*}\)), while they also show the strongest assortative mobility segregation patterns (with the highest diagonality indices) consistently in several cities (to guide the eye we coloured this group as an orange blob in Fig. 6(a)). The contrary is true for members of the Asian ethnic groups (indicated by red points and blob in Fig. 6(a)). In most cities they appear as the least isolated and the most dis-assortative (least stratified) ethnic group, thus mixing well with the rest of the population.
In between these two groups, people from the Hispanic ethnic group (green points and blob) seem to be more isolated than people from the black ethnic group (purple points and blob), although both show a comparable strength of segregation in mobility mixing, weaker than that of white people.
Needless to say, the grouping patterns shown in Fig. 6(a) indicate overall trends only, and several exceptions exist for each ethnic group. For example, the Hispanic ethnic group of Charlotte appears with the strongest assortative pattern, although this group is not strongly isolated from the rest of the population. Similarly, the black community of Austin appears with the lowest diagonality index, suggesting a strong dis-assortative mixing of these people with the rest of the population, while they also appear as one of the least isolated communities.
A similar positive correlation appears in Fig. 6(b) with \(R= 0.25\) (\(p=0.04\)) between the average individual bias z-score and distance decay isolation values over all the investigated ethnic groups and cities. Moreover, ethnic groups show certain clustering trends, which suggest ethnic trends in terms of visiting bias patterns. Interestingly, white ethnic groups (shown by orange points and blob), whom we have already found to be the most isolated, show the strongest upward bias to visit places more affluent than their own socioeconomic class. As high SES classes are populated mostly by white people, this pattern follows from our earlier observations in Fig. 3, where we find upward bias to increase with the SES of people. Their high isolation score can be explained by their upward bias towards higher socioeconomic places, which are most likely to be visited by other white people. Meanwhile, it may also indicate that our data have an over-represented white population, as we find upward biases in all of the cities. Strikingly, other racial groups show negative visiting biases and a lower level of isolation. This effect is the strongest for people from black racial groups all over the country, but also characterises Hispanic and Asian communities, although they show more unbiased patterns, with average individual z-score values closer to 0.
Exceptions are again interesting. The Hispanic community of Washington appears as the most upward biased ethnic group, while the black ethnic group of Seattle sits at the other end of the spectrum, being the most downward biased minority among the analysed cities. Both of these communities appear with a low level of isolation. Consequently, similar to the conclusion of Wang et al. [29], we observe that beyond socioeconomic status, ethnicity (strongly correlated with residential segregation) is another very important factor determining the mixing patterns of people.