Understanding the interplay between social and spatial behaviour

According to personality psychology, personality traits determine many aspects of human behaviour. However, validating this insight in large groups has been challenging so far, due to the scarcity of multi-channel data. Here, we focus on the relationship between mobility and social behaviour by analysing trajectories and mobile phone interactions of $\sim 1,000$ individuals from two high-resolution longitudinal datasets. We identify a connection between the way in which individuals explore new resources and exploit known assets in the social and spatial spheres. We show that different individuals balance the exploration-exploitation trade-off in different ways and we explain part of the variability in the data by the big five personality traits. We point out that, in both realms, extraversion correlates with the attitude towards exploration and routine diversity, while neuroticism and openness account for the tendency to evolve routine over long time-scales. We find no evidence for the existence of classes of individuals across the spatio-social domains. Our results bridge the fields of human geography, sociology and personality psychology and can help improve current models of mobility and tie formation.


Introduction
Our social and spatial behaviours are shaped by both internal and external constraints. On the one hand, external factors [1] such as time, cognition, age or the need for food constrain our possibilities. On the other hand, we are driven by internal needs, purposes and preferences. Specifically, within personality psychology, it has been conjectured that personality traits play a key role in shaping our choices across various situations [2,3].
In the social realm, individuals cope with cognitive and temporal constraints by establishing and maintaining connections in a distinctive [4,5] and persistent [4] manner. For example, the size of an individual's social circle is bounded under ∼ 150, the so-called Dunbar number [6], but varies among individuals around this limit [7]. These differences result from an interplay between physical and extrinsic factors such as gender [8], age [9] and socio-economic status [10] as well as from stable individual dispositions underlying personality [11].
Spatially, individuals are characterised by an activity space of repeatedly visited locations within which they move during their daily activities [12], but this geo-spatial signature varies in size [13] and spatial shape [14]. However, unlike the social case, the conjecture that individuals' spatial behaviour is persistent in time [15] had not been verified until recently.
Here, we capitalise on the recent discovery that the size of the activity space is conserved and correlates with the social circle size [16] to test the conjecture that the same personality dispositions in part determine social and spatial behaviour. We test this conjecture by analysing two long-term datasets consisting of the mobility trajectories and phone interactions of ∼ 1000 individuals (for previous studies see section 'State of the art' below).
First, we test the hypothesis that the strategies individuals adopt in order to choose where to go and with whom to interact are similar. Then, we identify and characterise the prevailing socio-spatial profiles appearing in the datasets. Finally, we show that socio-spatial profiles can be partially explained by the widely adopted big-five personality trait model, often used to describe aspects of the social and emotional life [7,11,17-22]. In the section 'State of the art', we review the relevant literature; in 'Data description and pre-processing' and 'Metrics' we describe data collection and pre-processing, and we provide details of the methods implemented; in 'Results' we present our findings.
arXiv:1801.03962v2 [physics.soc-ph] 9 Nov 2018

State of the art
Individual-level variability in social and spatial behaviour has mostly been investigated in isolation so far, with few notable efforts to reconcile the two. Here, we briefly review the empirical findings in the two domains.

The social domain
Individuals deal with limited time and cognitive capacity, which result in finite social networks [6,23], by distributing time unevenly across their social circle [4,24-28]. While this is a shared strategy, there is clear evidence for individual-level variation. First, social circles vary in terms of diversity: they differ in size [7] (within an upper bound of ∼ 150 individuals [6]) and in structure [4,29]. Second, individuals display different attitudes towards exploration of social opportunities, as they are more or less keen on creating new connections [30-33]. Finally, individuals manage social interactions over time in different ways. Some maintain a very stable social circle, while others renew their social ties at a high pace [5].

The spatial domain
Constraints including physical capabilities, the distribution of resources, and the need to coordinate with others limit our possibilities to move in space [1]. Individuals cope with these limitations by allocating their time within an activity space of repeatedly visited locations [47], whose size is conserved over several years according to a recent study based on high-resolution trajectories [16], and previous ones based on unevenly sampled and low spatial resolution data [48,49]. The activity space varies across individuals in terms of size [16] and shape [14]: it was shown that two distinct classes of individuals, returners and explorers, can be identified based on their propensity to visit new locations, similarly to the social domain [5]. Heterogeneities in spatial behaviour can be explained in terms of gender [50], age [51,52], socio-economic [35,53] and ethnic [54] differences. There have only been sporadic efforts to include personality measures in geographic research, despite the strong connections between the two [55]. Recent works [44,56] suggest that spatial behaviour can be partially explained by personality traits. However, in [56], this understanding is based on biased data collected from location-based social networks. In [44], the connection between spatial behaviour and personality is not investigated extensively, as it is not the main focus of the study.

Social and spatial connection
Recently, connections between the social and spatial behaviour of pairs [57-62] and groups [63] of individuals have been demonstrated, and used to design predictive models of mobility [58,64,65] or social ties [59,66-68]. Shifting the attention to the individual level, recent works based on online social network data [69,70], mobile phone call data [62] and evenly sampled high resolution mobility trajectories [16] have shown correlations between the activity space size and the ego network structure, calling for further research to more closely examine the connections between social and spatial behaviour at the individual level.

Table 1: Characteristics of the mobility datasets considered. $N$ is the number of individuals, $\delta t$ the temporal resolution, $T$ the duration of data collection, $\delta x$ the spatial resolution, $T_C$ the median weekly time coverage, defined as the fraction of time an individual's location is known.

Data description and pre-processing
Our study is based on 850 high resolution trajectories and call records of participants in a 24-month longitudinal experiment, the Copenhagen Networks Study (CNS) [71]. Results on the connections between social and spatial behaviour were corroborated with data from another experiment with fixed-rate temporal sampling, but lower spatial resolution and sample size: the Lausanne Mobile Data Challenge (MDC) [72,73], which lasted 19 months (see Table 1).

CNS dataset
The Copenhagen Networks Study (CNS) experiment took place between September 2013 and September 2015 [71] and involved ∼ 1000 Technical University of Denmark students (∼ 22% female, ∼ 78% male), typically aged between 19 and 21 years. Participants' position over time was estimated by combining their smartphones' WiFi and GPS data using the method described in [16,74]. The location estimation error is below 50 meters in 95% of the cases. Participants' call and sms activity was also collected as part of the experiment. Individuals' background information was obtained through a 310-question survey including the Big Five Inventory [75], which measures how individuals score on five broad domains of human personality traits: openness, conscientiousness, extraversion, agreeableness and neuroticism. The personality questionnaire used in the study is a version of the Big Five Inventory [75] translated from English into Danish. It contains 44 individual items and each trait is computed as the average of 7-10 items. Data collection was approved by the Danish Data Protection Agency. All participants provided individual informed consent. Mobility patterns of participants in the CNS experiment display statistical properties consistent with previous literature [13], as shown in [16].

MDC dataset
Data was collected by the Lausanne Data Collection Campaign between October 2009 and March 2011. The campaign involved a heterogeneous sample of ∼ 185 volunteers with mixed backgrounds from the Lake Geneva region (Switzerland), who were allocated smartphones [73]. In this work we used GSM data, which has the highest temporal sampling. Following Nokia's privacy policy, individuals participating in the study provided informed consent [73]. The Lausanne Mobile Data Challenge experiment involves 62% male and 38% female participants, and the 22-33 age range accounts for roughly 2/3 of the population [76].

Metrics
In this section, we define the concepts and metrics used to quantify the social and spatial behaviour of an individual $i$. Exploration behaviour is characterised by the following quantities:
Number of new locations/week: $n_{loc}(i, t)$ is the number of locations discovered by $i$ in the week preceding $t$.

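Table 2: Definition of the metrics characterising the activity space and the social circle. 1) The size of a set is the number of elements in the set. 2) We compute the entropy of a set considering the probability $p(j)$ associated to each element $j$ of the set. 3) We measure the stability $J_{AS}$ by computing the Jaccard similarity between the activity space at $t$ and at $t - T$, with $T = 20$ weeks. $J_{SC}$ is computed in the same way for the social circle. 4) We compute the rank turnover of a set by measuring for each of its elements $j$ the absolute change in rank between two consecutive time windows of length $T = 20$ weeks. The rank is attributed based on the probability $p(j)$. The average absolute change in rank across all elements corresponds to the rank turnover.
Size. Activity space: $C = |AS|$. Social circle: $k = |SC|$.
Entropy. Activity space: $H_{AS} = -\sum_{j=1}^{C} p(\ell_j) \log p(\ell_j)$. Social circle: $H_{SC} = -\sum_{j=1}^{k} p(u_j) \log p(u_j)$.
Stability*. Activity space: $J_{AS} = |AS(t) \cap AS(t - T)| / |AS(t) \cup AS(t - T)|$. Social circle: $J_{SC} = |SC(t) \cap SC(t - T)| / |SC(t) \cup SC(t - T)|$.
Rank turnover**. Activity space: $R_{AS} = \sum_{j=1}^{N} |r(\ell_j, t) - r(\ell_j, t - T)| / N$. Social circle: $R_{SC} = \sum_{j=1}^{N} |r(u_j, t) - r(u_j, t - T)| / N$.
* Here $T = 20$ weeks, see Supplementary Material for the analysis with $T = 30$ weeks.
** $r(\ell_k, t)$ and $r(u_k, t)$ denote the rank of a location $\ell_k$ and individual $u_k$ at $t$, respectively.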
Number of new ties/week: $n_{tie}(i, t)$ is the number of individuals who had contact with $i$ (by sms or call) for the first time in the week preceding $t$.
Note that locations/ties are considered 'new' only if discovered after 20 weeks from the beginning of data collection.
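As an illustration only, the two exploration counts could be computed along the following lines. This is a minimal sketch, not the pipeline used in the study: it assumes that events are already aggregated as (week index, identifier) pairs with weeks counted from 0 at the start of data collection, and the function name `new_items_per_week` and the toy data are hypothetical.

```python
from collections import Counter

BURN_IN_WEEKS = 20  # from the text: discoveries in the first 20 weeks are not counted


def new_items_per_week(events, burn_in_weeks=BURN_IN_WEEKS):
    """Count first-time locations (or ties) per week.

    events: iterable of (week_index, item_id); week 0 is the first week of
    data collection (an assumption of this sketch).
    Returns {week: number of items first seen in that week}, restricted to
    weeks at or after the burn-in period.
    """
    first_seen = {}
    for week, item in events:
        if item not in first_seen or week < first_seen[item]:
            first_seen[item] = week
    return dict(Counter(w for w in first_seen.values() if w >= burn_in_weeks))


# Toy visit log for one individual (hypothetical data).
visits = [(0, "home"), (3, "work"), (21, "bar"), (21, "home"),
          (22, "gym"), (22, "bar"), (25, "cafe"), (25, "park")]
n_loc = new_items_per_week(visits)
```

The same function applies to ties if the identifiers are contacts rather than locations.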
Exploitation behaviour can be quantified by considering:
Activity space: The set $AS(i, t) = \{\ell_1, \ell_2, ..., \ell_j, ..., \ell_C\}$ of locations $\ell_j$ that individual $i$ visited at least twice and where she spent a time $\tau_j$ larger than 200 min during a time-window of $T = 20$ weeks preceding time $t$ (see Supplementary Material for the analysis with $T = 30$ weeks). Among the locations in the activity space, $i$ visited $\ell_j$ with probability $p(\ell_j) = \tau_j / \sum_{j'} \tau_{j'}$. (It is worth noting that this time-based definition of activity space includes all significant locations independently of their spatial position and it is only loosely connected with space-oriented definitions widespread in the geography literature such as the "standard deviational ellipse" and the "road network buffer" [77]).
Social circle: The set $SC(i, t) = \{u_1, u_2, ..., u_j, ..., u_k\}$ of individuals $u_j$ with whom individual $i$ had a number of contacts $n_j > 5$ by sms or call during a time-window of $T = 20$ consecutive weeks preceding time $t$ (see Supplementary Material for the analysis with $T = 30$ weeks). The probability that $i$ has contact with a given member $u_j$ of her social circle is $p(u_j) = n_j / \sum_{j'} n_{j'}$.
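The two definitions above can be sketched in a few lines of Python. The sketch assumes that the input has already been restricted to one individual and one 20-week window, that $\tau_j$ is the total time spent at a location within the window, and that the normalising sums run over the members of the set only; the function names and the toy records are hypothetical.

```python
from collections import defaultdict


def activity_space(stays, min_visits=2, min_minutes=200):
    """stays: list of (location_id, minutes), one entry per visit in the window.

    Keeps locations visited at least `min_visits` times with total time
    strictly above `min_minutes`, and returns {location: p(location)}.
    """
    visits = defaultdict(int)
    minutes = defaultdict(float)
    for loc, duration in stays:
        visits[loc] += 1
        minutes[loc] += duration
    kept = {loc: minutes[loc] for loc in visits
            if visits[loc] >= min_visits and minutes[loc] > min_minutes}
    total = sum(kept.values())
    return {loc: tau / total for loc, tau in kept.items()}


def social_circle(contacts, min_contacts=5):
    """contacts: list of alter ids, one entry per call or sms in the window.

    Keeps alters with strictly more than `min_contacts` contacts and
    returns {alter: p(alter)}.
    """
    counts = defaultdict(int)
    for alter in contacts:
        counts[alter] += 1
    kept = {a: n for a, n in counts.items() if n > min_contacts}
    total = sum(kept.values())
    return {a: n / total for a, n in kept.items()}


# Toy window for one individual (hypothetical data).
stays = [("home", 300), ("home", 400), ("work", 150), ("work", 150),
         ("bar", 500), ("gym", 50), ("gym", 60)]
AS = activity_space(stays)      # bar: one visit only; gym: 110 min in total
SC = social_circle(["a"] * 6 + ["b"] * 5 + ["c"] * 14)  # b has exactly 5 contacts
```

The sizes $C$ and $k$ are then simply `len(AS)` and `len(SC)`.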
For these two sets $AS(i, t)$ and $SC(i, t)$, we consider their sizes $C(i, t)$ and $k(i, t)$, quantifying the number of favoured locations and social ties, respectively; their entropies $H_{AS}(i, t)$ and $H_{SC}(i, t)$, measuring how time is allocated among locations and ties; their stabilities $J_{AS}(i, t)$ and $J_{SC}(i, t)$, quantifying the fraction of conserved locations and ties, respectively, across consecutive non-overlapping windows of $T = 20$ weeks (see Supplementary Material for $T = 30$); and their rank turnovers $R_{AS}(i, t)$ and $R_{SC}(i, t)$, measuring the average absolute change in rank of an element in the set between consecutive windows. The mathematical definition of these quantities is provided in Table 2.

Other metrics
In order to compare the difference in entropy between two different sets, we compute their Jensen-Shannon divergence (JSD). The JSD between two sets $P_1$ and $P_2$ is computed as $JSD(P_1, P_2) = H(\frac{1}{2}(P_1 + P_2)) - \frac{1}{2}[H(P_1) + H(P_2)]$ (see also [4]).
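A compact sketch of the entropy, stability, rank turnover and JSD is given below, for sets represented as dictionaries mapping each element to its probability. Several choices here are assumptions of the sketch rather than statements from the text: natural logarithms are used, rank 1 is assigned to the most probable element with ties broken alphabetically, and the rank turnover is averaged over the elements present in both windows.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a {element: probability} mapping."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)


def jaccard(a, b):
    """Jaccard similarity between the element sets of two windows."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def ranks(p):
    """Rank 1 = most probable element; ties broken by name (assumption)."""
    ordered = sorted(p, key=lambda x: (-p[x], str(x)))
    return {x: r for r, x in enumerate(ordered, start=1)}


def rank_turnover(p_now, p_prev):
    """Mean absolute rank change over elements present in both windows."""
    r_now, r_prev = ranks(p_now), ranks(p_prev)
    shared = set(r_now) & set(r_prev)
    if not shared:
        return float("nan")
    return sum(abs(r_now[x] - r_prev[x]) for x in shared) / len(shared)


def jsd(p1, p2):
    """Jensen-Shannon divergence: H((P1 + P2) / 2) - (H(P1) + H(P2)) / 2."""
    keys = set(p1) | set(p2)
    mix = {x: 0.5 * (p1.get(x, 0.0) + p2.get(x, 0.0)) for x in keys}
    return entropy(mix) - 0.5 * (entropy(p1) + entropy(p2))


# Two consecutive windows for one individual (hypothetical data).
prev = {"home": 0.6, "work": 0.3, "gym": 0.1}
now = {"home": 0.5, "gym": 0.3, "bar": 0.2}
J = jaccard(now, prev)        # 2 shared locations out of 4 in the union
R = rank_turnover(now, prev)  # home stays at rank 1, gym moves from 3 to 2
```

With natural logarithms the JSD ranges from 0 (identical distributions) to $\log 2$ (disjoint supports).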

Results
Both in their spatial and social behaviour, individuals are constantly balancing a trade-off between the exploitation of familiar options (such as returning to a favourite restaurant or spending time with an old friend) and the exploration of new opportunities (such as visiting a new bar or going on a first date) [78]. We adopt this exploration-exploitation perspective to analyse the relationship between social and spatial strategies in our dataset [16]. We quantify the propensity for exploration and exploitation within each individual, $i$, using the metrics reported in Table 3 and Fig. 1 and described in section 'Metrics'. We focus on two aspects of exploitation, (i) diversity, characterising how individuals allocate time among their set of familiar locations and friends, and (ii) evolution, characterising the tendency to change exploited locations and friends over time.

Table 3: Metrics characterising social and spatial behaviour. The metrics are defined in section 'Metrics'.
Exploration and exploitation are persistent in time. First, we verify that individual behaviour is persistent in time. For all the aforementioned measures, we compare the individual self-variation across time $d_{self}(i)$ with a reference difference $d_{ref}(i, j)$ between individuals $i$ and $j$. In the case of the activity space size, for example, self-variation is measured from the change in the activity space size of $i$ between consecutive time windows, and the reference difference from the difference between the activity space sizes of $i$ and $j$. If $d_{self}(i) < d_{ref}(i, j)$ for most $j$, we can conclude that for individual $i$, fluctuations of the activity space size are negligible compared to the difference with other individuals. The same procedure is followed for all metrics, with an adjustment in the case of entropies: the persistence of the entropy $H_{AS}$ is verified by comparing the Jensen-Shannon divergences $d_{self} = JSD(AS(i, t), AS(i, t - T))$ and $d_{ref} = JSD(AS(i, t), AS(j, t))$. The same method was used for $H_{SC}$ (see Methods and [4]).
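For the entropy case, where both distances are stated explicitly, the comparison can be sketched as follows. The sketch considers a single pair of consecutive windows and returns, for each individual, the fraction of other individuals $j$ for which $d_{self}(i) < d_{ref}(i, j)$; averaging over time and the treatment of scalar metrics are left out. The function name `persistence_fraction` and the toy data are hypothetical.

```python
import math


def jsd(p1, p2):
    """Jensen-Shannon divergence between two {element: probability} mappings."""
    def h(p):
        return -sum(q * math.log(q) for q in p.values() if q > 0)
    keys = set(p1) | set(p2)
    mix = {x: 0.5 * (p1.get(x, 0.0) + p2.get(x, 0.0)) for x in keys}
    return h(mix) - 0.5 * (h(p1) + h(p2))


def persistence_fraction(now, before, distance=jsd):
    """now, before: {individual: distribution} at t and at t - T.

    Returns {i: fraction of j != i such that d_self(i) < d_ref(i, j)}.
    """
    result = {}
    for i in now:
        d_self = distance(now[i], before[i])
        others = [j for j in now if j != i]
        wins = sum(d_self < distance(now[i], now[j]) for j in others)
        result[i] = wins / len(others) if others else float("nan")
    return result


# Activity spaces of three individuals in two windows (hypothetical data).
now = {"ann": {"home": 0.60, "work": 0.40},
       "bob": {"lab": 0.70, "pub": 0.30},
       "cat": {"home": 0.50, "gym": 0.50}}
before = {"ann": {"home": 0.55, "work": 0.45},
          "bob": {"lab": 0.70, "pub": 0.30},
          "cat": {"home": 0.50, "gym": 0.50}}
fractions = persistence_fraction(now, before)
```

In this toy example every individual is closer to her own past than to anyone else, so all fractions equal 1.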
Results from the CNS dataset reported in Table S3 show that for all metrics $d_{self}(i) < d_{ref}(i, j)$ holds in more than 99% of cases on average (MDC: 97%, see Supplementary Material Table S1). Moreover, the average self-variation across the population $\overline{d_{self}}$ is consistent with $\overline{d_{self}} = 0$ within errors, and $\overline{d_{self}}$ is significantly smaller than the average reference difference $\overline{d_{ref}}$ (see Tables S3 and S1 in Supplementary Material).
Table 4: CNS dataset: Persistence of social and spatial behaviour. For each of the social and spatial metrics, $\overline{d_{self}}$ is the average self-distance and $\overline{d_{ref}}$ is the reference distance between an individual and all others, averaged across individuals. The third column reports the fraction of cases where $d_{self} < d_{ref}$, averaged across the population.
Metric | $\overline{d_{self}}$ | $\overline{d_{ref}}$ | $d_{self} < d_{ref}$
Social circle size, $k$ | 0.04 ± 0.09 | 12 ± 5 | 99%
Activity space size, $C$ | 0.04 ± 0.07 | 7 ± 3 | 99%
New ties/week, $n_{tie}$ | 0.05 ± 0.10 | 0.9 ± 0.5 | 96%
New locations/week, $n_{loc}$ | 0.10 ± 0.17 | |
Activity space rank turnover, $R_{AS}$ | 0.04 ± 0.10 | 2 ± 1 | 99%
These results extend previous findings [4,16] and suggest that each individual is characterised by a distinctive socio-spatial behaviour captured by the ensemble of these metrics averaged across time. In fact, these averages are heterogeneously distributed across the samples considered (see Fig. S2).
One way to test the interdependency between social and spatial behaviours is to measure the correlation between a given social metric and the corresponding spatial one. We find positive and significant correlations for all metrics and datasets (see Figs. S3 and S1 in Supplementary Material). We find that individuals with a high propensity to explore new locations are also more keen on exploring social opportunities (see Fig. S3A). Those with a diverse mobility routine are also likely to have a correspondingly large social circle (see Fig. S3B), and those who often replace social ties also have an unstable set of favourite locations (see Fig. S3C and D).
We verify that the observed correlations are not spurious by performing multiple regression analyses that control for other possible sources of variation: gender, age, and time coverage (the average fraction of time an individual's position is known). We implement five multiple linear regression models M1, M2, M3, M4 and M5. Each regression model predicts a given spatial metric (the activity space size $C$, the activity space entropy $H_{AS}$, the number of new locations/week $n_{loc}$, the activity space stability $J_{AS}$ and the rank turnover $R_{AS}$) using the corresponding social metric and the control variables (age, gender and time coverage) as regressors. The relative importance of each regressor is assessed using the Lindeman, Merenda and Gold (LMG) [79] method. Results obtained via weighted least squares regression (see Table S4 for the CNS dataset and Table S2 in Supplementary Material for the MDC dataset) reveal that the social metrics are significant predictors of the spatial metrics (p-value < 0.01 in all cases except for M4 in the MDC dataset), and they typically have more importance than factors such as gender, time coverage and age group (see Fig. S4).
Among the control variables, gender is a significant predictor of spatial behaviour in the CNS dataset: females display a higher level of routine diversity and propensity towards exploration, in accordance with [80]. Time coverage, measuring the fraction of time an individual's position is known, plays a significant role in explaining spatial entropy and activity space stability, since individuals who spend a long time in the same place (or leave their phone in the same place) are more easily geo-localised. Age differences are not present within the sample of students participating in the CNS study, and they are not estimated to be relevant with respect to spatial behaviour in the MDC study.
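The LMG decomposition assigns to each regressor its average increase in $R^2$ over all orders in which regressors can be added to the model, so that the shares sum to the $R^2$ of the full model. A minimal NumPy sketch is given below; it is not the code used in the study, the weights passed to the weighted fit are left to the user because the text does not specify them, and the synthetic data and column roles are hypothetical.

```python
from itertools import permutations

import numpy as np


def r_squared(X, y, cols, weights=None):
    """R^2 of a (weighted) least-squares fit of y on an intercept plus X[:, cols]."""
    n = len(y)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on sqrt(w)-scaled rows.
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    resid = y - A @ beta
    y_mean = np.average(y, weights=w)
    return 1.0 - np.sum(w * resid**2) / np.sum(w * (y - y_mean) ** 2)


def lmg(X, y, weights=None):
    """Average sequential R^2 contribution of each column over all orderings."""
    p = X.shape[1]
    shares = np.zeros(p)
    orders = list(permutations(range(p)))
    for order in orders:
        used, r2_prev = [], 0.0  # the intercept-only model has R^2 = 0
        for c in order:
            used.append(c)
            r2 = r_squared(X, y, used, weights)
            shares[c] += r2 - r2_prev
            r2_prev = r2
    return shares / len(orders)


# Synthetic example: column 0 plays the role of the social metric,
# columns 1 and 2 of two control variables (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)
shares = lmg(X, y)
full_r2 = r_squared(X, y, [0, 1, 2])
```

Enumerating all orderings is feasible here because each model has only four regressors (24 orderings); the cost grows factorially with the number of regressors.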
We do not identify distinct classes of individuals. A natural question is whether or not, in the samples considered, there is evidence for distinct classes of individuals based on their socio-spatial behaviour [5,14]. We approach this problem by reducing the set of metrics to a smaller number of uncorrelated variables by applying Principal Component Analysis [81,82]. The principal components represent the data through linear combinations of the original variables: in Table 6 we report the percentage of variance in the data explained by all components; in Table 7 we report the coefficients $w$ describing how the original variables are linearly combined to obtain the first two principal components.
In both datasets, we find that the first principal component (PC 0) explains ∼ 40% of the differences between individuals (see Table 6). For the CNS dataset, the variables contributing the most to PC 0 (i.e. such that $w^2 > 0.1$) are, in order, the activity space size $C$, the social circle size $k$, the number of new locations/week $n_{loc}$, the activity space entropy $H_{AS}$ and the number of new ties/week $n_{tie}$. $n_{loc}$ and $n_{tie}$ characterise the attitude towards exploration. The other metrics ($C$, $k$ and $H_{AS}$) are related to routine diversity, or the tendency to have a large set of familiar locations and friends. Since the sign of $w$ is the same for all the metrics above, we can conclude that individuals with a higher propensity towards exploration tend to have a more diverse social and spatial routine, and vice versa. Similar conclusions can be drawn from the results obtained for the MDC dataset.
The second principal component (PC 1) accounts for ∼ 15% of the total variation (see Table 6). It is dominated by the social circle stability $J_{SC}$ (CNS: $w^2 = 0.21$, MDC: $w^2 = 0.52$) and the activity space stability $J_{AS}$ (CNS: $w^2 = 0.24$, MDC: $w^2 = 0.26$) for both datasets (see Table 7). The signs of the coefficients $w$ for $J_{SC}$ and $J_{AS}$ are the same, further confirming that these two metrics are correlated (see also Fig. S3). We can conclude that the second principal component accounts for the effects of routine evolution, or the tendency to change familiar locations and friends over long time scales.
We consider the first two principal components, PC 0 and PC 1, to reduce the effects of noise, and we test the hypothesis that there exist different classes of individuals by applying the gap statistic method [83]. We apply it by looking at the gap between the within-cluster dispersion expected under a uniform distribution of the data and the dispersion obtained after applying K-means.
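The quantities reported in such tables can be obtained, for instance, from an eigendecomposition of the correlation matrix of the metrics. The sketch below assumes that each metric is standardised to zero mean and unit variance before the decomposition, which the text does not state; the function name `pca` and the synthetic data are hypothetical.

```python
import numpy as np


def pca(data):
    """PCA of an (individuals x metrics) array via the correlation matrix.

    Returns the fraction of variance explained by each component, the
    coefficient matrix (column m holds the coefficients w of component m),
    and the component scores of each individual.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)   # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]        # re-sort in descending order
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval / eigval.sum()
    scores = z @ eigvec
    return explained, eigvec, scores


# Synthetic example: three metrics driven by one latent trait plus one
# independent metric (hypothetical data).
rng = np.random.default_rng(1)
latent = rng.normal(size=300)
data = np.column_stack(
    [latent + 0.3 * rng.normal(size=300) for _ in range(3)]
    + [rng.normal(size=300)]
)
explained, w, scores = pca(data)
```

Each column of `w` has unit norm, so the squared coefficients $w^2$ of a component sum to 1 and can be read as the relative contribution of each metric to that component.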
For all possible choices of K > 1, we find that the gap is not large enough to support the existence of more than one class of individuals.
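A self-contained sketch of the gap statistic is given below. It uses a plain NumPy implementation of K-means, draws the reference data uniformly over the bounding box of the observations, and selects the number of clusters with the rule proposed together with the gap statistic (the smallest $K$ such that $Gap(K) \geq Gap(K+1) - s_{K+1}$). These are assumptions of the sketch: the text does not specify the implementation, the number of reference samples or the selection rule, and all names and data are hypothetical.

```python
import numpy as np


def kmeans_dispersion(X, k, rng, n_init=5, n_iter=50):
    """Smallest within-cluster sum of squared distances over n_init K-means runs."""
    best = np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Empty clusters keep their previous centre.
            new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best


def gap_statistic(X, k_max=4, n_ref=10, seed=0):
    """Gap(k) and its standard error s_k for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans_dispersion(X, k, rng))
        ref = np.array([
            np.log(kmeans_dispersion(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        errs.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    return np.array(gaps), np.array(errs)


def choose_k(gaps, errs):
    """Smallest k with Gap(k) >= Gap(k + 1) - s_{k + 1}."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - errs[i + 1]:
            return i + 1
    return len(gaps)


# Two well-separated groups in the plane of the first two components
# (hypothetical data).
rng = np.random.default_rng(42)
blobs = np.vstack([rng.normal(0, 1, (100, 2)),
                   rng.normal(0, 1, (100, 2)) + [10, 0]])
gaps, errs = gap_statistic(blobs)
k_hat = choose_k(gaps, errs)
```

When the data contain well-separated groups, as in this synthetic example, the gap at $K = 2$ clearly exceeds the gap at $K = 1$; a selected value of 1 corresponds to the absence of evidence for more than one class.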
The big-five personality traits partly explain spatial and social behaviour. We test whether the differences between individuals can be explained by the Big Five personality traits model [75], typically used to describe social and emotional life (see Table 8). We build two multiple linear regression models that use the Big Five personality traits as regressors and one of the principal components describing socio-spatial behaviour as target. The results in Table 9 show that three personality traits, neuroticism, openness and extraversion, are relevant predictors of socio-spatial behaviour. In particular, extraversion is the most important predictor of the first principal component: it positively correlates with the tendency to diversify routine and to explore opportunities. Neuroticism and openness instead explain the second principal component, since they correlate with the tendency to change routine over time (see also Fig. S5).
Finally, we repeat all analyses considering only spatial metrics. Results are in line with those obtained considering all metrics: the first two principal components account for a large fraction of the variability in the data (see Table 10); the first component is dominated by the activity space size $C$, the number of new locations/week $n_{loc}$ and the activity space entropy $H_{AS}$, while the second is mostly controlled by the activity space stability $J_{AS}$ (Table 11). For the CNS dataset, extraversion is the most important predictor of the first principal component, while openness, extraversion and neuroticism account for the second component (see Table 12 and Fig. S6).

Table 9: Extraversion, openness, and neuroticism explain socio-spatial behaviour. The result of a multiple linear regression explaining principal components of socio-spatial data (see Table 7). The value of each coefficient (coeff) is reported together with the probability (p val) that the coefficient is not relevant for the model. The relative importance of each coefficient (LMG) is computed using the LMG method [79].
Figure 5: Relative importance of personality traits for socio-spatial behaviour. LMG of each regressor computed using the Lindeman, Merenda and Gold method [79] for the multiple regression model of the principal components (panel title: 'PC 2: routine evolution').
Figure 6: Relative importance of personality traits for spatial behaviour. LMG of each regressor computed using the Lindeman, Merenda and Gold method [79] for the multiple regression model of the principal components.
Table 12: Extraversion, openness, and neuroticism explain spatial behaviour. The result of a multiple linear regression explaining principal components of spatial data (see Table 7). The value of each coefficient (coeff) is reported together with the probability (p val) that the coefficient is not relevant for the model. The relative importance of each coefficient (LMG) is computed using the LMG method [79].

Discussion
Using high resolution data from two large scale studies, we have investigated the connection between social and spatial behaviour for the first time. We have shown that, in both domains, individuals balance the trade-off between exploring new opportunities and exploiting known options in a distinctive and persistent manner. We have found that, to a significant extent, individuals adopt a similar strategy in the social and spatial sphere. These strategies are heterogeneous across the two samples considered, and there is no evidence suggesting that there exist distinct classes of individuals. Finally, we have shown that the big five personality traits explain related aspects of both social and spatial behaviour.
In particular, we have found that extraverted individuals are more explorative and have diverse routines in both the social and the spatial sphere, while neuroticism and openness are associated with a high level of routine instability in the social and spatial domain.
Our findings confirm the usefulness of mobile phone data to study the connections between behaviour and personality [29,40,44,85-87]. The results are in line with previous findings on the relation between personality and social behaviour: extraversion correlates with ego-network size [18,41,43] and diverse composition [88], openness to experience correlates with social network turnover [29], and neuroticism does not correlate with social network size [11]. Furthermore, our findings establish a relation between personality and spatial behaviour, validating the theories suggesting that spatial choices are partially dictated by personality dispositions [15] and that a single set of personality traits underlies many aspects of a person's behaviour [2,3].
Our findings on the connection between spatial behaviour and personality are consistent with the existing literature on personality. The correlation between exploration and extraversion could be explained by the fact that extraverted individuals are more likely to be risk-takers in various domains of life [89]. Extraverted individuals are also generally more likely to engage in social activities [90], which could partially explain why they allocate time among a larger set of locations. Furthermore, the key finding that individuals who score high in neuroticism and openness display a tendency to change familiar locations, and friends, over time fits well within the existing picture. In the case of neuroticism, it is well known that this trait is closely related to 'stability' [91], such that the trait of neuroticism is sometimes referred to as (low) 'emotional stability' [92]. Also, at the core of neuroticism is the tendency to experience negative emotions [93] including dissatisfaction [94], which in turn can lead to a desire for change [95,96]. Finally, it is known that people scoring high in neuroticism have a larger number of weak ties [42] and perceive that they tend to have less social support [97,98], in line with our observation that they have an unstable ego-network. Openness to experience has been shown to correlate with 'disloyal' behaviour also in other contexts such as politics [99] and shopping [100]. Our results, in agreement with previous studies on social [29,40,44,85-87] and online [101-104] behaviour, show that personality traits explain only partially how individuals behave in specific situations [105].
As a final point, we emphasize that the individual characterisation of spatial behaviour and its connections with personality are fundamental to developing conceptual [55] and predictive [106] models of travel behaviour accounting for individual-level differences.

Supplementary Material for Understanding the interplay between social and spatial behaviour

Results obtained with the MDC dataset
Tables S1, S2 and Fig. S1 report the results of the persistence analysis, the multiple regression analysis, and the correlation analysis for the MDC dataset.
Social circle size, $k$ | 0.05 ± 0.13 | 10 ± 5 | 97%
Activity space size, $C$ | 0.07 ± 0.12 | 8 ± 3 | 97%
New ties/week, $n_{tie}$ | 0.2 ± 0. | |
Activity space rank turnover, $R_{AS}$ | 0.2 ± 0.6 | 2 ± 1 | 97%

…, S6 and Tables S3, S4, S5, S6, S7, S8, S9, S10 report the results obtained choosing a time-window with length $T = 30$ weeks (see main manuscript, section 'Methods').

Table S10: T=30, Extraversion, openness, and neuroticism explain spatial behaviour. The result of a multiple linear regression explaining principal components of spatial data (Table S6). The value of each coefficient (coeff) is reported together with the probability (p val) that the coefficient is not relevant for the model. The relative importance of each coefficient (LMG) is computed using the LMG method.