Urban groups: behavior and dynamics of social groups in urban space
EPJ Data Science, volume 8, Article number: 8 (2019)
Abstract
The tendency of people to form socially cohesive groups that get together in urban spaces is a fundamental process that drives the formation of the social structure of cities. However, the challenge of collecting and mining large-scale data able to unveil both the social and the mobility patterns of people has left many questions about urban social groups largely unresolved. We leverage an anonymized mobile phone dataset, based on Call Detail Records (CDRs), which integrates the usual voice call data with text message and Internet activity information of one million mobile subscribers in the metropolitan area of Milan, to investigate how the members of social groups interact and meet in the urban space. We unveil the nature of these groups through an extensive analysis and propose a methodology for their identification. The findings of this study concern the behavior of social groups, their structure (size and membership) and their roots in the territory (locations and visit patterns). Specifically, the footprint of urban groups is made up of only a few locations, which the groups visit regularly. Moreover, the analysis of the interaction patterns shows that urban groups need to combine frequent on-phone interactions with gatherings in such locations. Finally, we investigate how their preferences impact the city of Milan, showing which areas best encourage group get-togethers.
Introduction
The understanding of tight-knit social groups represents a key factor in the development of services which integrate contextual information from social and mobility data sources [1]. Besides being a fundamental concept driving many sociological studies, the idea of social groups is central in modern social networking services and instant-messaging applications, e.g. WhatsApp, Snapchat and Skype. This is due to people’s increasing propensity to share images and videos with a restricted group of close friends built around specific interests [2]. The central role of social groups is further emphasized when these groups move around and/or are easily mappable onto locations in a city [3,4,5]. It is the basis of a rich offering of targeted applications and services, e.g. dissemination of location-aware content useful to a group [6, 7] or recommendation of locations fitting a group’s interests [8, 9]. An understanding of the typical traits of social groups in an urban context is therefore essential for developing more personalized location-based social applications and solutions.
In this paper we focus on urban groups, i.e. cohesive social groups that express their interactions in urban places. Through an extensive analysis, we unveil the nature of these groups and propose a methodology for their identification. Our analysis rests on an anonymized mobile phone dataset based on Call Detail Records (CDRs) over a span of 67 days that integrate the usual voice call data with text message and Internet activity information of one million mobile subscribers in the metropolitan area of Milan. This wealth of data provides us with a unique opportunity to study how social groups interact and meet in an urban space having a large population. In fact, in addition to the reconstruction of more complete social interactions merging call and text communications, the provided mobility information allows us to obtain more detailed user mobility traces. This is a key component for the identification of the colocation of group members.
The contributions of our work can be summarized as follows:

We propose a procedure for the identification of urban groups. The identification approach is applicable to every context providing a graph that expresses the interactions/communications among users, together with the users’ mobility traces. Due to its high modularity, the methodology can be employed to discover whatever subgraph expresses the concept of group, as well as to map, when feasible, the group’s activities onto urban places. In the latter case, it also finds the group’s favorite locations.

By applying the above procedure, we analyze how urban groups meet and behave within the urban space. We show that these groups meet all the main criteria of what makes for a sociological group, namely: mutuality (i.e. groups are highly dense subgraphs where each one interacts with any other); reachability (i.e. within a group no one is disconnected); interactivity (i.e. urban group members interact with one another frequently, and in large groups they devote much greater efforts to interacting with one another and to maintaining relationships established within the group than they do in small groups).

We provide a characterization of the urban groups by analyzing their size and membership, and we find similarities with modern instant messaging services (e.g. WhatsApp and WeChat). In addition, we focus on the preferences of urban groups by investigating the places where they meet and the frequency with which they gather. Specifically, we show that, in strict analogy to human mobility, urban groups are characterized by few visited locations; they also need to combine on-phone interactions with gatherings in such locations, since the visitation patterns of these locations are regular. Finally, we investigate how their preferences impact the city of Milan. This tells us which areas best encourage group get-togethers.

We also highlight how mobility and interaction information define social roles within urban groups [10]. Specifically, we focus on the identification of leader/follower relations through the visit patterns of the places hosting urban group gatherings. We find a subset of members (the leaders) who frequently take part in the get-togethers, while other members (the followers) play a much more marginal role with respect to the urban group activities. The same observation also holds for the frequency of the interactions within a group. In this case, within the largest groups, we identify the presence of a backbone of strong links involving a small subset of group members.

Generally, we show that cellular network data (CDRs) are a feasible and rich source of data to discover and analyze the behavior of social groups, since they capture both social interactions and the medium-grain mobility needed to identify likely group get-togethers.
The paper is organized as follows. In Sect. 2 we describe the cellular network data by providing an explanation of the social and localization information and then by discussing their advantages/limitations. In Sect. 3 we introduce the procedure for the identification of the urban groups from mobile phone data; the details of the single steps and their complexity are presented in the Appendix. In Sect. 4 we focus on the size and the membership of the urban groups identified by our methodology and report their main spatio-temporal characteristics. Then in Sect. 5 we report our results concerning the preferred locations of the urban groups, the identification of leaders/followers within them, and the presence of strongly interactive relationships within these tight-knit groups. In Sect. 6 we discuss urban groups from the metropolitan viewpoint, highlighting the city areas which facilitate and support urban group gatherings. Finally, in Sect. 7 we summarize our contributions.
Dataset
We performed our analysis of urban groups by mining a large anonymized dataset of Call Detail Records (CDRs) involving the voice calls, short text messages (SMS) and Internet traffic of about 1 million subscribers of one of Italy’s major mobile operators [11]. The information provided in the database covers the metropolitan area of Milan for a period of 67 days, namely from March 26 to May 31, 2013. During this period approximately 63 million phone calls and 20 million text messages were exchanged, all of which were recorded in the database. The temporal window covered by the dataset is extensive enough to reconstruct most of the on-phone social relationships [12].
Data description
For billing purposes, cellular network operators trace their customers’ activities [13]. So, whenever a user makes a call, sends a text message or accesses the Internet, an entry is recorded in the charging database. Each entry in the CDR is represented by the 6-tuple $t_{\mathrm{CDR}}= \langle s,r,t_{\mathrm{start}},d,\mathit {loc}_{\mathrm{start}},\mathit{loc}_{\mathrm{end}} \rangle$, where s and r respectively represent the sender’s ID and the receiver’s ID,^{Footnote 1} $t_{\mathrm{start}}$ is the initial time of the activity (when the call starts, a text is sent or Internet access occurs), d is its duration, and $\mathit{loc}_{\mathrm{start}}$ and $\mathit{loc}_{\mathrm{end}}$ are the serving cells the user s is attached to when the activity starts and when it ends. The information provided depends on the type of activity, which leads to the following peculiarities: (i) both SMS (i.e. text message) and Internet activity have null duration d; and (ii) Internet activity has the receiver field r set to null.
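The record layout described above can be sketched as a small data structure. This is only an illustration: the class name `CDREntry`, the helper `activity_type`, the use of `None` for null fields and the encoding of times as seconds are assumptions of this sketch, not the operator's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CDREntry:
    """One charging record <s, r, t_start, d, loc_start, loc_end>."""
    s: str                # sender ID (anonymized)
    r: Optional[str]      # receiver ID; None for Internet activity
    t_start: float        # start time, here in seconds (assumed encoding)
    d: Optional[float]    # duration in seconds; None for SMS and Internet activity
    loc_start: str        # serving cell when the activity starts
    loc_end: str          # serving cell when the activity ends


def activity_type(entry: CDREntry) -> str:
    """Infer the activity from which fields are null, following rules (i) and (ii)."""
    if entry.r is None:
        return "internet"  # rule (ii): no receiver
    if entry.d is None:
        return "sms"       # rule (i): receiver present but null duration
    return "call"
```

With this encoding, a record with both a receiver and a duration is read as a voice call, which matches the two rules stated in the text.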
User’s localization
CDR-based datasets have been adopted extensively in the literature to study human mobility patterns [14,15,16,17,18]. All these research projects derive locations by positioning cell towers in geographical areas where each cell tower may cover a zone as wide as a few kilometers. The dataset we are leveraging reports data about cell towers within a city space, where a dense placement of cells (one or very few hundred meters of coverage radius) has been adopted. This feature enables quite an accurate localization of users while they are performing their on-phone activity. As we show below, the mean cell radius we consider is about 200 meters, or roughly a city block.
As the dataset contains labels assigned to area names, i.e. zones covered by a group of cells, but no information about cell size and precise positioning [19], we adopted the following procedure to estimate the effective cell size distribution and position. We assume that each cell $\mathit{cell}_{i}$ is a circle with center $c_{i}$ and radius $r_{i}$. To estimate the center of the cells we use LocationAPI, a web service by UnwiredLabs,^{Footnote 2} which provides the cell center along with the estimated error. Currently we do not use the estimated error, and we assume that the cell center corresponds to the exact position provided by the system. For each cell $\mathit{cell}_{i}$, $r_{i}$ is half the mean of the Euclidean distances between the center of $\mathit{cell}_{i}$ and the centers of the six closest cells.^{Footnote 3}
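The radius estimate can be illustrated with a short sketch. It assumes that the cell centers have already been projected to planar coordinates in meters, so that Euclidean distance is meaningful; the function name `estimate_cell_radii` and the input format are hypothetical.

```python
import math


def estimate_cell_radii(centers, k=6):
    """Return {cell_id: radius} given {cell_id: (x, y)} in meters.

    The radius of a cell is half the mean Euclidean distance between its
    center and the centers of its k closest cells (k = 6 in the text).
    """
    radii = {}
    for cid, (x, y) in centers.items():
        dists = sorted(
            math.hypot(x - ox, y - oy)
            for oid, (ox, oy) in centers.items()
            if oid != cid
        )
        nearest = dists[:k]        # distances to the k closest cell centers
        if nearest:                # a lone cell has no neighbors, so no estimate
            radii[cid] = 0.5 * sum(nearest) / len(nearest)
    return radii
```

For a cell surrounded by six neighbors at 400 meters, the estimate is 200 meters, i.e. neighboring circles roughly touch halfway between their centers.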
As Milan has a radial topology, we consider three city regions: the inner circle of 3 km radius, corresponding to the city center; a second ring in the range of 3 km to 4 km moving outward from the city center; and a third ring, in the range of 4 km to 5 km. The inner city circle corresponds to downtown Milan, while the other rings include suburbs. We obtain 538, 143, and 88 cells inside each region, respectively.
Having mapped the cell towers onto the city and computed the cell radii, we analyze the radius as a function of the cell position. The cumulative distribution function (CDF) of the cell radius for each city region is reported in Fig. 1. From the figure it emerges that the radius of the cells increases as we move farther from the city center. In fact, the mean radii of the inner circle, second ring and third ring are 217, 325 and 446 meters, respectively. Given this small coverage radius, we are able to provide a good approximation of the mobile users’ positions, suitable for the detection of their colocation.
Methodology
Social groups are often identified by the notion of cohesive groups, i.e. subsets of individuals among whom there are frequent and relatively strong interactions. Within these groups, beliefs, interests and ideas are often very homogeneous due to the pressure to achieve uniformity and adhere to group standards exerted by intense interactions [20]. Places figure among the interests of a cohesive group. In fact, shared places encourage the formation and consolidation of social relationships; conversely, groups might choose a specific place as conducive to expressing themselves better.
By combining quite a precise positioning of the customers with their on-phone relationships, our mobile phone data enable us to identify and characterize cohesive groups that couple strong on-phone interactions with the attitude to share specific urban places where they colocate to perform various social activities, e.g. family-, work- and leisure-oriented ones and/or participatory events. We call them urban groups. So, given a graph expressing the relationships among the operator’s customers, an urban group is identified by a particular subgraph, a quasi-clique [21], in which a subset of the members colocate at least once. The cardinality of this subset is governed by the parameter η.
Operationally, to identify the social relationships we leverage the communication activities modeled as a graph. Meanwhile, we exploit the customers’ localization to discover the aggregation in urban spaces. To this end, we perform three steps, namely: interaction graph building, cohesive group identification and colocation filtering. As our final output we obtain the set of urban groups, along with the information of the aggregation events. Our approach differs from previous works which have studied social groups by mining Bluetooth proximity data [22], since in our dataset the interplay between cohesive groups and physical proximity is not immediate and direct as in the Bluetooth case.
Interaction graph building
The purpose of the first step is to reconstruct the network structure of the interactions mediated by both voice calls and text messages. Following the standard approach in literature we represent such a complex structure by a graph whose nodes are customers and whose edges connect two customers who communicate by calls or texts [23, 24].
However, the choice of linking two users depends on the purpose of the communication. In fact, not all calls and texts have the same social value; this is particularly true in the case of advertisements and commercial messages or communications issued by call centers. Moreover, we have to take into account missing links between other operators’ subscribers, since we have full access to the call/text records of one operator but only partial access to calls to/from subscribers of other operators. To cope with the above issues and obtain a graph which models the relationships between the operator’s customers only, we filter out incoming and outgoing communications that involve other mobile operators’ customers,^{Footnote 4} according to the literature on mobile phone data cleansing [24,25,26,27]. This way we eliminate the inter-operator bias.
After applying the filters, we construct two preliminary graphs, one for each communication channel, from which to extract only the interactions with social relevance [28]. To this end, in the weighted call graph $G_{c}=(V_{c},E_{c})$, we consider the pairs of users whose sum of call durations exceeds one minute and whose total number of interactions is higher than 3, and we store the latter value in the attribute $f_{c}$ of the link. In the text message graph $G_{t}=(V_{t},E_{t})$, instead, the only relevant pairs are those with a total number of interactions higher than 3. We store this value in the attribute $f_{t}$. Through the filtering on duration and frequency, we are able to remove accounts/users whose behavior (degree, in/out degree) resembles that of call centers or customer care services. In the final step we merge $G_{c}$ and $G_{t}$ into the interaction graph G by taking $G_{c}\cup G_{t}$. To keep the information about the number of interactions, for each link e in G we sum the attributes $f_{t}$ and $f_{c}$ if $e\in E(G_{c})\cap E(G_{t})$, while we keep the original attribute if e is not in the intersection. We denote the overall number of interactions (strength) of a link in G as w. After the building process, the interaction graph, whose order and size are reported in Table 1, captures the network among the operator’s subscribers and the strength of those interactions that most likely express social relationships. The interaction graph is the input of the next stage, which identifies cohesive groups.
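The filtering and merging steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the input tuples and the treatment of links as undirected are assumptions of the sketch, and records involving other operators are assumed to have been removed beforehand.

```python
from collections import defaultdict


def build_interaction_graph(calls, texts, min_seconds=60, min_count=3):
    """Return {frozenset({u, v}): w} for the merged interaction graph G.

    calls: iterable of (u, v, duration_seconds); texts: iterable of (u, v).
    Links are treated as undirected, which is an assumption of this sketch.
    """
    call_n, call_secs, text_n = defaultdict(int), defaultdict(float), defaultdict(int)
    for u, v, secs in calls:
        if u != v:
            e = frozenset((u, v))
            call_n[e] += 1
            call_secs[e] += secs
    for u, v in texts:
        if u != v:
            text_n[frozenset((u, v))] += 1

    # G_c: more than min_count calls AND total duration above min_seconds
    f_c = {e: n for e, n in call_n.items() if n > min_count and call_secs[e] > min_seconds}
    # G_t: more than min_count texts
    f_t = {e: n for e, n in text_n.items() if n > min_count}

    # G = G_c union G_t; w sums both attributes on links present in both graphs
    return {e: f_c.get(e, 0) + f_t.get(e, 0) for e in f_c.keys() | f_t.keys()}
```

A pair with four calls totalling two minutes and five texts ends up with strength w = 9, whereas a pair with only three calls is dropped regardless of their duration.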
Cohesive group identification
Representing the on-phone communications through an interaction graph allows us to identify cohesive groups of customers, i.e. subsets of users among whom intense, direct and frequent ties exist. The identification of cohesive groups, which is a central problem in both graph theory and social network analysis, entails different methods, from community detection [25, 29] to the enumeration of particular maximal subgraphs [23, 30]. In this work we focus on the latter approach, since community detection methods, when applied to this phone graph, have been shown to return loosely connected subgraphs barely interpretable as groups or tight-knit communities [31]. In fact, the communities detected by different algorithms are characterized by an average density which varies from 0.019 (Louvain algorithm [25]) to 0.35 (Leung’s algorithm [32]). Such values indicate weak cohesiveness of the members within the communities, whatever the algorithm used, making the community approach unsuitable for the identification of cohesive groups. Similar conclusions have been reported in [33], where the authors claimed that the Louvain and InfoMap algorithms applied to phone graphs (weighted or unweighted) yield tree-like communities which do not fit well with the notion of social group.
Among the different formalizations of cohesive groups, we adopt a relaxation of the notion of clique, namely the quasi-clique, i.e. a particular dense subgraph. The notion of clique well embodies one of the main properties of a cohesive group, i.e. mutuality. But the completeness of the subgraph is too strict a constraint. In the literature many definitions that weaken the notion of clique have been proposed. They range from n-cliques or n-clubs to k-cores [20]. Here we use the notion of quasi-clique or γ-clique, since it allows us to quantify how much we loosen the completeness constraint; at the same time, it ensures the reachability of the group members, a further property of cohesive groups. Formally, given a graph $G=(V,E)$, a γ-clique is a subgraph $G_{S}$ spanned by S, a subset of V, that is connected and γ-dense. $G_{S}$ is γ-dense if $\vert E(G_{S}) \vert \geq \gamma\binom{ \vert V(G_{S}) \vert }{2}$. In this work we use $\gamma= 0.8$ because it is a good tradeoff between imposing too strong constraints on the subgraph density and losing the idea of cohesive group. Indeed, values below 0.8 lead to a loss of cohesion in the case of large groups, whereas values above 0.8 are too restrictive for small groups, because almost all pairs of nodes should be connected. Besides, above the 0.8 threshold, the number of detected quasi-cliques drops significantly ($73\%$ for $\gamma=0.9$), causing a loss of generalizability. Following our approach, the identification of cohesive groups turns into the enumeration of all quasi-cliques of maximum cardinality. To accomplish this task we adopt Uno’s enumeration algorithm [34], which returns all the locally maximal quasi-cliques in a given graph. Then, for each quasi-clique we verify whether it is connected or not, discarding the unconnected ones. This way we identify all the connected locally maximal quasi-cliques, which represent the cohesive groups whose colocation is verified in the last stage.
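The two conditions of the definition (γ-density and connectedness) can be checked for a candidate node set as sketched below. The sketch only verifies a given subset; it does not enumerate quasi-cliques and is not Uno's algorithm. The function name and the edge-list input are assumptions.

```python
def is_gamma_clique(nodes, edges, gamma=0.8):
    """True if the subgraph induced by `nodes` is connected and gamma-dense."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v and u in nodes and v in nodes:   # keep only induced edges
            adj[u].add(v)
            adj[v].add(u)
    n = len(nodes)
    m = sum(len(nb) for nb in adj.values()) // 2   # each edge is counted twice
    if n < 2 or m < gamma * n * (n - 1) / 2:       # |E| >= gamma * C(n, 2)
        return False
    # depth-first search to verify that every member is reachable
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n
```

The connectivity check is not redundant: a 10-node clique plus one isolated node has 45 of 55 possible links, which satisfies the density bound for γ = 0.8 even though one member is unreachable.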
Colocation filtering
The high spatial granularity of the data enables us to localize users with the precision of a city block when an on-phone activity is performed. We exploit the location information to detect the colocation of the quasi-clique members. We extract from the CDR 6-tuples the sequence of the recorded locations of each user, along with the temporal annotation. Thus we obtain an array $T_{\mathrm{MOB},u}$ of pairs $(\mathit{loc}, t)$, called the mobility trace of the user u.
The mobility traces of all the users are the starting point of the colocation algorithm. As we are interested in detecting the colocation of the members of the quasi-cliques, the colocation algorithm runs on each quasi-clique separately. Specifically, a quasi-clique with member set M experiences a colocation event when a fraction η of its members share a location for a time period. In this work we use $\eta= 0.6$. The output of the algorithm is the list of colocation events, where each colocation event is identified by the triplet $\langle(t_{s},t_{e}), \mathit{loc}, M_{e}\rangle$, where $t_{s}$ and $t_{e}$ are respectively the starting and ending times of the colocation time interval, loc is the location, and $M_{e} \subseteq M $ is the set of quasi-clique members participating in the colocation event. So, the colocation algorithm checks if a cohesive group is an urban group and identifies when and where an urban group gets together. For more details about the colocation filtering algorithms see the Appendix.
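A simplified version of the colocation check is sketched below. The actual algorithm is described in the Appendix; this sketch makes its own assumption that time is discretized into fixed slots (one hour by default) and that members seen in the same location during the same slot are colocated. The function name, the slot length and the input format are hypothetical.

```python
from collections import defaultdict


def colocation_events(traces, members, eta=0.6, slot=3600):
    """Return a list of ((t_s, t_e), loc, M_e) triplets for one quasi-clique.

    traces: {user: [(loc, t), ...]} mobility traces, t in seconds.
    members: the member set M of the quasi-clique.
    Fixed time slots are an assumption of this sketch, not the paper's method.
    """
    members = set(members)
    present = defaultdict(set)
    for u in members:
        for loc, t in traces.get(u, []):
            present[(int(t // slot), loc)].add(u)   # who was seen where, per slot

    events = []
    for (s, loc), users in sorted(present.items()):
        if len(users) / len(members) >= eta:        # at least a fraction eta of M
            events.append(((s * slot, (s + 1) * slot), loc, frozenset(users)))
    return events
```

A quasi-clique is then retained as an urban group if this list is non-empty, and the list itself records when and where the group got together.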
Urban group behaviors
In this section we analyze the structural and spatio-temporal characteristics of urban groups, showing that urban groups represent a significant portion of all existing cohesive groups in the interaction graph. From a social viewpoint, urban groups are statistically similar to other groups found in different socio-technological social networks. The number of members in each urban group, i.e. the size, is quite small, very similar to the size of WhatsApp groups [35], and favors the formation of strong relationships. Moreover, the level of overlap among different groups is in line with other social networks, expressing the attitude of groups to connect around a particular interest. From a spatio-temporal viewpoint, urban groups also present interesting characteristics. They usually prefer to meet in very few locations and often experience colocation events, i.e. they get together on average every three days.
Size and membership
A preliminary albeit fundamental aspect of our investigation on urban groups is to measure their relevance within different types of social aggregations, i.e. the number of urban groups relative to the overall number of cohesive groups. To this aim, we compare the number of cohesive groups before and after the colocation filtering. We find that most of the cohesive groups we can capture through on-phone interactions are urban groups. In particular, we identify more than $28{,}000$ urban groups. They represent 75% of the quasi-cliques with size greater than 4 in the interaction graph, and involve about 23,800 of the operator’s subscribers. To assess whether the emergence of the urban groups is not only due to the well-known correlation between on-phone interactions and colocation which characterizes the reciprocal calls between pairs of users [15, 36], we test if the measured number of groups is significantly higher than the one obtained by a null model in which a dependency between communications and colocation exists. Specifically, the null model is based on the colocation graph studied in our previous work [31] and on the observation that, given a link between two customers in the colocation graph, the probability that they communicate by call or text is 0.06. So, for each link in the colocation graph we draw the corresponding link in the interaction graph with probability 0.06, then we extract the quasi-cliques. We repeat the model generation 100 times and we measure the significance. We obtain a p-value much lower than 0.001, showing that the aforementioned correlation at the link level alone does not explain the emergence of the measured number of cohesive groups. These findings suggest that (i) the correlation between physical proximity and on-phone interactions, which holds for pairs of users [15, 36], can be extended to groups; and (ii) on-phone social networks are much more accurate than their online counterparts in mirroring people’s offline sociality. Meanwhile, they share with them the power to generate high volumes of data traffic.
With the ever-growing relevance of social networking sites, the size of a group of persons represents one of the main aspects of a social environment, since it influences the strength of relationships, the intensity of participation in group activities and the consonance of aims [37]. In Fig. 2a we show the probability distribution function of the urban group size. It highlights that small groups ($k=5,6$) are predominant in mobile phone networks. Moreover, the short tail of the distribution (its maximum value is 13) indicates a substantial difference with respect to community detection approaches. In fact, community detection algorithms identify communities of hundreds or thousands of people, whereas these groups vanish when we search for highly dense regions in the mobile phone graph. The result supports the findings in [33], showing that community detection algorithms may return loosely connected subgraphs that can only vaguely be assimilated to tight-knit groups or communities. Surprisingly, when comparing the urban group size with the group size measured on WhatsApp [35], we observe that the two are very similar.
The formation of an urban group depends on the time needed by the subgraph to reach the minimum required density γ. To test how stable the definition of urban groups is over time periods of different lengths, for each group we measure the number of weeks needed for the subgraph to reach the required density threshold. In Fig. 3 we report the distribution of the probability that an urban group is detected within a certain number of weeks. As we can see, most of the urban groups (around 90%) are detected within 7 weeks. Moreover, we find that this result does not depend on the size of the group, as we can observe in the inset of Fig. 3, where the distribution grouped by group size is shown.
Groups can form around common interests or existing social structures, such as family, workmates or teammates, so an individual is likely to participate in different social groups [37, 38]. To verify whether this phenomenon also holds for urban groups, we investigate if the operator’s customers belong to a single group or if they participate in different groups, each corresponding to different interests [39]. To this aim, in Fig. 2b, we report the distribution of the number of urban groups a user belongs to. The distribution is heavy-tailed, i.e. most users belong to few cohesive groups, but people participating in many urban groups do exist. In particular, half of the population belongs to at most 2 urban groups, while the average number of groups per user is 6. Similar results have been observed in other social networks, such as Flickr [38] and LiveJournal [40].
Locations and visit patterns
Given the strict interplay among groups, interests and places, urban groups are supposed to meet in specific locations, somehow related to the group activity. We identify a group gathering by detecting when its members are colocated in a cell. However, cells have different coverage radii according to their distance from the city center (see Fig. 1) and this could affect the characteristics of the urban groups. In particular, the larger the coverage radius, the higher the probability of colocation events among group members, and this could lead to an overestimation of the size of the urban groups. To investigate how the cell radius affects the characteristics of the urban groups, we only consider urban groups that get together in the cells that belong to the inner circle and we repeat the analysis we conducted in the previous section. We perform the Kolmogorov–Smirnov test and we obtain the following results: 0.053 (p-value < 0.001) for the distribution of group size and 0.033 (p-value < 0.001) for the distribution of the number of groups for each subscriber. Based on these results showing no significant statistical difference between the distributions, we do not make any restriction on where a colocation event takes place.
To investigate the connection between locations and urban groups, we measure the number of locations where each group gets together. In the following, we will use the notion of location instead of cell to overcome the artifacts introduced by the network load balancing algorithm, which associates mobile users with different cells according to the current network status, even if the users’ position does not change. We exploit a coarse subdivision of the metropolitan area directly provided by the network operator and aggregate neighboring cells in groups of size from 8 to 15. In Fig. 2c we report the histogram of the number of locations where each group colocates. As we can observe, most of the groups colocate in very few places; the mean, median and standard deviation are 3.16, 2 and 2.54, respectively. This result shows that urban groups are characterized by few visited locations, in strict analogy with individuals’ mobility [18, 41].
As urban groups are characterized by a tight-knit network of communications and a limited set of preferred locations, a question arises about whether or not groups need to combine frequent encounters with on-phone interactions to express the group sociality. To this aim, we analyze the continuity of the encounters of each urban group by computing the number of days on which each urban group colocates. In Fig. 2d we show the histogram of the number of days each group is colocated (we consider a group colocated on a day if at least one colocation event exists on that day). The mean, median and standard deviation of the distribution of the number of days are 18.20, 14.0 and 15.33, respectively, with more than 70% of groups meeting on more than 5 days. This result shows that the encounters among the members of a group are not sporadic and indicates some regularity. We can argue that, on average, urban groups need to combine on-phone interactions and get-togethers in a few urban places to fully express and support their activities.
Interactivity of the urban groups
Along with the mutuality and reachability properties, interactivity (i.e. the frequency of interactions among members) defines a cohesive group. For a group to be cohesive, it is in fact required that the group members maintain frequent interactions with one another. By leveraging the number of interactions between the pairs forming a group, we can evaluate if the interactivity property holds for urban groups and, consequently, measure the effort members devote to maintaining their relationships inside a group. In line with previous works on subgraphs in call graphs [12], we adopt the intensity int of an urban group to assess the effort of maintaining the relationships within an urban group. The intensity of an urban group $\mathit {ug}_{i}$ is defined as the geometric mean of its link weights:
$$\mathit{int}(\mathit{ug}_{i}) = \biggl( \prod_{e \in E(\mathit{ug}_{i})} w_{e} \biggr)^{1/ \vert E(\mathit{ug}_{i}) \vert },$$
where $E(\mathit{ug}_{i})$ denotes the links forming the urban group $\mathit{ug}_{i}$ and $w_{e}$ is the weight of link e. Here the effort, i.e. the link weight, coincides with the number of interactions.
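The geometric mean of the link weights can be computed as in the following sketch. Working in log space is a choice of this sketch, made to avoid overflow when many large weights are multiplied; the function name and the list-of-weights input are assumptions.

```python
import math


def intensity(weights):
    """Geometric mean of the link weights of one urban group.

    weights: the numbers of interactions on the links of the group (all > 0).
    """
    logs = [math.log(w) for w in weights]
    return math.exp(sum(logs) / len(logs))
```

Unlike the arithmetic mean, the geometric mean is pulled down by weak links: a group with link weights 1 and 100 has intensity 10, not 50.5, so a high intensity requires most links of the group to be strong.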
In Fig. 4a, we report the distributions of the urban group intensity grouped by the size of the subgraphs. We observe that for $k=5,\ldots,9$ the distributions are very similar, while for bigger groups the distributions shift towards higher intensity values. We make this trait more explicit in the inset figure, where we show the boxplot of the intensity as a function of the group size. Each box spans the likely range of variation (from the first to the third quartile), the segment inside the rectangle indicates the median of the distribution and the points below and above the whiskers represent outliers. The figure highlights two important points. First, regardless of the size of the urban group, more than 75% of groups reach an intensity higher than or equal to 20. So, on average, the members within these groups interact at least 20 times with each of the members they are linked to. Secondly, we distinguish two typical behaviors involving groups with $k=5,\ldots,9$ and larger groups ($k=10,\ldots,13$). Specifically, smaller groups are mainly characterized by an average intensity between 20 and 60, while in bigger groups the intensity ranges from 50 to 130. This shows that members of large urban groups devote considerable effort to interacting with one another and to maintaining the relationships established within the group.
We have just shown that a high level of interactivity characterizes urban groups. Now we ask whether the physical proximity of the members of the urban group impacts the interactions occurring within the group. That is, we ask whether the colocation property stimulates on-phone interactions within cohesive groups. To this aim, we compare the distributions of the intensity in urban groups and in not-colocated cohesive groups. In Fig. 4b we report the comparison by means of Q-Q plots for different sizes. A Q-Q plot allows us to verify whether or not two distributions are equal by computing and displaying their quantiles. The 45° line in the figure (black dotted line) represents the identity case, while for different distributions the plot lies below or above the line. In the figure we observe that for smaller groups ($k=5,\ldots,8$) the distributions of the intensity are similar only for the lowest 0.1-quantiles, while urban groups show higher values of intensity for the highest quantiles, i.e. with $k=5,\ldots,8$ urban groups are more interactive than not-colocated cohesive groups of the same size. For bigger groups, this phenomenon is even more accentuated: not-colocated groups take higher values than urban groups for the lowest quantiles; by contrast, in urban groups the highest quantiles are much higher than in not-colocated groups. In general, we find that urban groups are much more interactive than their not-colocated counterparts. So, the opportunity of meeting in urban spaces strengthens the relationships expressed by on-phone communications; meanwhile, tight-knit groups, whose interactions are strong and frequent, are likely to colocate in a few specific locations.
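The quantile comparison behind a Q-Q plot can be sketched as follows. This is only an illustration: the function name `qq_points`, the choice of deciles and the two samples are assumptions, and the values are invented rather than taken from the paper. Each returned pair is one point of the plot; points above the 45° line mean that the second sample has the larger quantile.

```python
from statistics import quantiles


def qq_points(sample_x, sample_y, n=10):
    """Pair the n-1 cut points (deciles by default) of two samples."""
    qx = quantiles(sample_x, n=n, method="inclusive")
    qy = quantiles(sample_y, n=n, method="inclusive")
    return list(zip(qx, qy))


# Hypothetical intensity samples for two families of groups of equal size.
not_colocated = [12, 15, 18, 20, 22, 25, 30, 34, 40, 55]
urban = [14, 18, 24, 30, 38, 45, 60, 80, 95, 120]

points = qq_points(not_colocated, urban)
# True where the urban quantile exceeds the not-colocated one.
above_diagonal = [y > x for x, y in points]
print(points)
print(all(above_diagonal))
```

Plotting `points` against the identity line reproduces the kind of visual comparison described above; any plotting library can be used for that step.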
Preferences of urban groups
People’s aptitude to prefer specific elements is an across-the-board aspect affecting diverse human activities, from online social engagement, where users frequently interact with a restricted subset of their online friends [42, 43], to offline activities, where a limit on the number of people with whom an individual establishes stable social relationships has been shown [44,45,46]. This aptitude also holds for human mobility, whose footprint can be described by a very small number of most visited locations [18]. Here, we investigate whether such an aptitude characterizes the behavior of the urban group members; specifically, we ask whether or not a backbone of strongest ties exists within urban groups, whether groups have preferred meeting places and whether or not different roles emerge.
Favorite interactions within urban groups
The previous results about urban group intensity have shown the average effort to maintain relationships within colocated cohesive groups. However, this behavior could be the effect of a few relationships much more active than the others, i.e. heterogeneity, or of a homogeneous interactivity involving all the ties forming a group. To assess the homogeneity among interactions in a group, we measure the coherence of an urban group $\mathit{ug}_{i}$. Given an urban group $\mathit{ug}_{i}$, its coherence $\operatorname{coh}(\mathit{ug}_{i})$ [47] is defined as the ratio between the geometric mean and the arithmetic mean of its link weights:
$$\operatorname{coh}(\mathit{ug}_{i}) = \frac{\operatorname{int}(\mathit{ug}_{i}) \cdot \vert E(\mathit{ug}_{i}) \vert }{\sum_{(u,v)\in E(\mathit{ug}_{i})} w_{uv}}$$
where $w_{uv}$ is the weight of the link between members $u$ and $v$. By the AM-GM inequality,^{Footnote 5} the coherence takes values between 0 and 1. The more homogeneous the ties within an urban group, the closer to 1 the coherence. Figure 4c shows the distributions of the coherence for $k=5,\ldots,13$ and the inset figure reports the box and whiskers plots of the coherence as a function of the group size. The figure indicates that the distributions for $k=5,\ldots,10$ are very similar, while larger groups are characterized by coherence values closer to 0. Despite the high variability of smaller groups, the inset figure indicates the same trait. Specifically, urban groups with size $k\in[5,10]$ have a median value close to 0.6, while larger groups ($k\in[11,13]$) result in a median close to 0.4. These results indicate that the interactivity of the links within urban groups is more uniformly spread in smaller groups than in larger ones. In fact, in larger cohesive groups the relationships between some pairs of members are stronger than others. So, in larger groups the social effort, i.e. the number of interactions, is more focused on some specific relationships, while in smaller groups the effort is more evenly balanced among all ties.
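A small numerical sketch may help to read the coherence values. It assumes that coherence is the ratio between the geometric and the arithmetic mean of strictly positive link weights, consistent with the AM-GM bound mentioned above; the function name and the two weight lists are hypothetical.

```python
import math


def coherence(weights):
    """Ratio of geometric mean to arithmetic mean of positive link weights."""
    if not weights:
        raise ValueError("a group needs at least one link")
    geometric = math.exp(sum(math.log(w) for w in weights) / len(weights))
    arithmetic = sum(weights) / len(weights)
    return geometric / arithmetic


# Hypothetical groups: one with evenly used ties, one dominated by a single tie.
balanced = coherence([30, 30, 30])
skewed = coherence([1, 1, 200])
print(round(balanced, 3), round(skewed, 3))
```

The balanced group yields a value of essentially 1, while the group dominated by one very active tie falls below 0.1, which is the sense in which low coherence signals effort concentrated on a few relationships.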
Favorite locations
We have shown that both individuals and groups share the attitude to visit a small set of locations. We know from the literature that individuals are actually very regular in this and have a few favorite locations [48]. Do urban groups behave similarly? We approach the analysis in two ways. First, we compute the Gini index of the number of days each group meets in a particular location. Secondly, for each urban group we extract the number of locations where the group meets on at least a given percentage of days. To avoid the bias introduced by groups meeting too infrequently, we restrict the analysis to those urban groups that meet on more than 2 distinct days. In Fig. 5 we report the Gini index distribution grouped by the number of locations visited by urban groups. As we can observe, the values of the Gini index are far from 0 (equality condition); mean values range from 0.35 to 0.50 when the number of different locations is between 3 and 22. These results highlight that almost all groups distribute their get-togethers unevenly among the set of locations. Table 2 reports the descriptive statistics of the number of locations which satisfy the condition for different percentage values. From the results it emerges that most groups have at most one such location, and the median value is 1 except for the highest values of the percentage of days. It is worth noting that this characteristic holds both for groups visiting just a few locations and for those visiting many locations. So, urban groups have the tendency to meet in very few favorite locations, regardless of the total number of locations visited by their members. This aspect holds both for individuals and groups.
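The two measurements described above can be sketched as follows. The Gini index uses the standard formula on sorted values; the helper `favorite_locations`, the cell names and the day counts are hypothetical, and the share of days is assumed to be computed against the number of distinct days on which the group met.

```python
def gini(values):
    """Gini index of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n on sorted data.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def favorite_locations(days_per_location, total_days, min_share):
    """Locations where the group met on at least min_share of its meeting days."""
    return [loc for loc, days in days_per_location.items()
            if days / total_days >= min_share]


# Hypothetical group: meeting days per location over 14 distinct meeting days.
days_per_location = {"cell_a": 12, "cell_b": 2, "cell_c": 1}
group_gini = gini(days_per_location.values())
favorites = favorite_locations(days_per_location, total_days=14, min_share=0.5)
print(round(group_gini, 3), favorites)
```

In this invented example the group meets almost always in one cell, so the Gini index is well above 0 and only one location passes the 50% threshold.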
Role discovery: leaders and followers
Today’s massive diffusion of instant messaging services is rooted in the advent of on-phone communications that, ever since their introduction, have made interactions within a group of persons easier. This is confirmed by the previous results showing that the on-phone interactions of urban groups are intense and frequent. By contrast, face-to-face interactions require considerable effort to synchronize all members of a group. Thus, it is quite uncommon for individual members of an urban group to participate in all the get-togethers of that group. We therefore ask whether the mutuality and interactivity properties also characterize face-to-face interactions or, by contrast, whether a bias exists among the members.
The relaxation of the presence of the urban group’s members, governed by the η parameter in the colocation filter, allows us to capture, for each colocation event, the presence of a subset of the group’s members. Given this information we are able to measure the degree of participation of each member in the group gatherings. To this aim, we compute the presence probability of each member as the ratio between the number of days the member participates in urban group gatherings and the total number of days on which the group got together. Figure 6 shows the boxplot of the presence probability of each member for all urban groups of size 5. Similar results are observed for the other group sizes. The rank indicates the importance of the member in terms of days of presence. Thus, a rank equal to 1 refers to the most present member while a rank equal to 5 refers to the least present member. From the figure we observe that the distribution of the presence probability of the members with rank 1 is concentrated very close to 1 (mean and standard deviation are 0.98 and 0.05, respectively), meaning that these users participate in almost all gatherings of the group they belong to. By contrast, for rank 5 we observe lower and broader values (mean and standard deviation are 0.39 and 0.29, respectively). Given these results, let us divide urban group members into two main groups: leaders and followers. A leader is a member who frequently participates in urban group gatherings, whereas a follower is a member who sometimes or rarely joins group get-togethers. While it is easy to identify the leader, or leaders in some cases, it is harder to categorize a member as a follower, as we can observe from Fig. 6. In fact, the distributions of the presence probability of members with ranks 4 and 5 are widely spread. Thus, there are groups where the distinction between leader and follower is more pronounced, and others where it is blurred.
Clearly, this aspect reflects the variety of behaviors within each single urban group. The results show the existence of a bias among the members taking part in urban group gatherings: there is a subset of members (the leaders) who take part frequently in the get-togethers, while another subset of members (the followers) is less involved in the group’s face-to-face interactions.
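The presence probability and the rank of each member can be sketched as below. The input layout (a mapping from each gathering day to the set of members present), the member labels and the tie-breaking by member name are assumptions made for the example, not details given in the paper.

```python
def presence_probability(gatherings, members):
    """Share of gathering days on which each member was present."""
    total_days = len(gatherings)
    if total_days == 0:
        raise ValueError("the group never met")
    return {
        m: sum(m in present for present in gatherings.values()) / total_days
        for m in members
    }


def rank_members(probabilities):
    """Rank 1 is the most present member; ties are broken by member name."""
    ordered = sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, member, p) for rank, (member, p) in enumerate(ordered, start=1)]


# Hypothetical group of size 5 that met on four distinct days.
gatherings = {
    "day_1": {"a", "b", "c"},
    "day_2": {"a", "b", "d"},
    "day_3": {"a", "c", "e"},
    "day_4": {"a", "b", "c"},
}
ranking = rank_members(presence_probability(gatherings, ["a", "b", "c", "d", "e"]))
for rank, member, probability in ranking:
    print(rank, member, probability)
```

Here member `a` attends every gathering and would be read as a leader, whereas `d` and `e` attend one gathering out of four and behave as followers.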
The urban groups from the city viewpoint
When we try to map the activities of these cohesive urban groups in an urban space, or a city, we comprehend how their behavior and dynamics greatly influence the design, planning and dimensioning of both online and offline services. For instance, they shape the traffic flows of mobile networks, affect the planning of urban services, inspire the rise of new location-based services, and direct advances in content management and mobile edge computing. In this section we analyze the colocation events through the lens of the city so as to investigate how urban group gatherings are distributed in the city space. In particular, we are interested in finding popular locations and differences between day and nighttime [49]. To perform the analysis we consider all colocation events that occurred during the entire dataset time interval and we separate the events that took place during the day from those that took place at night. We consider a colocation event as belonging to the day if it occurred between 8:00 a.m. and 7:00 p.m.; otherwise we consider the event as a nighttime one.
In Fig. 7 we report the number of distinct urban groups that visit^{Footnote 6} each city location (sorted from the most to the least popular), distinguishing between day and nighttime hours. As we can observe from the figure, the popularity of the locations is not uniformly distributed. In fact, a small set of locations have high popularity; only 9 and 6 locations are visited by more than 1000 urban groups during the day and night hours, respectively.
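The popularity count can be sketched as follows. The event layout (group identifier, location, start time), the classification of an event by its start time and the treatment of exactly 7:00 p.m. as nighttime are assumptions of this example; the identifiers and dates are invented.

```python
from collections import defaultdict
from datetime import datetime, time

DAY_START, DAY_END = time(8, 0), time(19, 0)


def period_of(start):
    """Label an event 'day' if it starts in [8:00, 19:00), else 'night'."""
    return "day" if DAY_START <= start.time() < DAY_END else "night"


def popularity(events):
    """Number of distinct groups visiting each location, split by period."""
    groups = {"day": defaultdict(set), "night": defaultdict(set)}
    for group_id, location, start in events:
        groups[period_of(start)][location].add(group_id)
    return {
        period: {loc: len(ids) for loc, ids in locations.items()}
        for period, locations in groups.items()
    }


# Hypothetical colocation events: (group, location, start time).
events = [
    ("g1", "loc_a", datetime(2020, 1, 6, 10, 0)),
    ("g2", "loc_a", datetime(2020, 1, 6, 12, 30)),
    ("g1", "loc_a", datetime(2020, 1, 7, 15, 0)),  # same group, counted once
    ("g3", "loc_b", datetime(2020, 1, 6, 21, 0)),
    ("g1", "loc_b", datetime(2020, 1, 7, 7, 59)),
    ("g2", "loc_b", datetime(2020, 1, 7, 19, 0)),
]
result = popularity(events)
print(result)
```

Using sets of group identifiers ensures that a group visiting the same location several times contributes only once to its popularity.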
In Fig. 8a and 8b we report the heatmap of the location popularity during day and night hours, respectively. We can identify 7 city zones denoted by capital letters from A to G. In the discussion about the difference between day and night, we have to consider that Milan has no clear division into functional areas, such as educational, business and shopping districts. This characteristic clearly emerges from the two heatmaps, where we can observe that most of the metropolitan areas are popular during both day and night. In particular, region A is a business district that is also full of pubs and concert clubs, B is a residential area with small markets and shops, C holds the football stadium and concert arena, D is a place full of restaurants and pubs, E is a shopping, entertainment and nightlife district, F is the downtown area, and G is one of Milan’s most famous nightlife districts.
To deepen the analysis of the differences between day and night behavior we compute the variation of popularity rank. Figure 9 shows the rank difference for each location. Then in Fig. 8c we report the heatmap of the rank differences across the city map (red means that the location loses popularity between day and night, while blue means that the location gains popularity; the color intensity reflects the absolute value of the rank difference of the location). From Fig. 9, we can observe that most locations have a small variation: only 18% of locations have an absolute variation higher than 50, and the percentage decreases to 4% if we consider an absolute variation higher than 100. It is interesting to note that almost all the metropolitan districts considered exhibit a very small variation between day and night. The only exception is zone C, where the football stadium and concert arena are found, both of which are used mainly during the night hours. The small variation in the other districts is due to the multifunctional nature of those areas, combined with the “happy hour” effect. Another interesting aspect that emerges from Fig. 8c is that some of the highest variations are in the proximity of areas D, F and G. We can observe two opposite variations: the locations close to zones D and G gain popularity during the night hours, while the locations surrounding F lose rank positions. These findings reflect the Milan nightlife, which moves outward from downtown (area F) to districts A, C, D and G.
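One possible way to compute a rank variation between day and night is sketched below. The sign convention (day rank minus night rank, so that a positive value means the location climbs the ranking at night), the tie-breaking by location name and the popularity values are all assumptions of this example; the paper does not spell out these details.

```python
def ranks(popularity):
    """Rank 1 is the most popular location; ties are broken by location name."""
    ordered = sorted(popularity, key=lambda loc: (-popularity[loc], loc))
    return {loc: rank for rank, loc in enumerate(ordered, start=1)}


def rank_difference(day_popularity, night_popularity):
    """Day rank minus night rank; positive means more popular at night."""
    locations = set(day_popularity) | set(night_popularity)
    day_rank = ranks({loc: day_popularity.get(loc, 0) for loc in locations})
    night_rank = ranks({loc: night_popularity.get(loc, 0) for loc in locations})
    return {loc: day_rank[loc] - night_rank[loc] for loc in locations}


# Hypothetical numbers of distinct groups per location.
day_popularity = {"loc_x": 900, "loc_y": 400, "loc_z": 300}
night_popularity = {"loc_x": 200, "loc_y": 500, "loc_z": 600}
differences = rank_difference(day_popularity, night_popularity)
print(sorted(differences.items()))
```

In this invented case `loc_x` drops from first to third place at night, `loc_z` does the opposite, and `loc_y` keeps its position.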
Conclusions
Mobile social networks are evolving gradually toward serving the needs of small groups of friends who are very close to one another and/or share common interests. This new type of online social service is shifting away from the large communities of friends of former social networks; it is more oriented toward lighthearted amusement, intimacy and intense sharing of specific contents, and less toward information and self-promotion. A few emerging social networks, such as Snapchat or WeChat, the impressive rise of groups in WhatsApp, as well as the rise of interest-driven social networks, such as Strava, all prove the point: people like to share images and videos with a restricted group of close friends, a social circle where they feel comfortable talking about themselves, even acting goofy, and not having to suffer the strain of performing in public or thinking hard before publishing a post [2]. The trend echoes Dunbar’s social grooming [50] and leads us to envision groups consisting of a few persons with strong social ties who interact frequently to informally share information about their daily life, mainly by exchanging geolocalized information and camera-based messages^{Footnote 7} (videos or images).
When we try to map the activities of these groups in an urban space, or a city, we comprehend how their behavior and dynamics greatly influence the design, planning and dimensioning of both online and offline services. For instance, group mobility affects the planning of urban services and inspires the rise of new location-based services, while group interactions shape the traffic flows of mobile networks and direct advances in content delivery and mobile edge computing.
This paper unveils the real nature of mobile and cohesive social groups, named urban groups, providing a thorough analysis and evidence of their behavior and dynamics, and showing that this achievement can be obtained by mining an anonymized mobile phone dataset based on Call Detail Records (CDRs). The analysis puts in the spotlight some interesting urban group behaviors. For instance: (i) urban groups are chiefly small social groups, whose members are very interactive; (ii) the group members move and keep interacting on the move; (iii) they have periodic gatherings and meet up in favorite city places, revealing that they are rooted in the territory; and (iv) it is easy to identify a group leader and the followers.
Notes
 1.
For the purpose of ensuring customer anonymity, each subscriber is identified by a surrogate key.
 2.
Website http://unwiredlabs.com/.
 3.
The choice of the six closest cells is due to the conventional representation of cells as hexagons.
 4.
Whether or not an anonymized number belongs to an operator’s customer has been provided by the mobile operator.
 5.
Inequality of arithmetic and geometric means states that the arithmetic mean of a set of nonnegative numbers is greater than or equal to its geometric mean.
 6.
We use the term visit to indicate that at least one colocation event occurred in a given location.
 7.
Snapchat is a self-declared camera company.
Abbreviations
 CDR: Call Detail Record
 SMS: Short Text Message
 CDF: Cumulative Distribution Function
 AM: Arithmetic Mean
 GM: Geometric Mean
References
 1.
Conti M, Das SK, Bisdikian C, Kumar M, Ni LM, Passarella A, Roussos G, Tröster G, Tsudik G, Zambonelli F (2012) Looking ahead in pervasive computing: challenges and opportunities in the era of cyber-physical convergence. Pervasive Mob Comput 8(1):2–21
 2.
Utz S, Muscanell N, Khalid C (2015) Snapchat elicits more jealousy than Facebook: a comparison of Snapchat and Facebook use. Cyberpsychol Behav Soc Netw 18(3):141–146
 3.
Kostakos V, O’Neill E, Penn A, Roussos G, Papadongonas D (2010) Brief encounters: sensing, modeling and visualizing urban mobility and copresence networks. ACM Trans Comput-Hum Interact 17(1):2
 4.
Wang Z, Zhang D, Zhou X, Yang D, Yu Z, Yu Z (2014) Discovering and profiling overlapping communities in location-based social networks. IEEE Trans Syst Man Cybern Syst 44(4):499–509
 5.
Grauwin S, Szell M, Sobolevsky S, Hövel P, Simini F, Vanhoof M, Smoreda Z, Barabási AL, Ratti C (2017) Identifying and modeling the structural discontinuities of human interactions. Sci Rep 7:46677
 6.
Allen SM, Chorley MJ, Colombo GB, Jaho E, Karaliopoulos M, Stavrakakis I, Whitaker RM (2014) Exploiting user interest similarity and social links for microblog forwarding in mobile opportunistic networks. Pervasive Mob Comput 11:106–131
 7.
Min JK, Cho SB (2011) Mobile human network management and recommendation by probabilistic social mining. IEEE Trans Syst Man Cybern, Part B, Cybern 41(3):761–771
 8.
Wang R, Gou Q, Choi TM, Liang L (2018) Advertising strategies for mobile platforms with ‘apps’. IEEE Trans Syst Man Cybern Syst 48:767–778
 9.
Yang D, Zhang D, Zheng VW, Yu Z (2015) Modeling user activity preference by leveraging user spatial temporal characteristics in LBSNs. IEEE Trans Syst Man Cybern Syst 45(1):129–142
 10.
Atzmueller M (2014) Social behavior in mobile social networks: characterizing links, roles, and communities. In: Mobile social networking: an innovative approach. Springer, Berlin, pp 65–78
 11.
Quadri C, Zignani M, Capra L, Gaito S, Rossi GP (2014) Multidimensional human dynamics in mobile phone communications. PLoS ONE 9(7):103183
 12.
Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J, Barabási AL (2007) Structure and tie strengths in mobile communication networks. Proc Natl Acad Sci 104(18):7332–7336
 13.
Naboulsi D, Fiore M, Ribot S, Stanica R (2015) Large-scale mobile traffic analysis: a survey. IEEE Commun Surv Tutor 18(1):124–161
 14.
Calabrese F, Smoreda Z, Blondel VD, Ratti C (2011) Interplay between telecommunications and face-to-face interactions: a study using mobile phone data. PLoS ONE 6(7):20814
 15.
Wang D, Pedreschi D, Song C, Giannotti F, Barabasi AL (2011) Human mobility, social ties, and link prediction. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. KDD’11. ACM, New York, pp 1100–1108
 16.
Phithakkitnukoon S, Smoreda Z, Olivier P (2012) Sociogeography of human mobility: a study using longitudinal mobile phone data. PLoS ONE 7(6):39253
 17.
Gonzalez MC, Hidalgo CA, Barabasi AL (2008) Understanding individual human mobility patterns. Nature 453(7196):779–782
 18.
Papandrea M, Jahromi KK, Zignani M, Gaito S, Giordano S, Rossi GP (2016) On the properties of human mobility. Comput Commun 87:19–36
 19.
Nika A, Ismail A, Zhao BY, Gaito S, Rossi GP, Zheng H (2016) Understanding and predicting data hotspots in cellular networks. Mob Netw Appl 21:402–413
 20.
Wasserman S, Faust K (1994) Social network analysis: methods and applications, vol 8. Cambridge University Press, Cambridge
 21.
Abello J, Resende MGC, Sudarsky S (2002) Massive quasi-clique detection. In: Rajsbaum S (ed) LATIN 2002: theoretical informatics: 5th Latin American symposium, 2002 proceedings. Springer, Berlin, pp 598–612
 22.
Sekara V, Stopczynski A, Lehmann S (2016) Fundamental structures of dynamic social networks. Proc Natl Acad Sci 113(36):9977–9982
 23.
Nanavati AA, Singh R, Chakraborty D, Dasgupta K, Mukherjea S, Das G, Gurumurthy S, Joshi A (2008) Analyzing the structure and evolution of massive telecom graphs. IEEE Trans Knowl Data Eng 20(5):703–718
 24.
Blondel VD, Decuyper A, Krings G (2015) A survey of results on mobile phone datasets analysis. EPJ Data Sci 4(1):1
 25.
Lambiotte R, Blondel VD, De Kerchove C, Huens E, Prieur C, Smoreda Z, Van Dooren P (2008) Geographical dispersal of mobile communication networks. Phys A, Stat Mech Appl 387(21):5317–5325
 26.
Karsai M, Kaski K, Barabási AL, Kertész J (2012) Universal features of correlated bursty behaviour. Sci Rep 2:397
 27.
Li MX, Palchykov V, Jiang ZQ, Kaski K, Kertész J, Miccichè S, Tumminello M, Zhou WX, Mantegna RN (2014) Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data. New J Phys 16(8):083038
 28.
Zignani M, Quadri C, Bernadinello S, Gaito S, Rossi GP (2014) Calling and texting: social interactions in a multidimensional telecom graph. In: Proceedings of the complex networks 2014 workshop on complex networks and their applications. Complex networks ’14. IEEE, pp 408–415
 29.
Xu K, Zhang X (2012) Mining community in mobile social network. Proc Eng (2012 International workshop on information and electronics engineering) 29:3080–3084
 30.
Li MX, Xie WJ, Jiang ZQ, Zhou WX (2015) Communication cliques in mobile phone calling networks. J Stat Mech Theory Exp 2015(11):11007
 31.
Zignani M, Quadri C, Gaito S, Rossi GP (2015) Calling, texting, and moving: multidimensional interactions of mobile phone users. Comput Soc Netw 2(1):13
 32.
Leung IXY, Hui P, Liò P, Crowcroft J (2009) Towards real-time community detection in large networks. Phys Rev E 79:066107
 33.
Tibély G, Kovanen L, Karsai M, Kaski K, Kertész J, Saramäki J (2011) Communities and beyond: mesoscopic analysis of a large social network with complementary methods. Phys Rev E 83(5):056125
 34.
Uno T (2010) An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica 56(1):3–16
 35.
Seufert M, Hoßfeld T, Schwind A, Burger V, Tran-Gia P (2016) Group-based communication in WhatsApp. In: 2016 IFIP networking conference (IFIP networking) and workshops, pp 536–541
 36.
Calabrese F, Smoreda Z, Blondel VD, Ratti C (2011) Interplay between telecommunications and face-to-face interactions: a study using mobile phone data. PLoS ONE 6(7):20814
 37.
Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. KDD’06. ACM, New York
 38.
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement. IMC’07. ACM, New York
 39.
Mcauley J, Leskovec J (2014) Discovering social circles in ego networks. ACM Trans Knowl Discov Data 8(1):4
 40.
Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: 2012 IEEE 12th international conference on data mining (ICDM). IEEE, pp 745–754
 41.
Song C, Qu Z, Blumm N, Barabási A (2010) Limits of predictability in human mobility. Science 327(5968):1018–1021
 42.
Zignani M, Gaito S, Rossi GP (2016) Predicting the link strength of newborn links. In: Proceedings of the 25th international conference companion on World Wide Web, International World Wide Web Conferences Steering Committee, pp 147–148
 43.
Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM workshop on online social networks. WOSN’09. ACM, New York
 44.
Dunbar R, Arnaboldi V, Conti M, Passarella A (2015) The structure of online social networks mirrors those in the offline world. Soc Netw 43:39–47
 45.
Miritello G, Moro E, Lara R, Martínez-López R, Belchamber J, Roberts SGB, Dunbar RIM (2013) Time as a limited resource: communication strategy in mobile phone networks. Soc Netw 35(1):89–95
 46.
Gaito S, Manta G, Quadri C, Rossi GP, Zignani M (2014) Groome: handling the dynamics of our sociality on mobile phone. In: Wireless and mobile networking conference (WMNC), 2014 7th IFIP. IEEE, pp 1–4
 47.
Onnela JP, Saramäki J, Hyvönen J, Szabó G, de Menezes MA, Kaski K, Barabási AL, Kertész J (2007) Analysis of a large-scale weighted network of one-to-one human communication. New J Phys 9(6):179
 48.
Pappalardo L, Simini F, Rinzivillo S, Pedreschi D, Giannotti F, Barabási AL (2015) Returners and explorers dichotomy in human mobility. Nat Commun 6:8166
 49.
De Nadai M, Staiano J, Larcher R, Sebe N, Quercia D, Lepri B (2016) The death and life of great Italian cities: a mobile phone data perspective. In: Proceedings of the 25th international conference on World Wide Web. WWW’16. International World Wide Web Conferences Steering Committee, Switzerland, pp 413–423
 50.
Dunbar RI, Spoors M (1995) Social networks, support cliques, and kinship. Hum Nat 6(3):273–290
Acknowledgements
Not applicable.
Availability of data and materials
The data that support the findings of this study are available from the cellular network operator but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the cellular network operator.
Funding
This research has been funded by the Project Phydia (2016–2018) through the transition grant of the University of Milan.
Author information
Contributions
MZ, CQ, SG and GPR conceived and designed the experiments. MZ and CQ performed the experiments. CQ, MZ, SG and GPR analyzed the data. CQ, MZ, SG and GPR wrote the paper. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Matteo Zignani.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Colocation filtering algorithm
The algorithm takes as input the set of members (M) of the quasi-clique, the list ($T_{M}$) of the mobility traces of all its members and three parameters: the minimum percentage of members required for each colocation event (η), the time threshold (Δ) and the temporal granularity (τ), whose meaning will be explained later. The output of the algorithm is the list of colocation events, where each colocation event is identified by the triplet $\langle(t_{s},t_{e}), \mathit{loc}, M_{e}\rangle$, where $t_{s}$ and $t_{e}$ are the starting and ending time, respectively, of the colocation time interval, loc is the location, and $M_{e} \subseteq M $ is the set of quasi-clique members participating in the colocation event.
The pseudocode of the colocation filtering algorithm is depicted in Algorithm 1. The first step (line 2) initializes the set of potential locations, where colocation events could happen, as the union of the locations of each member. Here we need the union operator instead of the intersection because we do not require all members to participate in a colocation event. Then the algorithm iterates over all the potential locations and performs two tasks: (i) temporal filling of the mobility traces of all quasi-clique members (lines 4–9); and (ii) detection of the colocation (lines 10–13).
A.1 Temporal filling of traces
For each location, the preprocessing of the mobility traces performs a transformation to ease the detection of colocation events. It is composed of four sequential steps performed for each member. First, it sorts the original trace by timestamp in ascending order. Second, the procedure TimestampToInterval transforms each point of the trace, identified by the timestamp t of the CDR record, into a new point $\langle t_{s}, t_{e}\rangle$, where $t_{s} = t - \Delta$ and $t_{e} = t +\Delta$, representing the extremes of the time interval during which the user is assumed to be in the location. Here we assume that if the user was in a location at time t, she/he remained in that location from $t-\Delta $ until $t+\Delta$; in line with [15], we use $\Delta= 30$ minutes. Third, the procedure MergeOverlapping takes all the time intervals and merges the ones that overlap. Given the above sorting, two consecutive intervals $i_{1}$, $i_{2}$ overlap if $t_{s}^{2} \leq t_{e}^{1}$. In the last step, the procedure FillInterval converts each time interval in $I_{m}^{l}$ into an array of temporal ticks according to the temporal granularity parameter τ ($\tau= 1$ minute). For instance, the array produced for an interval is $\langle t_{s}, t_{s}+\tau, t_{s}+ 2\tau,\ldots, t_{e} - \tau, t_{e} \rangle$. By construction, the temporal ticks produced are unique over the dataset time frame, which simplifies the colocation detection. The output of the task is a list of temporal ticks for each member of the quasi-clique. When the task ends, we have for each member the set of time instants during which the user was in the specific location.
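The four steps above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: timestamps are assumed to be integer minutes since an arbitrary epoch, the function names only mirror the procedure names used in the text, and the example trace is invented.

```python
DELTA = 30  # minutes, the time threshold
TAU = 1     # minutes, the temporal granularity


def timestamp_to_interval(timestamps, delta=DELTA):
    """Sort the trace and turn each timestamp t into (t - delta, t + delta)."""
    return [(t - delta, t + delta) for t in sorted(timestamps)]


def merge_overlapping(intervals):
    """Merge consecutive intervals of a sorted list when they overlap."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fill_interval(intervals, tau=TAU):
    """Expand each interval into ticks t_s, t_s + tau, ..., up to t_e."""
    ticks = []
    for start, end in intervals:
        ticks.extend(range(start, end + 1, tau))
    return ticks


# Hypothetical trace of one member at one location (minutes since epoch).
trace = [620, 600, 900]
merged = merge_overlapping(timestamp_to_interval(trace))
ticks = fill_interval(merged)
print(merged, len(ticks))
```

The first two records are 20 minutes apart, so their intervals overlap and collapse into a single stay, while the third record produces a separate one.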
A.2 Colocation detection
Finally, the colocation detection task exploits the temporal tick representation of the mobility traces and performs a simple counting. In detail, in line 10, the lists of temporal ticks of all members are concatenated, resulting in $F_{l}$. Then the procedure OccurrenceCount takes $F_{l} $ and counts the occurrences of each single temporal tick; as output, it produces the list of occurrences identified by the tuple $\langle\textit{tick}, \{\textit{colocated members}\}\rangle$. In the next step, the procedure FilterMembershipCardinality filters out the occurrences in which the percentage of colocated members is below the threshold η. In the last step, the procedure OccurrenceToInterval transforms the list of occurrences into a list of intervals by aggregating adjacent temporal ticks. When this second task terminates, we obtain the set of temporal intervals.
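A compact sketch of the counting, filtering and aggregation steps for a single location follows. It assumes that η is expressed as a fraction of the group size, that ticks are integers spaced by τ, and that the members of an event are the union of the members seen in its ticks; these choices, like the function name and the example data, are illustrative assumptions.

```python
from collections import defaultdict


def detect_colocation(ticks_by_member, eta, tau=1):
    """Return colocation events (t_start, t_end, members) for one location."""
    group_size = len(ticks_by_member)
    # OccurrenceCount: collect the members present at each tick.
    present = defaultdict(set)
    for member, ticks in ticks_by_member.items():
        for tick in ticks:
            present[tick].add(member)
    # FilterMembershipCardinality: keep ticks with at least a share eta of members.
    kept = sorted(t for t, who in present.items() if len(who) / group_size >= eta)
    # OccurrenceToInterval: aggregate adjacent ticks into intervals.
    events = []
    for tick in kept:
        if events and tick - events[-1][1] == tau:
            start, _, members = events[-1]
            events[-1] = (start, tick, members | present[tick])
        else:
            events.append((tick, tick, set(present[tick])))
    return events


# Hypothetical ticks (in minutes) of three members at the same location.
ticks_by_member = {
    "a": range(0, 10),
    "b": range(5, 15),
    "c": range(20, 25),
}
events = detect_colocation(ticks_by_member, eta=0.6)
print(events)
```

With three members and η = 0.6, at least two members must share a tick; only the overlap between `a` and `b` (ticks 5 to 9) satisfies the condition.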
A.3 Colocation filtering algorithm complexity
Now we briefly discuss the time complexity of the colocation algorithm using the following notation: n as the number of records of the mobility trace, m as the number of the quasi-clique’s members, and l as the number of potential locations.
The temporal filling task is performed m times, one for each member, and its time complexity is dominated by the TimestampSort procedure which is $O(n\log{n})$, obtained by using a classical sorting algorithm. The procedures TimestampToInterval and FillInterval are linear in the number of records because they perform a transformation of all elements by taking $O(1)$ time for each element. The procedure MergeOverlapping is also linear w.r.t. n because it exploits the ordering and the equal length of the time intervals. Thus, the checking of the overlapping condition is limited to two consecutive intervals only. The resulting time spent by the algorithm to perform the filling task over all members is $O(m\cdot n \log{n})$.
The colocation detection task is performed once per location and is linear in the number of records, $O(n)$, because the procedures OccurrenceCount and FilterMembershipCardinality iterate over all records performing constant-time operations, and the procedure OccurrenceToInterval can be implemented as a linear scan over all occurrences that checks the adjacency condition between two consecutive temporal ticks only.
The two previously discussed tasks are performed l times, once per location. Thus, the overall time complexity of the colocation algorithm is $O(l\cdot(m\cdot n\log n + n))$. It is worth noting that in a real application scenario the number of users belonging to a quasi-clique is very small and, due to the high regularity of the users’ mobility, the set of locations visited by a single user is small [18]. Consequently, we have $m\ll n$ and $l\ll n $, and we can rewrite the time complexity as $O(n\log n + n)$, i.e. $O(n\log n)$.
The proposed algorithm is highly parallelizable. At the highest level, each quasi-clique can be analyzed separately. As a further optimization, each location can be processed in parallel. Moreover, the temporal filling task can also be parallelized.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Keywords
 Mobile phone graph
 Mobile social groups
 Quasi-clique
 Group points of interest
 City’s points of interest