Structural and temporal inhomogeneities
We begin our investigations by addressing the inhomogeneities that are expected to play an important role in the evolution of the structural properties of networks aggregated over growing time intervals. The fundamental structural inhomogeneities are reflected in the standard statistical distributions for the call network, aggregated over the entire 6-month period of observation. In the aggregated network, a link is established between nodes i and j if a call is observed between them at any point during the aggregation interval; the weight of the link is defined as the total number of calls between i and j within the interval. The strength of node i is then defined [3] as the total number of calls in which i participates, and its degree, as usual, as the number of links that node i has. As expected on the basis of earlier results [9–11], the probability density distributions of degree, strength, and link weights are all broad (see Figure 1). Thus there is a large number of nodes that make only infrequent calls, and a large number of links that carry only a few calls. When aggregating the network over shorter time intervals, one thus expects to first discover the high-strength nodes and high-weight links that are associated with the tails of these distributions.
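The definitions of link weight, node strength, and degree above can be illustrated with a minimal sketch. The call-record layout `(t, i, j)`, the half-open interval `[t_start, t_end)`, and the function name `aggregate` are assumptions made for illustration; they are not taken from the data set or the original analysis.

```python
from collections import Counter, defaultdict

def aggregate(calls, t_start, t_end):
    """Aggregate calls (t, i, j) with t_start <= t < t_end into a weighted, undirected network."""
    weights = Counter()
    for t, i, j in calls:
        if t_start <= t < t_end and i != j:
            # Undirected link: the caller/callee order is ignored (an assumption).
            weights[(min(i, j), max(i, j))] += 1
    strength = Counter()             # total number of calls a node participates in
    neighbours = defaultdict(set)    # distinct contacts of each node
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
        neighbours[i].add(j)
        neighbours[j].add(i)
    degree = {node: len(nb) for node, nb in neighbours.items()}
    return dict(weights), dict(strength), degree

# Toy call list: (time, caller, callee); the last call falls outside the window.
calls = [(0, "a", "b"), (5, "b", "a"), (7, "a", "c"), (40, "c", "d")]
weights, strength, degree = aggregate(calls, 0, 10)
```

In this toy example the link a-b has weight 2, node a has strength 3 and degree 2, and node d does not appear because its only call lies outside the aggregation interval.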
In the time domain, the two main inhomogeneities are related to burstiness of calls forming the links, and the overall circadian pattern of the system-wide call frequency. Burstiness of the calls is reflected in the probability distribution of the times τ between consecutive calls on individual edges. In Figure 2(a), it is seen that in line with earlier observations [12, 13, 17], the distribution in our empirical data has a broader-than-Poissonian tail, a signature of burstiness. Such an inter-call time distribution gives rise to longer waiting times than expected if the calls were placed uniformly in time. Because of this, we expect to see slower network growth than for the uniform case. Further, as seen in Figure 2, the network-level call frequency clearly displays the usual daily and weekly pattern [13, 14], where the frequency shows two daily peaks followed by a decrease during nights. In addition, weekend activity is lower, especially for Sundays.
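As a rough illustration of how the inter-call times τ on individual edges can be collected, consider the following sketch. The record layout `(t, i, j)` and the function name `inter_call_times` are hypothetical, and calls are treated as undirected here, which is an assumption.

```python
from collections import defaultdict

def inter_call_times(calls):
    """Return the times between consecutive calls on each edge, pooled over all edges."""
    times = defaultdict(list)
    for t, i, j in calls:
        times[(min(i, j), max(i, j))].append(t)
    taus = []
    for ts in times.values():
        ts.sort()
        # Differences between consecutive call times on the same edge.
        taus.extend(later - earlier for earlier, later in zip(ts, ts[1:]))
    return taus

# Toy data: edge a-b has a quick burst of calls followed by a long gap.
calls = [(0, "a", "b"), (1, "b", "a"), (2, "a", "b"), (50, "a", "b"),
         (10, "a", "c"), (30, "c", "a")]
taus = sorted(inter_call_times(calls))
```

A histogram of `taus` on logarithmic axes is one way to inspect whether the tail is broader than that of an exponential distribution with the same mean.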
Evolution of network structure
All of the above features are expected to have an effect on the properties of networks aggregated over growing time intervals. Let us first monitor the growth of the aggregated network in terms of the numbers of nodes and links and the average degree, when the network is aggregated up to a time t. As seen in Figure 3(a), the number of observed nodes displays a rapid increase in the beginning of the aggregation process, such that the aggregated network contains 90% of the nodes after days. This rapid increase is followed by slower growth as nodes with low call activity are gradually observed to make calls, which joins them to the aggregated network. When compared to the uniform reference, where the time stamps of all calls are drawn uniformly at random from the entire 6-month interval, it is seen that the growth in the number of nodes in the empirical data is slightly slower; however, for longer aggregation times the difference can be considered negligible, and thus the time-domain heterogeneities have a visible effect only for short time windows. For short time windows, in addition to the slowing-down effect of burstiness, the daily pattern is seen to give rise to a stepped shape of the curve (see the inset of Figure 3(a)).
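The uniform reference and the node-growth measurement described above could be sketched as follows. The function names, the `(t, i, j)` record layout, and the fixed random seed are illustrative assumptions; only the idea of redrawing every time stamp uniformly from the observation interval comes from the text.

```python
import random

def uniform_reference(calls, t_min, t_max, seed=0):
    """Keep every call's endpoints but redraw its time stamp uniformly from [t_min, t_max]."""
    rng = random.Random(seed)
    return [(rng.uniform(t_min, t_max), i, j) for _, i, j in calls]

def time_to_node_fraction(calls, fraction=0.9):
    """Earliest call time at which the aggregated network holds the given fraction of all nodes."""
    all_nodes = {node for _, i, j in calls for node in (i, j)}
    target = fraction * len(all_nodes)
    seen = set()
    for t, i, j in sorted(calls, key=lambda call: call[0]):
        seen.update((i, j))
        if len(seen) >= target:
            return t
    return None  # only reached for an empty call list

# Toy data with six nodes in total.
calls = [(0, "a", "b"), (1, "a", "b"), (2, "c", "d"), (9, "e", "f")]
t_half = time_to_node_fraction(calls, 0.5)
t_ninety = time_to_node_fraction(calls, 0.9)
ref = uniform_reference(calls, 0, 10)
```

Comparing `time_to_node_fraction(calls)` with the same quantity averaged over several reference realizations gives a simple measure of how much the temporal inhomogeneities slow down the discovery of nodes.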
In contrast, the growth in the number of edges is much more gradual, as seen in Figure 3(b). Here, an aggregation time of days is required for catching 90% of the edges of the final 6-month aggregated network. In addition, unlike the number of nodes, the number of edges keeps growing steadily for long aggregation times, and no saturation in growth is observed. This is also reflected in the growth of the average degree (Figure 3(c)). Hence, even though the number of nodes becomes fairly stable in an aggregation period of 6 months, one cannot claim to have captured all the edges of the underlying network, and for longer windows, the average degree would still increase. This reflects the joint effect of several factors. First, as the edge weight distribution is broad, there are large numbers of edges with very low call frequencies, and observing those evidently takes a long time; there may be many edges where calls take place less frequently than once in six months. In addition, the ubiquitous burstiness that results in longer waiting times between calls slows down the growth in the number of links, especially for the low-weight links; this effect is visible in Figure 3(b), although it is not very strong. Second, for such long observation periods, one can argue that the changes in the network structure should already have a visible effect: new social ties are formed while older ties wane in strength and may even cease to exist. Third, as the data contains all the calls made by the subscribers, many of the calls may be random in the sense that they do not reflect the structure of the underlying social network: as there is no background information on the nature of the calls, a random call to one’s dentist or a call in response to an advertisement on used car sales is counted as a link, just as calls to one’s friends or relatives are. This third mechanism would naturally result in an ever-growing number of links.
The average link weights (Figure 3(d)) must necessarily keep on growing, since all new calls on existing edges are added to their link weight. This growth slows down towards the end of the observation period but does not become as linear-looking as the average degree growth; note that the new links giving rise to growing degrees also affect average weights. Comparison with the uniformly random times reference reveals the effect of burstiness: weights grow faster in the original data because rapid sequences of calls following one another quickly increase link weights.
As a result of the interplay of the above mechanisms, the network keeps changing while it is being aggregated, and while some of its links are stable in the sense that they remain active for prolonged periods of time, others exist or can be detected only within limited time periods. Then, one may ask what should the aggregation window length be for obtaining representative, “backbone” networks that capture the stablest connections in the system? One way of obtaining a quantitative estimate of the characteristic time scale of network changes is to compare the similarity of networks aggregated for different periods of time when the observation period is divided into multiple consecutive aggregation windows. We calculate the similarity σ of two networks $G_1$ and $G_2$ with link sets $E_1$ and $E_2$ as
$$\sigma(G_1, G_2) = \frac{|E_1 \cap E_2|}{|E_1 \cup E_2|}, \qquad (1)$$
i.e. the size of the intersection of the sets $E_1$ and $E_2$ divided by the size of their union, such that $\sigma = 1$ if the networks are the same, and $\sigma = 0$ if they share no links. Figure 4 displays the average similarity σ of networks in consecutive windows of different durations W. When the windows are very short, the networks are very sparse and the number of common links is low. Then, the similarity increases with increasing window duration, reaching a maximum at ∼30 days; subsequently, the similarity begins slowly decreasing as the aggregation process captures more and more of the very weak or random links.
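A minimal sketch of the similarity σ, computed as the size of the intersection of two link sets divided by the size of their union and averaged over consecutive windows of duration W, is given here. The function names, the `(t, i, j)` record layout, the half-open windows, and the convention that two empty networks have similarity 0 are assumptions for illustration.

```python
def link_set(calls, t_start, t_end):
    """Set of undirected links with at least one call in [t_start, t_end)."""
    return {(min(i, j), max(i, j)) for t, i, j in calls if t_start <= t < t_end}

def similarity(links_1, links_2):
    """Size of the intersection of two link sets divided by the size of their union."""
    union = links_1 | links_2
    if not union:
        return 0.0  # assumption: two empty networks are assigned similarity 0
    return len(links_1 & links_2) / len(union)

def mean_consecutive_similarity(calls, t_min, t_max, window):
    """Average similarity of networks aggregated over consecutive windows of equal length."""
    n_windows = int((t_max - t_min) // window)
    sets = [link_set(calls, t_min + k * window, t_min + (k + 1) * window)
            for k in range(n_windows)]
    sims = [similarity(a, b) for a, b in zip(sets, sets[1:])]
    return sum(sims) / len(sims) if sims else None

# Toy data: three windows of length 10; the last two contain the same links.
calls = [(0, "a", "b"), (1, "a", "c"), (10, "a", "b"), (11, "c", "d"),
         (20, "a", "b"), (21, "c", "d")]
sigma = mean_consecutive_similarity(calls, 0, 30, 10)
```

Here the first pair of windows shares one link out of three (similarity 1/3) and the second pair is identical (similarity 1), so the average is 2/3. Repeating the calculation over a range of `window` values gives a curve of the kind discussed for Figure 4.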
As the growth of the number of links in Figure 3(b) does not saturate, it is of importance to understand the characteristics of links that emerge early on in the process. It is known from previous investigations [9] (with a different set of data) that link weights correlate with the network topology such that high-weight links are associated with dense network neighbourhoods, whereas low-weight links connect such neighbourhoods, in line with the Granovetter hypothesis [18]. This is directly related to the presence of community structure [19] in social networks; links within communities are stronger and have higher-than-average weights [20]. For the network aggregation, this means that clusters and communities containing high-weight links are likely to appear early on in the process. In order to investigate this effect, we measure the evolution of the network-level clustering coefficient, given by three times the number of triangles divided by the number of connected triplets in the network. As seen in Figure 5(a), the clustering coefficient does indeed show a rapid increase as a function of the aggregation interval length, and then decreases after a peak at around days. This decrease can be attributed to the weak links observed later in the process: those links contribute less frequently to triangles. Hence, if short aggregation periods of around one week are used, the resulting network structure is dominated by strong links associated with dense clusters.
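The network-level clustering coefficient, three times the number of triangles divided by the number of connected triplets, can be computed directly from a link set, as in the following sketch. The function name `global_clustering` and the link representation as node pairs are illustrative assumptions.

```python
from itertools import combinations

def global_clustering(links):
    """Three times the number of triangles divided by the number of connected triplets."""
    neighbours = {}
    for i, j in links:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    # A connected triplet is a node together with an unordered pair of its neighbours.
    triplets = sum(len(nb) * (len(nb) - 1) // 2 for nb in neighbours.values())
    # Counting closed triplets visits each triangle once per corner, i.e. 3 x triangles.
    closed = sum(1 for nb in neighbours.values()
                 for u, v in combinations(nb, 2) if v in neighbours[u])
    return closed / triplets if triplets else 0.0

# Toy network: one triangle (a, b, c) with a pendant link c-d.
links = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
C = global_clustering(links)
```

The toy network has one triangle and five connected triplets, giving a coefficient of 3/5. Evaluating the function on link sets aggregated up to increasing times t would trace a curve of the kind shown in Figure 5(a).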
The fact that the edges observed early on in the aggregation process are related to the community structure is also visible when monitoring the overlap [9] of the added links. The overlap of a link connecting nodes i and j is defined as
$$O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}, \qquad (2)$$
where $n_{ij}$ is the number of common neighbours of i and j, and $k_i$ and $k_j$ are their degrees. Thus the overlap measures the fraction of common neighbours out of all neighbours of the two connected nodes. Figure 5(b) displays the average final 6-month overlap of the added links as a function of aggregation time. Here we have calculated the overlap of each link in the final 6-month aggregated network, and averaged over these values for links that are added to the network at time t. It is seen that the links that are added early on in the aggregation process have on average a higher overlap than those added later; the final overlap is a decreasing function. Hence, even when the aggregation times are short, the networks capture features of the community structure of the final aggregated networks. Interestingly, the overlap also shows a strong circadian and weekly pattern; its highest peaks correspond to the early morning when the overall call rate is very low. Thus, if calls are made during these hours, they are likely to be targeted towards people in the strongest clusters of friends and family.
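The averaging of final overlaps by the time at which each link first appears can be sketched as follows, with the overlap written as n_ij / ((k_i − 1) + (k_j − 1) − n_ij). The function names, the `(t, i, j)` record layout, the binning by `bin_width`, and the convention that a link with no other neighbours at either end has overlap 0 are assumptions for illustration.

```python
def overlap(neighbours, i, j):
    """Common neighbours of i and j divided by all their neighbours, excluding i and j themselves."""
    n_ij = len(neighbours[i] & neighbours[j])
    denom = (len(neighbours[i]) - 1) + (len(neighbours[j]) - 1) - n_ij
    return n_ij / denom if denom > 0 else 0.0  # assumption: isolated link has overlap 0

def mean_final_overlap_by_arrival(calls, bin_width):
    """Average final-network overlap of links, grouped by the time bin of their first call."""
    neighbours, first_seen = {}, {}
    for t, i, j in sorted(calls, key=lambda call: call[0]):
        first_seen.setdefault((min(i, j), max(i, j)), t)
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    sums, counts = {}, {}
    for (i, j), t in first_seen.items():
        b = int(t // bin_width)
        sums[b] = sums.get(b, 0.0) + overlap(neighbours, i, j)
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Toy data: a triangle appears in the first bin, a pendant link in the second.
calls = [(0, "a", "b"), (1, "b", "c"), (2, "a", "c"), (12, "c", "d")]
result = mean_final_overlap_by_arrival(calls, 10)
```

In the toy example the three triangle links have final overlaps 1, 1/2 and 1/2, so the first bin averages to 2/3, while the pendant link added in the second bin has overlap 0.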
In order to illustrate the network growth, we have visualized small subnetworks corresponding to different aggregation times t (Figure 6). Here, the subnetwork has been obtained by selecting all individuals whose subscriptions are associated with a certain postal code. This method of sampling yields better results than e.g. snowball sampling. Panels (a) to (d) of Figure 6 show the growth of the network, such that edges that participate in triangles in the final 6-month aggregated network are coloured red. For the shortest aggregation periods (panels (a) and (b)), most of the added edges are in this set, reflecting the above observations on the early appearance of edges connected to communities and clusters. It should be noted that not all community-internal edges are discovered early on; rather, those links that appear early are associated with communities with a high probability.