
Identifying the temporal dynamics of densification and sparsification in human contact networks

Abstract

Temporal social networks of human interactions are central to understanding the fundamental patterns of human behavior. In these networks, interactions occur locally between individuals (i.e., nodes) who connect with each other at different times, culminating in a complex system-wide web that has a dynamic composition. Dynamic behavior in networks occurs not only locally but also at the global level, as systems expand or shrink due either to changes in the size of the node population or to variations in the chance of a connection between two nodes. Here, we propose a numerical maximum-likelihood method to estimate population size and the probability of two nodes connecting at any given point in time. An advantage of the method is that it relies only on aggregate quantities, which are easy to access and free from privacy issues. Our approach enables us to identify the simultaneous (rather than the asynchronous) contribution of each mechanism in the densification and sparsification of human contacts, providing a better understanding of how humans collectively construct and deconstruct social networks.

Introduction

Individuals are interacting in unprecedented ways due to advancements in communication technology, which has granted access to human contact data in a variety of social contexts (e.g., mobile calls [1–5], texts [6, 7], email [8], face-to-face [9–13]). Our understanding of fundamental human behavioral patterns has benefited considerably from these rich data sources, in which individuals (i.e., nodes) establish and break connections (i.e., edges) with each other, thus driving the evolution of a complex network structure. To capture the dynamics of these systems in which the contacts between nodes occur intermittently, social networks are often modeled using a temporal representation [14, 15].

In social systems, contacts tend to occur intermittently not only because individuals have a choice on how and when to engage with others, but also because there are various external factors that affect human activity, such as spatial constraints (e.g., size and allocation of meeting rooms), time constraints (e.g., prespecified schedules in schools, workplaces and conferences), and a circadian rhythm; hence, at a given point in time, the number of active nodes (N) and the number of edges (M) in the system are changing. Furthermore, many empirical networks exhibit a relationship between total edges and network size that is consistent with a densification scaling property [16–18]: \(M \propto N^{\gamma}\) with \(\gamma >1\), in which aggregate edges increase superlinearly in network size. In temporal social networks, this dynamical property between N and M is influenced either by (i) fluctuations in population size [17, 19, 20], (ii) changing probability of node connection [19], or (iii) both [20]. Given a fixed connection probability and changing size of population, the conventional superlinear scaling emerges, i.e., \(M \propto N^{\gamma}\) with \(\gamma >1\) [19]. Conversely, for constant population size and varying connection probability, M exhibits an accelerating growth pattern where M increases with N much faster than the conventional superlinear scaling would suggest [19].

However, many human contact networks exhibit a dynamical N-M relationship that is a mixture of the two behaviors, each appearing either as a growth in M along a straight line or an increasing M along an upward sloping trajectory on log-log scale [19, 20]. This type of mixed densification scaling usually appears when individuals are free to enter and exit the system, and opportunities to connect are clearly defined (e.g., during lunch in a work setting) or activities are strictly regulated by a schedule (e.g., events at a conference). At a conference, for instance, it is expected that attendees will limit socialization during times designated for a keynote talk because they are attentive to the speaker. During a coffee break, in contrast, they are free to interact with others. The emergence of a mixed scaling relationship in temporal social networks suggests that the mechanism that describes the dynamical growth of M in N may be alternating occasionally [20]. From this standpoint, a Markov regime-switching model [21, 22] was employed in a previous study to estimate the probability that the dynamical source of densification and sparsification is attributed either to changing population size or fluctuating intensity in activity level at a given time [20].

Here, we develop an alternative approach to identify the extent to which changing population and connection probability concurrently influence the dynamics of densification and sparsification in human contact networks. The proposed method, based on numerical likelihood functions, enables the simultaneous estimation of population size (= # active nodes + # isolated nodes) and connection probability in different social networks using a series of (N, M) observations, each corresponding to a given temporal snapshot. By taking this approach, we can gain insight not only into the independent contribution of the two mechanisms but also into how their co-movement influences the emergence of a mixed scaling. While contact lists (or event sequences) usually allow us to observe the number of active individuals who made at least one contact, the number of inactive individuals who were present but have never interacted (i.e., isolated nodes) is often unknown. Our approach also provides an estimate for the number of isolated nodes by relying only on the total numbers of active nodes and edges at a given point in time.

Methods

Data

We use the following four temporal human-contact networks collected by the SocioPatterns collaboration [23]:

  • Hospital [24]: Contacts between patients, nurses and doctors at a hospital in Lyon, France on December 7, 2010.

  • Workplace [25]: Contacts between employees at an office building in France on June 27, 2015.

  • IC2S2-17 [26]: Contacts between conference attendees at the International Conference on Computational Social Science 2017 at GESIS in Cologne, Germany on July 11, 2017.

  • WS-16 [26]: Contacts between participants at the Computational Social Science Winter Symposium 2016 at GESIS in Cologne, Germany on December 1, 2016.

For each data set, interactions between individuals occur in a physical location, and Radio Frequency Identification (RFID) sensors detect a contact when one person is within 1.5 meters of another [24–26]. Contacts are recorded at 20-second intervals. Such high-resolution data have been frequently used to discover temporal patterns in human behavior [13, 14, 27–29] or to explore how infectious diseases spread through human contacts [30–32]. The validity of data recorded by RFID sensors has also been examined [33], and network reconstruction methods based on the observed data have been studied [34].

We take advantage of the time-resolved data to explore the temporal dynamics of densification and sparsification in the data sets, by converting them to temporal networks with unweighted and undirected edges. We segment a data set into a snapshot sequence (i.e., a series of networks that are ordered in time [35]), which we construct as sliding time windows. A time window has a duration of 10 minutes and consecutive windows have a 5-minute overlap between them. Then, we connect two nodes if they have at least one contact within the time window, and we extend this to all other time windows to obtain a sequence of snapshots. A node is considered to be active if we detect that it is involved in one or more contact events for a given network snapshot. The numbers of active nodes and edges in a snapshot are denoted by N and M, respectively (Table 1). The observed N and M are shown in Fig. 1 (see Fig. S1 in Additional file 1 for different days).
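
The snapshot construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_snapshots`, the `(t, i, j)` contact format with `t` in seconds, and the half-open window convention `[start, start + window)` are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record type for one snapshot (not from the original study).
Snapshot = namedtuple("Snapshot", ["start", "n_active", "n_edges"])


def build_snapshots(contacts, window=600, step=300):
    """Return (start, N, M) for sliding windows over a contact list.

    contacts: iterable of (t, i, j), with t in seconds (assumed format).
    window:   window length in seconds (10 minutes in the text).
    step:     shift between consecutive windows; a 5-minute step gives
              the 5-minute overlap described in the text.
    """
    contacts = list(contacts)
    if not contacts:
        return []
    t_min = min(t for t, _, _ in contacts)
    t_max = max(t for t, _, _ in contacts)

    snapshots = []
    start = t_min
    while start <= t_max:
        # Unweighted, undirected edges: repeated contacts collapse to one edge.
        edges = set()
        for t, i, j in contacts:
            if start <= t < start + window and i != j:
                edges.add((min(i, j), max(i, j)))
        # A node is active if it appears in at least one edge of the window.
        nodes = {v for edge in edges for v in edge}
        snapshots.append(Snapshot(start, len(nodes), len(edges)))
        start += step
    return snapshots
```

For long recordings, sorting the contacts once and advancing two pointers would avoid rescanning the full list for every window; the quadratic scan is kept here only for readability.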

Figure 1

Evolution of number of edges M and active nodes N in face-to-face networks. The following days are shown for each data set: (a) Hospital on December 7, 2010, (b) Workplace on June 27, 2015, (c) IC2S2-17 on July 11, 2017 and (d) WS-16 on December 1, 2016. Timelines below panels c and d identify time windows for scheduled events. Gray shading highlights unrestricted sessions i.e., registration, break, lunch, poster session and closing remarks

Table 1 Summary of properties for the datasets

Estimation

Dynamic hidden-variable model

To explore the densification and sparsification dynamics in temporal networks, we employ a hidden-variable (or a fitness) model with a temporal dimension [19, 20, 36, 37]. The probability that two nodes i and j are connected in time interval \([t,t+\Delta t]\) (henceforth referred to as time interval t) is given by

$$\begin{aligned} p_{ij,t} = 1-e^{-\kappa _{t} a_{i} a_{j}},\quad{i, j = 1,\ldots ,N_{{ \mathrm{p}},t}, t = 1,\ldots ,T}, \end{aligned}$$
(1)

where \(a_{i}\) is node i’s intrinsic activity level, assumed to be uniformly distributed on \([0,1]\). We define the range of \(N_{\mathrm{p}}\) as \(2\le N_{\mathrm{p}}\le 230\) for all except Workplace data, for which we allow \(N_{\mathrm{p}}\) to span the interval \(2\le N_{\mathrm{p}}\le 130\) (increasing by 1 for all data sets). For κ, we consider the interval \(0\le \kappa \le 0.75\) (incremented by 0.01) for Workplace and \(0\le \kappa \le 4\) (incremented by 0.025) for all others. A smaller grid size for κ (i.e., 0.01) is used to implement the method in Workplace data set, thus resulting in more combinations given the same parameter space employed for Hospital and the conference data sets. To reduce the computational time, we therefore elected to restrict the parameter space for Workplace. Note that the dynamical source of networks is decomposed into two factors: \(N_{{\mathrm{p}},t}\) and \(\kappa _{t}\). In time interval t, the overall activity of nodes is captured by \(\kappa _{t}>0\), which encapsulates changing activity levels due to prespecified schedule, circadian rhythm, etc., while the total number of nodes (i.e., combined sum of active and inactive nodes) is denoted by \(N_{{\mathrm{p}},t}\). It should be noted that the number of active individuals N, at a given time, can be directly observed from contact lists, but the potential number of individuals (i.e., population) in a system is not usually known because contact events naturally exclude non-interacting individuals. Due to the lack of information on population, it is generally not obvious to what extent variations in N and M could be explained by changes in population or activity. Our model takes into account the two possible factors, population and overall activity level, in explaining the observed behaviors of N and M, that cause densification and sparsification of temporal networks.
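
As an illustration of Eq. (1), the sketch below draws one synthetic snapshot for a given \((\kappa ,N_{\mathrm{p}})\) and returns the resulting number of active nodes and edges. It assumes NumPy; the function name `simulate_snapshot` and the choice to redraw the activity levels \(a_{i}\) on every call are assumptions of this example rather than details confirmed by the text.

```python
import numpy as np


def simulate_snapshot(kappa, n_pop, rng):
    """Draw one network from Eq. (1) and return (N, M).

    kappa: overall activity level (kappa >= 0).
    n_pop: population size N_p (active plus isolated nodes).
    rng:   a numpy.random.Generator.
    """
    # Intrinsic activity levels, uniform on [0, 1] as stated in the text.
    a = rng.uniform(0.0, 1.0, size=n_pop)
    # Pairwise connection probabilities p_ij = 1 - exp(-kappa * a_i * a_j).
    p = 1.0 - np.exp(-kappa * np.outer(a, a))
    # Visit each unordered pair exactly once via the upper triangle.
    rows, cols = np.triu_indices(n_pop, k=1)
    linked = rng.random(rows.size) < p[rows, cols]
    n_edges = int(linked.sum())
    # A node is active if it has at least one edge.
    active = np.zeros(n_pop, dtype=bool)
    active[rows[linked]] = True
    active[cols[linked]] = True
    return int(active.sum()), n_edges
```

Calling `simulate_snapshot(1.0, 50, np.random.default_rng(0))` returns one \((N,M)\) pair; repeating the call many times for a fixed \((\kappa ,N_{\mathrm{p}})\) yields the kind of sample from which an empirical joint distribution of \((N,M)\) can be tabulated.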

Note that for a given observed pair of \((N,M)\), the estimates for \((\kappa ,N_{\mathrm{p}})\) would generally differ if we consider a different functional form of \(p_{ij,t}\). As an alternative to the connecting probability in Eq. (1), we also show the results for the following specification:

$$\begin{aligned} p_{ij,t} = \kappa _{t} a_{i} a_{j}, \quad{i, j = 1,\ldots ,N_{{\mathrm{p}},t}, t = 1,\ldots ,T}. \end{aligned}$$
(2)

This specification is employed in previous studies [19, 20]. Here, the interval for \(N_{\mathrm{p}}\) remains \(2\le N_{\mathrm{p}}\le 230\); however, κ spans the range \(0\le \kappa \le 1\) in increments of 0.01. It should be noted that while the possible range of values for κ in Eq. (2) is different from that in Eq. (1), the essential role of κ in the two specifications is the same (i.e., κ captures the overall activity level).
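
As a reader's aid (this short derivation is not part of the original study), the link between the two specifications can be made explicit. Expanding Eq. (1) for small \(\kappa _{t} a_{i} a_{j}\) gives Eq. (2) as its leading term:

$$\begin{aligned} 1-e^{-\kappa _{t} a_{i} a_{j}} = \kappa _{t} a_{i} a_{j} - \frac{1}{2} (\kappa _{t} a_{i} a_{j} )^{2} + \cdots . \end{aligned}$$

Under Eq. (2), assuming that the activity levels are independent and uniform on \([0,1]\) so that \(\mathrm{E}[a_{i} a_{j}] = 1/4\) for \(i\neq j\), the expected number of edges in time interval t is

$$\begin{aligned} \mathrm{E}[M_{t}] = \binom{N_{{\mathrm{p}},t}}{2} \kappa _{t} \mathrm{E}[a_{i}] \mathrm{E}[a_{j}] = \frac{\kappa _{t} N_{{\mathrm{p}},t} (N_{{\mathrm{p}},t}-1 )}{8}. \end{aligned}$$

The same \(M_{t}\) can therefore arise from a large population with low activity or from a small population with high activity, which is why the estimation also uses the number of active nodes \(N_{t}\).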

Numerical maximum-likelihood estimation

We estimate the parameters \((\kappa _{t},N_{{\mathrm{p}},t})\) for a given \((N_{t},M_{t})\) in time interval t, using a numerical maximum-likelihood method. Let \(\Theta _{\kappa}\equiv \left\{\kappa ^{(1)},\ldots ,\kappa ^{(L_{ \kappa})} \right\}\) and \(\Theta _{\mathrm{p}}\equiv \left\{N_{\mathrm{p}}^{(1)},\ldots ,N_{\mathrm{p}}^{(L_{ \mathrm{p}})} \right\}\) be the sets of all possible values for κ and \(N_{\mathrm{p}}\), respectively. The Cartesian product of two sets \(\Theta _{\kappa}\) and \(\Theta _{\mathrm{p}}\) is given as

$$\begin{aligned} \Theta = \bigl\{ (\kappa ,N_{\mathrm{p}})|\kappa \in \Theta _{\kappa}, N_{ \mathrm{p}}\in \Theta _{\mathrm{p}} \bigr\} . \end{aligned}$$
(3)

We define \(\boldsymbol{\theta}^{\ell}\in \Theta \) as the \(\ell \)-th element of the set Θ for \(\ell =1,\ldots ,|\Theta |\), where \(|\Theta |= L_{\kappa }L_{\mathrm{p}}\) is the cardinality of Θ, i.e., the total number of combinations \((\kappa ,N_{\mathrm{p}})\).

Our maximum-likelihood estimation proceeds as follows:

  1. For a given \(\boldsymbol{\theta}^{\ell}\), generate an unweighted and undirected network based on probabilities \(\{p_{ij}\}\) for \(i>j\). By repeating the network generation S times, one can obtain a sequence of combinations \(\left\{ (N^{(s)},M^{(s)} ) \right\}_{ s=1}^{S}\), where \(N^{(s)}\) and \(M^{(s)}\) respectively denote the number of active nodes and the number of edges observed in the s-th simulation. We set \(S=10^{4}\).

  2. Count the number of appearances of each unique combination in \(\left\{ (N^{(s)},M^{(s)} ) \right\}_{ s=1}^{S}\) and express as a fraction of the number of runs S to get the joint distribution \(f(N, M|\boldsymbol{\theta }^{\ell })\), i.e., the likelihood function for a given \(\boldsymbol{\theta}^{\ell}\).

  3. Repeat steps 1 and 2 to obtain a set of likelihood functions \(\left\{f(N, M|\boldsymbol{\theta }^{\ell }) \right\}_{\ell =1}^{|\Theta |}\).

  4. Select \(\ell =\ell ^{*}(\leq |\Theta |)\) such that \(f(N_{t},M_{t}| \boldsymbol{\theta }^{\ell ^{*}})\) yields the highest probability for a given empirical observation \((N_{t},M_{t})\). The maximum-likelihood estimators \(\widehat{\kappa}_{t}\) and \(\widehat{N}_{{\mathrm{p}},t}\) are thus given by

    $$\begin{aligned} \bigl(\widehat{\kappa}_{t},\widehat{N}_{{\mathrm{p}},t} \bigr) = \boldsymbol{\theta}^{\ell ^{*}}, \end{aligned}$$
    (4)

    where \(\ell ^{*} =\operatorname*{arg\,max}_{\ell } f(N_{t},M_{t}| \boldsymbol{\theta }^{\ell })\).

  5. Repeat steps 1–4 for all time intervals \(t = 1,\ldots , T\).
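
The steps above can be sketched in a few functions. This is a simplified illustration under stated assumptions, not the implementation released by the authors: the names `simulate_snapshot`, `build_likelihoods` and `estimate` are hypothetical, the grid and the number of simulations in any quick run would be far smaller than the \(S=10^{4}\) used in the text, and ties between equally likely grid points are broken by iteration order.

```python
from collections import Counter

import numpy as np


def simulate_snapshot(kappa, n_pop, rng):
    """Step 1: draw one network from Eq. (1) and return (N, M)."""
    a = rng.uniform(0.0, 1.0, size=n_pop)
    p = 1.0 - np.exp(-kappa * np.outer(a, a))
    rows, cols = np.triu_indices(n_pop, k=1)
    linked = rng.random(rows.size) < p[rows, cols]
    active = np.zeros(n_pop, dtype=bool)
    active[rows[linked]] = True
    active[cols[linked]] = True
    return int(active.sum()), int(linked.sum())


def build_likelihoods(kappas, n_pops, n_sim, rng):
    """Steps 1-3: tabulate f(N, M | theta) for every theta on the grid."""
    tables = {}
    for kappa in kappas:
        for n_pop in n_pops:
            counts = Counter(
                simulate_snapshot(kappa, n_pop, rng) for _ in range(n_sim)
            )
            # Step 2: relative frequency of each simulated (N, M) pair.
            tables[(kappa, n_pop)] = {
                nm: c / n_sim for nm, c in counts.items()
            }
    return tables


def estimate(n_obs, m_obs, tables):
    """Step 4: return the grid point maximizing f(N_t, M_t | theta).

    Returns (None, 0.0) if the observed pair was never simulated.
    """
    best_theta, best_like = None, 0.0
    for theta, f in tables.items():
        like = f.get((n_obs, m_obs), 0.0)
        if like > best_like:
            best_theta, best_like = theta, like
    return best_theta, best_like
```

Step 5 then amounts to calling `estimate` once per observed \((N_{t},M_{t})\) against the same `tables` object. Because an observed pair that never occurs in the simulations has zero estimated likelihood everywhere, the sketch returns `None` in that case instead of picking an arbitrary grid point.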

A schematic of the estimation method is presented in Fig. 2. Note that the pre-estimation part of the above procedure (i.e., generating a sufficiently large number of network snapshots and obtaining likelihood functions from them) does not need to be re-executed when implementing estimation on multiple datasets. One can apply the obtained likelihood functions to any temporal network as long as the generated values of \((N,M)\), represented by \(\left\{ (N^{(s)},M^{(s)} ) \right\}_{ s=1}^{S}\), are still valid for those datasets (i.e., the generated \((N,M)\) space includes the empirical observation of \(\{(N_{t},M_{t})\}_{t}\)). Simply stated, once likelihood functions are calculated, one essentially obtains a function that takes empirical \((N_{t},M_{t})\) as input and returns \(\left(\widehat{\kappa}_{t},\widehat{N}_{{\mathrm{p}},t}\right)\) as output. We provide a Python implementation of such a function on GitHub [38].

Figure 2

Schematic of maximum-likelihood estimation of κ and \(N_{\mathrm{p}}\). The top panel shows contact data that gives a combination \((N_{t},M_{t})\). The sequence \(\{(N_{t},M_{t})\}_{t=1}^{T}\) is plotted in the N-M space, in which a particular combination of \((11,8)\) is highlighted in red (bottom left). The joint distributions of \((N,M)\), or likelihood functions, are generated using the hidden-variable model for different combinations of \((\kappa ,N_{\mathrm{p}})=\boldsymbol{\theta}\), with each indexed by \(\ell ^{\prime}\) and \(\ell ^{\ast}\) (bottom right). A likelihood function gives the probability of observing a network with N nodes and M edges, for a given combination of \((\kappa ,N_{\mathrm{p}})\). The maximum-likelihood estimators, denoted by \(\widehat{\kappa}_{t}\) and \(\widehat{N}_{{\mathrm{p}},t}\), are given by a combination of κ and \(N_{\mathrm{p}}\) associated with the maximum-likelihood function \(f_{\ell ^{\ast}}=f(N_{t},M_{t}|{\boldsymbol{\theta}}^{\ell ^{\ast}})\)

Validation analysis

We perform a validation analysis to assess the accuracy of our numerical maximum-likelihood method in estimating the model parameters. For each combination of the true values (\(N_{\mathrm{p}}\), κ), we generate synthetic networks based on the baseline model (Eq. (1)) and apply the estimation method to obtain \(\widehat{N}_{\mathrm{p}}\) and κ̂. Then we take the average of the respective estimated values over 1000 runs.

Based on the comparison of estimated values with their respective true values, the maximum-likelihood estimators perform well in recovering the true population size and overall activity (Fig. 3). It should be noted, however, that κ̂ is sensitive to small values of the true population size (i.e., \(N_{\mathrm{p}}\le 50\)), in that κ̂ overestimates true κ (Fig. 3, bottom). For larger population sizes (e.g., \(N_{\mathrm{p}}> 50\)), the performance of the estimation method improves considerably, with deviations, if any, being much smaller. The reason for the low accuracy when \(N_{\mathrm{p}}\) is small is that our method relies on N and M to identify the most likely combination of the model parameters; a particular combination (N, M) does not necessarily have a one-to-one correspondence with a particular (κ, \(N_{\mathrm{p}}\))-combination especially when the network is small, thereby making it possible to see large deviations as exhibited in Fig. 3 (bottom). We also show another validation in which both parameters are changing (Fig. S2).

Figure 3

Validation of the maximum-likelihood estimation method. (a) Estimation of \(N_{\mathrm{p}}\) ranging from 5 to 200 (incremented by 5) for a given κ annotated at the top. (b) Estimation of κ ranging from 0.05 to 2.5 (incremented by 0.05) for a given \(N_{\mathrm{p}}\) annotated at the top. The estimates \(\widehat{N}_{{\mathrm{p}},t}\) and \(\widehat{\kappa}_{t}\) are obtained based on the connecting probability specified by Eq. (1). (c) The same experiment as in panel b for smaller networks. The estimation may not work well when \(N_{\mathrm{p}}\) is quite small. In all panels, the error bars denote one standard deviation computed over 1000 runs. Solid line represents the true value (i.e., 45-degree line)

Results

Evolution of κ and \(N_{\mathrm{p}}\) in temporal social networks

Estimation results for \(N_{\mathrm{p}}\) and κ are shown in Figs. 4 and 5, respectively. Fluctuating \(\widehat{N}_{{\mathrm{p}},t}\) and \(\widehat{\kappa}_{t}\) in the four data sets indicate that, quite often, both are changing simultaneously. Similar findings are seen in other days for the model based on Eq. (1) (Figs. S3 and S4) and also for the alternative probability based on Eq. (2) (Figs. S5–S8). As an alternative to the uniform distribution of individual activity level, we also implement the method based on a (skewed) beta distribution (Figs. S9 and S10) and find a tendency for both parameters to fluctuate simultaneously. Furthermore, we highlight that the method is flexible and applicable to any temporal network given its reliance on basic network measures. To this end, we examined a well-known data set on interbank trading (e-MID [18]) and two biological networks (Insect-Ant-Trophallaxis [39]) (see Figs. S11 and S12).

Figure 4

Estimated size of network population, \(\widehat{N}_{{\mathrm{p}},t}\), number of active persons, \(N_{t}\), and total isolated nodes, \(\widehat{N}_{{\mathrm{p}},t}-N_{t}\) for (a) Hospital, (b) Workplace, (c) IC2S2-17, and (d) WS-16. Timelines at the bottom identify time windows for conference schedule. Gray shading highlights unrestricted sessions i.e., registration, break, lunch, poster session and closing remarks

Figure 5

Estimated overall activity level, \(\widehat{\kappa}_{t}\). (a) Hospital, (b) Workplace, (c) IC2S2-17, and (d) WS-16. Timelines at the bottom identify time windows for conference schedule. Gray shading highlights unrestricted sessions i.e., registration, break, lunch, poster session and closing remarks

In social networks, one source of shifts in κ and \(N_{\mathrm{p}}\) is the situational conditions that affect human behavior in each location. One example is a prespecified schedule in an academic conference that governs the behavior of participants [20, 40–42]. For the IC2S2-17 and WS-16 data, we can compare the shifts in the estimated values with the official programs that are available publicly [43, 44]. In contrast, a strict schedule of activities is not stipulated in the Hospital and Workplace data, thus precluding a similar kind of assessment.

Dynamic behavior of estimated population size \(\widehat{N}_{{\mathrm{p}},t}\)

The fluctuations of \(\widehat{N}_{{\mathrm{p}},t}\) in Hospital and Workplace (Figs. 4a and b) stand in contrast to those of IC2S2-17 and WS-16 (Figs. 4c and d), in which \(\widehat{N}_{{\mathrm{p}},t}\) exhibits more systematic variations. Prior to the first keynote talk of IC2S2-17, \(\widehat{N}_{{\mathrm{p}},t}\) increases steadily as expected during a period when participants are arriving at the venue; however, it declines during the poster session (Fig. 4c). The poster session precedes the final keynote talk; hence the decline in population size (Fig. 4c after 15:00) may reflect the exit of participants who, based on the subsequent rise in \(\widehat{N}_{{\mathrm{p}},t}\) shortly after (Fig. 4c, 17:00), reconvene for the keynote speech (Fig. 4c, 17:30). In WS-16, population size is also high during oral and keynote sessions and a noticeable decline is seen during the closing remarks, which is the final event of the day (Fig. 4d, 17:00). In Fig. S3d, WS-16 has a similar schedule to that of IC2S2-17 (Fig. 4c) and similar movements in \(\widehat{N}_{{\mathrm{p}},t}\), which grows during registration but subsides during the poster session before increasing again prior to the start of the final keynote speech.

In most of the data sets, the number of active individuals \(N_{t}\) closely follows the population size, which is the maximum number of nodes that can be active at a given time (i.e., \(N_{t} \le N_{{\mathrm{p}},t}\)). From the estimated population size, we can compute the number of resting nodes as \(\widehat{N}_{{\mathrm{p}},t}-N_{t}\) (Fig. 4, broken line). Resting nodes reflect a realistic but generally unobservable feature of dynamic networks, namely isolated individuals who are not in direct contact with any other individual in the system [19]. In conference data, the number of isolated nodes exhibits a systematic correspondence with activities; few individuals are isolated during registration, break, lunch, and poster session, while elevated levels are seen for keynote talks and oral sessions (Figs. 4c and d, broken line). In Hospital data, the number of isolated nodes is fairly small (close to zero in many instances); however, this is not unnatural in such high-contact environments where hospital staff are frequently engaging each other and/or attending to patients (Fig. 4a, broken line). In contrast, the number of isolated nodes in Workplace data is generally high, up to three times \(N_{t}\) (Fig. 4b). Further investigation into the impact of contact duration in conference data indicates that, for longer time windows (e.g., 15 minutes or 30 minutes), the number of isolated nodes is smaller relative to the baseline results (see Fig. S13 vs. Fig. 4c–d). In contrast to the 10-minute time windows, a longer snapshot length allows us to observe individuals over an extended period, thus increasing the chance of an initially isolated node getting at least one contact.

Dynamic behavior of estimated overall activity \(\widehat{\kappa}_{t}\)

The estimated activity parameter, \(\widehat{\kappa}_{t}\), is high during unrestricted sessions at both conferences, signaling intense interactions between participants (Figs. 5c and d, shading). However, during keynote talks and oral sessions, \(\widehat{\kappa}_{t}\) fluctuates around much smaller values. This suggests that attendees have a greater chance of making contact with each other during registration, coffee break, lunch and poster session than during the oral sessions. Although \(\widehat{\kappa}_{t}\) declines and remains very low for the duration of keynote talks and oral sessions, our method still detects slight variations, suggesting that \(\widehat{N}_{{\mathrm{p}},t}\) is not the only dynamical parameter at play. Figure S6 shows estimated overall activity for the same days based on an alternative probability, Eq. (2). For larger temporal snapshots, we find a similar tendency for estimated \(\widehat{\kappa}_{t}\) to fluctuate around smaller values during keynote talks and oral sessions relative to larger values during open sessions (Fig. S14).

In contrast, \(\widehat{\kappa}_{t}\) changes more erratically in Hospital and Workplace data. A discernible pattern that corresponds with coordination in movement or activity, as seen in conference data, is not exhibited (Figs. 5a and b). Nevertheless, for Hospital data, \(\widehat{\kappa}_{t}\) is highest at the end of the day (Fig. 5a) when there is also a diminution in population size (Fig. 4a), while for Workplace data, \(\widehat{\kappa}_{t}\) is highest at the beginning of the day (Fig. 5b) when \(\widehat{N}_{{\mathrm{p}},t}\) is increasing (Fig. 4b). At these times, the behaviors of N and M reflect the dual impact of a sharp rise in \(\widehat{\kappa}_{t}\) as individuals leave the Hospital network (thereby reducing \(\widehat{N}_{{\mathrm{p}},t}\)) or individuals in Workplace join the system (thereby increasing \(\widehat{N}_{{\mathrm{p}},t}\)).

Time-varying contribution of \(N_{\mathrm{p}}\) and κ to the emergence of densification scaling

We now examine the dynamical relationship between the number of active nodes N and the number of edges M in empirical data to identify the source of densification scaling in social networks. Figures 6 and 7 demonstrate the relationship between N and M based on a series of temporal snapshots for each data set, and the respective color scales denote changing levels of population size \(\widehat{N}_{{\mathrm{p}},t}\) and overall activity \(\widehat{\kappa}_{t}\). All data sets exhibit a superlinear scaling, or “densification power law” [17, 45], i.e., M grows in N more than proportionally. This behavior is also evident in other days which we analyzed for each data set (Figs. S15–S16). However, the scaling pattern emerges as a mixture of two distinct behaviors [19, 20]; the straight-line scaling pattern indicates a constant exponent \(\gamma > 1\) of \(M\propto N^{\gamma}\), and it emerges for small to intermediate values of N. However, for larger values of N, total edges M grows along an upward bending trajectory, implying an accelerating growth of M in N. The two patterns are easily distinguished in the conference networks but to a lesser extent in Hospital and Workplace data.
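
For readers who want to quantify the straight-line part of the scaling, one simple option is an ordinary least-squares fit of \(\log M\) against \(\log N\). The text does not state how (or whether) γ is fitted, so the approach and the function name `fit_scaling_exponent` below are assumptions of this sketch. A single power-law fit is only meaningful on the range of N where the pattern is actually linear on log-log scale.

```python
import numpy as np


def fit_scaling_exponent(n_values, m_values):
    """Fit M = c * N**gamma by least squares on log-log scale.

    Returns (gamma, c). Snapshots with N = 0 or M = 0 are dropped because
    their logarithm is undefined.
    """
    n = np.asarray(n_values, dtype=float)
    m = np.asarray(m_values, dtype=float)
    keep = (n > 0) & (m > 0)
    # log M = gamma * log N + log c, so the slope is the exponent.
    slope, intercept = np.polyfit(np.log(n[keep]), np.log(m[keep]), deg=1)
    return float(slope), float(np.exp(intercept))
```

A fitted exponent above 1 on the low-to-intermediate range of N would be consistent with the densification scaling described above, whereas the upward-bending part at large N should be excluded from such a fit.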

Figure 6

Densification scaling and changes in estimated population size in face-to-face networks. N-M scaling plots are shown for (a) Hospital on December 7, 2010 (b) Workplace on June 27, 2015 (c) IC2S2-17 on July 11, 2017 and (d) WS-16 on December 1, 2016. Each dot represents a snapshot of the network and colors denote estimated population size \(\widehat{N}_{{\mathrm{p}},t}\) based on the respective color bar. Gray dashed and dotted lines show theoretical lower (\(M = N/2\)) and upper (\(M = N(N-1)/2\)) bounds. Estimates are based on Eq. (1)

Figure 7

Densification scaling and changes in overall activity \(\widehat{\kappa}_{t}\) in face-to-face networks. N-M scaling plots are shown for (a) Hospital on December 7, 2010 (b) Workplace on June 27, 2015 (c) IC2S2-17 on July 11, 2017 and (d) WS-16 on December 1, 2016. Each dot represents a snapshot of the network and colors denote estimated overall activity \(\widehat{\kappa}_{t}\) based on the respective color bar. Gray dashed and dotted lines show theoretical lower (\(M = N/2\)) and upper (\(M = N(N-1)/2\)) bounds. Estimates are based on Eq. (1)

In all data sets, a linear pattern tends to emerge within a specific range of values for population size \(\widehat{N}_{{\mathrm{p}},t}\) and activity level \(\widehat{\kappa}_{t}\). Population size gradually expands from small to moderate values and, along with this, \(N_{t}\) is also increasing (Fig. 6). At the same time, activity level is high in small networks with the number of edges M at its upper bound \(N(N-1)/2\) in some instances, implying that a considerable proportion of the individuals present are engaged (Figs. 7a and S16a, c–d). However, as the population grows, activity level declines rapidly and M continues to grow at a constant rate (e.g. Fig. 7a blue-green-yellow transition). During this phase, therefore, the dynamics between M and N are dominated by the gradual expansion of \(\widehat{N}_{{\mathrm{p}},t}\) which allows an increasing number of individuals to become active.
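
The two theoretical bounds drawn in Figs. 6 and 7 follow from the definition of an active node; the short argument below is added for the reader and is not taken from the original text. Let \(k_{i}\) denote the number of edges attached to active node i in a snapshot (a symbol introduced here for illustration). Every active node has \(k_{i}\ge 1\), and each edge contributes to the degree of exactly two nodes, so

$$\begin{aligned} 2M = \sum_{i=1}^{N} k_{i} \ge N \quad \Longrightarrow \quad M \ge \frac{N}{2}. \end{aligned}$$

Since edges are unweighted and undirected, at most one edge can exist per pair of active nodes, which gives

$$\begin{aligned} M \le \binom{N}{2} = \frac{N(N-1)}{2}. \end{aligned}$$

The lower bound is attained when the active nodes form disjoint pairs, and the upper bound when they form a complete graph.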

Given that the population size of face-to-face networks is finite, \(\widehat{N}_{{\mathrm{p}},t}\) will eventually become constant but M may continue to grow as the number of active nodes N gradually approaches \(\widehat{N}_{{\mathrm{p}},t}\), yielding an upward bending slope towards M’s upper bound \(N(N-1)/2\) (dotted line in Figs. 6 and 7). The plots for IC2S2-17 and WS-16 in Figs. 6 and 7 suggest that this accelerating growth in M occurs as κ̂ increases while \(\widehat{N}_{{\mathrm{p}},t}\) remains high and relatively constant. As the number of active individuals N gets closer to \(\widehat{N}_{{\mathrm{p}},t}\), few isolated nodes (if any) remain, thus resulting in denser networks in which M is almost at the maximum number of edges that can exist between active nodes. To enable these previously isolated individuals to make at least one connection, overall activity level increases, and this drives the continued growth in aggregate edges. We also show in Additional file 1 the corresponding figures based on the alternative probability of connection in Eq. (2), and the results are consistent with those of the baseline model (Figs. S17–S20). Additionally, we investigate the effect of contact duration (Figs. S21 and S22) and the implication of assuming a broader distribution of individual activity level (Figs. S23 and S24); the results substantiate the role played by \(\widehat{N}_{{\mathrm{p}},t}\) and \(\widehat{\kappa}_{t}\) in how densification scaling emerges.

Discussion

In this study, we proposed a method to identify the driving force of the dynamical relationship between total active nodes N and total edges M in temporal networks. Changes in population size \(N_{\mathrm{p}}\) and overall activity κ have both been identified as the mechanisms behind this dynamical relationship, each contributing to the emergence of different densification scaling patterns [16, 17, 19, 20, 45]. Our main contribution is a numerical maximum-likelihood method that simultaneously estimates population size \(N_{\mathrm{p}}\) and activity rhythm κ at given times, extending previous works in which one parameter is estimated by assuming the other is constant [19, 20]. We found that changes in the mechanisms of densification and sparsification reflect explicit periodic transitions in networks that have rigid time constraints. Furthermore, our findings remain consistent with previous studies which explain the emergence of a constant scaling exponent as the result of an increasing population size, while the accelerating growth pattern is driven by intensification of overall activity [19, 20].

Although we have focused on social temporal networks in face-to-face contexts, the method is adaptable to practically any dynamical system that can be modeled as a time-varying network of nodes and edges [14, 28, 46]. This is one advantage of our method: N and M are accessible in most networks and raise no privacy issues. Of course, there are some limitations which need to be addressed in future research. First, we employed a dynamic hidden variable model in generating networks, in which each node is randomly linked to another based on their individual activity [36, 37]. This means that although the model can reproduce the global quantities of N and M, more realistic structural features that are known to exist in social networks (e.g., community structure, triadic closure) are absent in generated networks. However, our focus in this work is to understand the variation in these global quantities of networks, which does not require knowledge of structural properties. Our method also facilitates the use of network generating models that incorporate such properties observed in empirical networks.

Second, we assume that the distribution of node fitness (i.e., the intrinsic activity of a node) in the network-generating model is uniform. Although an empirical fitness distribution would be preferable, the challenge lies in obtaining the individual activity levels of nodes that are part of the population but are dormant (i.e., have no edges). Such nodes are generally not observable because the contact data set records only nodes that have interacted with others.
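To make the notion of dormant nodes concrete, consider a small worked example. It assumes, for illustration only, a connection probability of the form \(p_{ij} = \kappa a_i a_j\) with each fitness \(a_i\) drawn independently and uniformly from [0, 1]; this specific form is an assumption and is not quoted from the text above. Averaging over \(a_j\), node i fails to connect to any given other node with probability \(1 - \kappa a_i/2\), so that

\[
\Pr(\text{node } i \text{ dormant} \mid a_i) = \left(1 - \frac{\kappa a_i}{2}\right)^{N_{\mathrm{p}}-1},
\qquad
\Pr(\text{dormant}) = \int_0^1 \left(1 - \frac{\kappa a}{2}\right)^{N_{\mathrm{p}}-1} \mathrm{d}a
= \frac{2}{\kappa N_{\mathrm{p}}}\left[1 - \left(1 - \frac{\kappa}{2}\right)^{N_{\mathrm{p}}}\right].
\]

Under these assumptions the expected number of active nodes is \(N_{\mathrm{p}}\,[1 - \Pr(\text{dormant})]\), which is below \(N_{\mathrm{p}}\) for any finite population with \(0 < \kappa \le 1\). In expectation, part of the population is therefore always absent from the contact data, and the nodes most likely to be missing are those with low fitness.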

Third, because the likelihood function is evaluated numerically, it is generally difficult to obtain confidence intervals and p-values. To quantify statistical significance, we would need to rely on other approaches such as nonlinear least squares [19] or Bayesian estimation [20]. In those previous studies, however, either κ or \(N_{\mathrm{p}}\) is assumed to be fixed at each point in time, so that they consider environments in which the contact dynamics are driven by changes in either activity or population.

The relevance of this work lies in the simplicity of the method for understanding the dynamical relationship between fundamental global quantities of temporal networks, and in its adaptability to include more realistic features of empirical networks. The dynamics of network growth and shrinkage are central to how systems work, and they are also likely to be a crucial factor in how information and infectious diseases spread in networks. Given the pervasiveness of complex systems and our reliance on them in daily life, a greater understanding of network dynamics would improve how we interact with, and even control, such systems.

Availability of data and materials

The data and Python code are available on GitHub [38].

References

  1. Jo HH, Karsai M, Kertesz J, Kaski K (2012) Circadian pattern and burstiness in mobile phone communication. New J Phys 14:013055

  2. Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D et al. (2007) Structure and tie strengths in mobile communication networks. Proc Natl Acad Sci USA 104:7332–7336

  3. Kovanen L, Saramäki J, Kaski K (2011) Reciprocity of mobile phone calls. Dyn Sociol Econ Sys 2:138–151. Also available on arXiv:1002.0763

  4. Schläpfer M, Bettencourt LMA, Grauwin S, Raschke M, Claxton R, Smoreda Z, West GB, Carlo R (2014) The scaling of human interactions with city size. J R Soc Interface 11:20130789

  5. Ghosh A, Monsivais D, Bhattacharya K, Dunbar RI, Kaski K (2019) Quantifying gender preferences in human social interactions using a large cellphone dataset. EPJ Data Sci 8:9

  6. Opsahl T, Colizza V, Panzarasa P, Ramasco JJ (2008) Prominence and control: the weighted rich-club effect. Phys Rev Lett 101:168702

  7. Panzarasa P, Opsahl T, Carley KM (2009) Patterns and dynamics of users’ behavior and interaction: network analysis of an online community. J Am Soc Inf Sci Technol 60:911–932

  8. Klimt B, Yang Y (2004) The Enron corpus: a new dataset for email classification research. In: Machine learning: ECML 2004. Springer, Berlin

  9. Isella L, Stehlé J, Barrat A, Cattuto C, Pinton JP, den Broeck WV (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271:166–180

  10. Starnini M, Baronchelli A, Pastor-Satorras R (2013) Modeling human dynamics of face-to-face interaction networks. Phys Rev Lett 110:168701

  11. Barrat A, Cattuto C (2013) Temporal networks of face-to-face human interactions. In: Holme P, Saramäki J (eds) Temporal networks. Springer, Berlin

  12. Génois M, Vestergaard CL, Fournet J, Panisson A, Bonmarin I, Barrat A (2015) Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers. Netw Sci 3:326–347

  13. Kobayashi T, Takaguchi T, Barrat A (2019) The structured backbone of temporal social ties. Nat Commun 10:220

  14. Holme P, Saramäki J (2012) Temporal networks. Phys Rep 519:97–125

  15. Holme P (2015) Modern temporal network theory: a colloquium. Eur Phys J B 88:234

  16. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, Chicago

  17. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1:2

  18. Kobayashi T, Takaguchi T (2018) Social dynamics of financial networks. EPJ Data Sci 7:15

  19. Kobayashi T, Génois M (2020) Two types of densification scaling in the evolution of temporal networks. Phys Rev E 101:052302

  20. Kobayashi T, Génois M (2021) The switching mechanisms of social network densification. Sci Rep 11:1–11

  21. Hamilton J (1994) Time series analysis. Princeton University Press, Princeton

  22. Hamilton JD (2010) Regime switching models. In: Durlauf SN, Blume LE (eds) Macroeconometrics and time series analysis. Palgrave Macmillan, London

  23. SocioPatterns project. http://www.sociopatterns.org/. Accessed 29 May 2022

  24. Vanhems P, Barrat A, Cattuto C, Pinton JF, Khanafer N et al. (2013) Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE 8:73970

  25. Génois M, Barrat A (2018) Can co-location be used as a proxy for face-to-face contacts? EPJ Data Sci 7:11

  26. Génois M, Zens M, Lechner C, Rannstedt B, Strohmaier M (2019) Building connections: how scientists meet each other during a conference. arXiv:1901.01182

  27. Cattuto C, Van den Broeck W, Barrat A, Colizza V, Pinton JF et al. (2010) Dynamics of person-to-person interactions from distributed RFID sensor networks. PLoS ONE 5:11596

  28. Masuda N, Lambiotte R (2016) A guide to temporal networks. World Scientific, London

  29. Karsai M, Jo HH, Kaski K et al. (2018) Bursty human dynamics. Springer, Berlin

  30. Salathé M, Kazandjieva M, Lee JW, Levis P, Feldman MW et al. (2010) A high-resolution human contact network for infectious disease transmission. Proc Natl Acad Sci USA 107:22020–22025

  31. Stehlé J, Voirin N, Barrat A, Cattuto C, Isella L et al. (2011) High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6:23176

  32. Masuda N, Holme P (2017) Temporal network epidemiology. Springer, Singapore

  33. Elmer T, Chaitanya K, Purwar P, Stadtfeld C (2019) The validity of RFID badges measuring face-to-face interactions. Behav Res Methods 51:2120–2138

  34. Dai S, Bouchet H, Nardy A, Fleury E, Chevrot JP et al. (2020) Temporal social network reconstruction using wireless proximity sensors: model selection and consequences. EPJ Data Sci 9:19

  35. Cazabet R, Rossetti G (2019) Challenges in community discovery on temporal networks. In: Holme P, Saramäki J (eds) Temporal network theory. Springer, New York

  36. Caldarelli G, Capocci A, De Los Rios P, Muñoz MA (2002) Scale-free networks from varying vertex intrinsic fitness. Phys Rev Lett 89:258702

  37. Boguñá M, Pastor-Satorras R (2003) Class of correlated random networks with hidden variables. Phys Rev E 68:036112

  38. Python implementation of the estimation method. https://github.com/shaunette/densificationscalingMLE

  39. Quevillon LE, Hanks EM, Bansal S, Hughes DP (2015) Social, spatial and temporal organization in a complex insect society. Sci Rep 5:1–11

  40. Barrat A, Cattuto C, Szomszor M, Van den Broeck W, Alani H (2010) Social dynamics in conferences: analyses of data from the live social semantics application. In: International semantic web conference. Springer, Berlin

  41. Barrat A, Cattuto C, Colizza V, Gesualdo F, Isella L et al. (2013) Empirical temporal networks of face-to-face human interactions. Eur Phys J Spec Top 222:1295–1309

  42. Kibanov M, Atzmueller M, Illig J, Scholz C, Barrat A et al. (2015) Is web content a good proxy for real-life interaction? A case study considering online and offline interactions of computer scientists. In: Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining 2015

  43. IC2S2 2017 program https://quanttext.com/wp-content/uploads/2018/09/IC2S2-2017-program.pdf. Accessed 29 May 2022

  44. Computational Social Science Winter Symposium 2016 program. https://www.gesis.org/en/css-wintersymposium/program/schedule. Accessed 29 May 2022

  45. Bettencourt LM, Kaiser DI, Kaur J (2009) Scientific discovery and topological transitions in collaboration networks. J Informetr 3:210–221

  46. Holme P, Saramäki J (2013) Temporal networks. Springer, Berlin

Acknowledgements

Not applicable.

Funding

T.K. acknowledges financial support from JSPS KAKENHI 19H01506, 20H05633 and 22H00827.

Author information

Contributions

TK conceived the research and defined the model. STF performed the analysis. STF and TK discussed the results and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Teruyoshi Kobayashi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ferguson, S.T., Kobayashi, T. Identifying the temporal dynamics of densification and sparsification in human contact networks. EPJ Data Sci. 11, 52 (2022). https://doi.org/10.1140/epjds/s13688-022-00365-3

  • DOI: https://doi.org/10.1140/epjds/s13688-022-00365-3

Keywords

  • Temporal networks
  • Densification scaling
  • Human contacts