On the duration of face-to-face contacts

The analysis of social networks, in particular those describing face-to-face interactions between individuals, is complex due to the intertwining of the topological and temporal aspects. We revisit them both, using public data recorded by the sociopatterns wearable sensors in some very different sociological environments, putting particular emphasis on the contact duration timelines. As is well known, the distribution of the contact duration for all the interactions within a group is broad, with tails that resemble each other, but not precisely, in different contexts. By separating each interacting pair, we find that the fluctuations of the contact duration around the mean-interaction time nevertheless follow a very similar pattern. This common robust behavior is observed on 7 different datasets. It suggests that, although the set of persons we interact with and the mean-time spent together depend strongly on the environment, our tendency to allocate more or less time than usual to a given individual is invariant, i.e. governed by some rules that lie outside the social context. Additional data reveal the same fluctuations in a baboon population. This new metric, which we call the relation *contrast*, can be used to build and test agent-based models, or as an input for describing long duration contacts in epidemiological studies.


Introduction
Since the advent of the Internet, the quantity of digital data describing our behavior has inflated, offering scientists an unprecedented opportunity to study human interactions in a more quantitative way. This opened the field of sociology to data analysis and, from the hard-science community, came the tacit idea that several aspects of complex human behavior can be modeled [1][2][3][4][5][6]. With the rapid development of mobile technologies (GPS, Bluetooth, cellphones) a lot of effort was first put into trying to capture the patterns of human mobility (for a review, see [7]). A more local picture of our everyday social interactions can be obtained using dedicated proximity sensors. Following a pioneering experiment that equipped conference participants with pocket switched devices [8,9], the sociopatterns collaboration (www.sociopatterns.org) developed wearable sensors that register the complex patterns of face-to-face interactions [10,11]. The radio-frequency signal is only recorded if two individuals are in front of each other for a duration of at least 20 s (which is the timing resolution). We note that, from a sociological point of view, a distance below 1.5 m covers the traditional private (<50 cm), personal (<1.2 m) and social (<3.5 m) zones. The goal is not only to analyze social interactions but also to understand how information (or a disease) spreads over a real dynamical network [12][13][14][15]. Those sensors were worn by volunteers in several work-related environments: scientific conferences [10,12,13], a hospital ward [16], an office [15] and at school [17,18]. As part of a UNICEF program, they were also used to characterize social exchanges in small villages in Kenya and Malawi [19,20] and for ethological studies on baboons [21].
It has been known for a long time that the overall distribution of the duration of contacts in face to face interactions is "broad" [8] and presents some "similarities" when observed in different environments (see [22] for a short review).
However, those comparisons were performed on data taken in some similar sociological environments, which are typically occidental, educated and often with a scientific background (in conferences or high-school). Here we wish to extend the study of face-to-face interactions by comparing them to some very different datasets that were originally designed for other aims. The first one consists of the data taken in the rural Malawi village. The second one concerns interactions among baboons in a primatology center.
Moreover, there is more information in the data than what was previously presented [10,11]. Indeed, one has access to the full timeline of interactions for each pair of individuals separately (what we call in the following a "relation"). This allows us to study the mean-interaction time per relation and, most importantly, the deviations of the contact duration from it, which reveal the underlying relation dynamics. We will show that they are surprisingly similar in all the settings.
After describing our data selection and methodological differences with some previous studies in Sect. 2, we will focus on the details of the temporal interactions in Sect. 3.2, after briefly showing that social interactions among the participants are obviously very different in each environment. We will introduce the concept of contrast of the contact duration (deviation from the mean) and show that the distributions are extremely similar on each dataset and for each relation individually. In the Discussion part, we comment on the utility of using the robust contrast distribution in improving agent-based models, and conclude by summarizing the results and highlighting some possible future extensions. Some extra information, referred to in the text, is given in the Supplementary Information (SI) document in Additional file 1.

Datasets
We have chosen from the sociopatterns web site the four datasets that are sociologically most dissimilar.
1. hosp: these are early data collected over 3 days (here and in the following, we only consider complete 24 h day periods) on 75 participants in the geriatric unit of a hospital in Lyon (France) [16]. Most interactions (75%) involve nurses and patients.
2. conf: these are also some early classical data from the ACM Hypertext 2009 (www.ht2009.org) conference that involved about a hundred participants for 3 days [13] in Torino (Italy). The audience is international with a scientific background. There also exist some data taken at another conference in Nice in 2009 (SFHH, [23]) with more participants, but we prefer to use the former, which has a number of individuals comparable to the other datasets. However we have checked that we obtain similar results with the SFHH data.
3. malawi: these proximity data were taken in a small village of the district of Dowa in Malawi (Africa) where 86 participants agreed to participate for 13 (complete) days. Interestingly those data contain both extra and intra-household interactions, although we will not distinguish them here. This community consists essentially of farmers.
4. baboons: those data were taken at a CNRS Primate Center near Marseille (France) where 13 baboons were equipped with the sensors for a duration of 26 days. The goal was to study their interactions, and to study how conclusions reached from data analysis match those provided by human observation.
With that choice, we span very different sociological environments. We have also analyzed a few other datasets collected at the SFHH conference, an office and a high-school. They give similar results (shown in the SI) but we consider them as sociologically closer to the conf one. We have chosen to focus on the sociopatterns data since they provide a consistent set taken with the very same devices, minimizing possible sources of systematic errors.

Differences with previous studies
Previous studies considered the overall temporal properties of interactions, i.e. without differentiating the pair of people interacting. In this work we emphasize the temporal properties of each pair separately.
Probability distribution functions (p.d.f) are often estimated by histograms, i.e. by counting the number of samples that fall within some bin. But for heavy-tailed distributions the size of the bins is delicate to choose. With a constant-size binning, several bins end up empty for large values. Using a logarithmically increasing binning is not a solution either, since it supposes that the distribution is constant over the wide range of the last bins. Following [24], we will use instead the probability to exceed function (p.t.e, also known as the "complementary cumulative distribution function" or Zipf plot), which is computed simply by sorting the samples and plotting them with respect to their relative frequency. In this way, one does not need to define a binning and the distribution is easier to apprehend.
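As an illustration only, the binning-free estimator can be sketched in a few lines of Python. The function name `pte` and the toy durations are assumptions made for this example and are not taken from the analysis code.

```python
def pte(samples):
    """Rank-based probability to exceed.

    Returns the sorted samples xs and, for each rank k, the fraction
    (n - k) / n of samples located at rank k or above.
    """
    xs = sorted(samples)
    n = len(xs)
    ps = [(n - k) / n for k in range(n)]
    return xs, ps


# Toy contact durations in seconds (illustrative values only).
durations = [20, 20, 40, 20, 60, 120, 20, 40]
x, p = pte(durations)
```

Plotting `p` against `x` on logarithmic axes gives the kind of curve discussed in the text, without any choice of bin width.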

Interactions between individuals
Since it is not our primary goal to study the social structures in those very different communities, we just highlight visually some differences in Fig. 1, which shows 24 h time-aggregated graphs of the relations between individuals.
The graphs for the hosp and especially the conf datasets show a strongly connected core. The malawi one is much sparser, while the baboons one is almost complete, showing that each animal interacts with all the others.
Table 1 gives a more quantitative view of some of the graphs' properties. The number of different people met per day (the degree of the graphs) is about 20 in both the hospital and the conference environments. As is apparent in Fig. 1(c), it is much smaller in the rural community (3). But the interaction times are longer (about 25 min), which reflects different sectors of activity (agrarian and including inter-housing relations for the malawi data).
The strength of the relation represents the total time per individual spent interacting with others per day. It is essentially the product of the mean number of people met per day and the mean duration of a relation. The comparison to the baboons dataset should be handled with care since there is a much smaller number of agents (13). Since each baboon interacts essentially with each other (Fig. 1(d)), the mean degree is bounded by the group size, $\langle k \rangle \lesssim N$. On the other hand, their small number possibly increases their interaction duration ($\langle w \rangle$) so that the strength of their relation is finally similar to that of the human groups.
Figure 1 Aggregated graphs of interactions over one day for our 4 datasets. Vertices (red points) represent agents and there is a link (edge) if there was at least one face-to-face interaction for more than 20 s. The first day from the datasets is used, but very similar results are obtained with the others
Table 1 Properties of time aggregated graphs on each dataset per day. Uncertainties are the standard deviations between the days. $T$ is the number of (complete) days in the dataset, $N$ the number of interacting agents, $\langle k \rangle$ the mean degree, i.e. the average number of agents each individual interacts with during one day, and $\langle w \rangle$ the mean weight, where the weights specify the total duration of a single relation [25]
The goal of this short section is not to delve into the topological details of these time-evolving graphs, but to illustrate that, as expected, these heterogeneous sociological groups show some very distinct interaction patterns between individuals.
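For concreteness, the graph quantities used in this section (degree, weight, strength) can be computed from a raw contact list as in the following Python sketch. It assumes, as a hypothesis about the input format, one record `(t, i, j)` per 20 s time step during which agents `i` and `j` are in contact; the function name `aggregate` and the toy records are illustrative.

```python
from collections import defaultdict

RESOLUTION = 20  # seconds per record (the sensor time step quoted in the text)


def aggregate(records):
    """Build the time-aggregated graph from (t, i, j) records.

    Returns three dictionaries: the weight (total contact time) of each
    pair, and the degree and strength of each agent.
    """
    weight = defaultdict(int)
    for _, i, j in records:
        pair = (min(i, j), max(i, j))  # undirected relation
        weight[pair] += RESOLUTION
    degree = defaultdict(int)
    strength = defaultdict(int)
    for (i, j), w in weight.items():
        for agent in (i, j):
            degree[agent] += 1      # one more distinct partner
            strength[agent] += w    # total time spent with partners
    return dict(weight), dict(degree), dict(strength)


# Toy records (illustrative values only).
records = [(0, 1, 2), (20, 1, 2), (40, 1, 3), (100, 2, 3)]
weight, degree, strength = aggregate(records)
```

Averaging `degree`, `weight` and `strength` over agents or pairs, one day at a time, would give quantities of the kind summarized in Table 1.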

Face to face temporal relations
We are interested in the duration of the contacts in those different networks. Figure 2 shows a classical distribution, that of the duration of contacts. We emphasize that such a representation mixes all the interactions of all the participants in the same plot. As is well known, these distributions are "heavy-tailed"; most interactions are of short duration (at the minute level) but some may last up to an hour. Interactions for people in malawi tend to last longer than for all the others. The baboons' duration of interaction is similar to the human ones (as noticed in [21]), although there are some sizable differences at short times, somewhat squeezed by the logarithmic scale. Overall, although there is a common trend, some differences appear too.
The new aspect of this work concerns the detail of each relation separately. For a given data-taking period, each relation consists of a set of intervals giving the beginning and end times of the interactions at the resolution of the instruments (20 s). There is a varying number of interactions (intervals) per relation, which we call $N_{\mathrm{int}}(r)$. In the following we will consider the durations of the interactions, which we note $\{t_i(r)\}_{i=1,\dots,N_{\mathrm{int}}(r)}$. They are thus variable-size timelines expressed in units of the resolution step.
The number of registered interactions for a given pair depends on the total duration of the experiments (Table 1) but we may compare them just for one day. The distribution of this variable is shown in Fig. 3(a). It is clearly different for each group. People at the conference tend to interact (with the same person) less often. In 65% of the cases it is only once per day, against 25% for the hosp and malawi datasets, and 3% for baboons. The mean interaction time per relation is shown in Fig. 3(b). Here again distributions are heavy-tailed and different. There is a marked difference between animals and humans, the former interacting for shorter times.
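The construction of such per-relation timelines can be sketched as follows. This is a minimal illustration that assumes the raw data are `(t, i, j)` records, one per 20 s step in which the pair is in contact, and that consecutive steps belong to the same interaction; the function name `contact_durations` and the toy records are hypothetical.

```python
from collections import defaultdict


def contact_durations(records, step=20):
    """Merge consecutive time steps of each pair into contacts.

    Returns {pair: [duration, ...]} with durations in units of `step`.
    """
    times = defaultdict(set)
    for t, i, j in records:
        times[(min(i, j), max(i, j))].add(t)
    timelines = {}
    for pair, ts in times.items():
        durations = []
        run = 0
        prev = None
        for t in sorted(ts):
            if prev is not None and t - prev == step:
                run += 1               # same contact continues
            else:
                if run:
                    durations.append(run)  # close the previous contact
                run = 1                # a new contact starts
            prev = t
        durations.append(run)          # close the last contact
        timelines[pair] = durations
    return timelines


# Toy records (illustrative values only).
toy_records = [(0, 1, 2), (20, 1, 2), (40, 1, 2), (100, 1, 2),
               (200, 2, 1), (220, 1, 2), (0, 3, 4)]
timelines = contact_durations(toy_records)
```

Here the length of `timelines[pair]` plays the role of the number of interactions of the relation, and its entries are the contact durations in resolution steps.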
We are now interested in studying the deviations of the contact duration from the mean value for a given relation. Indeed, in physics the dynamics of a process is often revealed by such a quantity. For instance in cosmology, one uses the "density contrast" that represents the galactic density divided by its mean value. It is the fundamental quantity which traces the dynamics of the underlying field (see e.g. [26]). Inspired by this example, we propose to study what we call the "duration contrast", or simply "contrast", which is the simplest dimensionless quantity we can form to study deviations from the mean value
$$\bar t(r) = \frac{1}{N_{\mathrm{int}}(r)} \sum_{i=1}^{N_{\mathrm{int}}(r)} t_i(r), \tag{1}$$
$$\delta_i(r) = \frac{t_i(r)}{\bar t(r)}, \tag{2}$$
where $r$ recalls that the quantity varies for each relation. The contrast represents our tendency to spend more or less time than usual with a given individual. Note that "usual" is meant as the mean-interaction time between the two particular agents (Fig. 3(b)) and varies for each relation. For a small number of samples, the arithmetic mean (Eq. (1)) is however a poor estimate of the true mean-time and is also strongly correlated to the individual samples. Taking the ratio then leads to a very noisy estimate of the true contrast variable. In the following we will therefore apply a cut to keep timelines with a sufficient number of samples. Since the distributions are very broad we require $N_{\mathrm{int}}(r) > 50$ contacts in a relation. We will study later the effect of this cut on the results. On the complete datasets, we are left with respectively 57, 26, 91 and 70 timelines for the hosp, conf, malawi and baboons datasets.
We show the p.t.e distributions of the contact duration contrast for the 4 groups in Fig. 4. The tails now look very similar up to 10 times the mean-time. The same distribution is observed on data from another conference, an office and a high-school (SI Appendix, S2). Thus, a (very) similar distribution is observed on 7 independent datasets.
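A minimal Python sketch of the contrast computation is given below, taking the contrast as each contact duration divided by the arithmetic mean duration of its relation and keeping only relations with more than 50 contacts. The function name `contrasts`, the `min_contacts` parameter and the toy timelines are assumptions of this illustration.

```python
def contrasts(timelines, min_contacts=50):
    """Duration contrast per relation.

    timelines: {pair: [contact durations]}.
    Only relations with more than `min_contacts` contacts are kept;
    each duration is divided by the mean duration of its relation.
    """
    per_relation = {}
    for pair, ts in timelines.items():
        if len(ts) > min_contacts:
            mean = sum(ts) / len(ts)
            per_relation[pair] = [t / mean for t in ts]
    return per_relation


# Toy timelines (illustrative values only): one relation passes the cut.
toy = {("a", "b"): [1, 2, 3] * 20, ("a", "c"): [5] * 10}
delta = contrasts(toy)
pooled = [d for values in delta.values() for d in values]
```

By construction the contrasts of each retained relation average to 1, so that pooling them (as in `pooled`) mixes relations with very different mean durations on a common scale.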
To be more quantitative and assess the level of compatibility between the distributions, we use a Monte-Carlo method. For each dataset, we numerically invert the empirical distribution functions (which are one minus the p.t.e's shown in Fig. 4) to construct the inverse cumulative function $F^{-1}$. We then draw $N$ numbers $u$ from a $[0, 1]$ uniform distribution, transform them with $F^{-1}(u)$ and reconstruct the p.t.e. The procedure is repeated 100 times and all distributions are plotted on top of each other in Fig. 5.
One sees that the distributions are indeed all compatible in the $0.6 \lesssim \delta \lesssim 10$ range, where the upper bound comes from the limited sample size of the hosp and conf datasets, and the lower one from slight (but statistically significant) differences for low values. This will be our range of interest in the following.
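The resampling procedure can be sketched in Python as follows. This is a simplified illustration that inverts the empirical distribution as a step function (which amounts to resampling with replacement) rather than with any interpolation; the function name `resample_pte`, its parameters and the toy data are assumptions.

```python
import random


def resample_pte(samples, n_rep=100, seed=0):
    """Draw n_rep replicas of a sample through its empirical inverse CDF.

    Each replica has the same size as the input and is returned sorted,
    ready to be plotted against its relative rank.
    """
    rng = random.Random(seed)
    xs = sorted(samples)
    n = len(xs)

    def inv_cdf(u):
        # Step-function inverse of the empirical cumulative distribution.
        return xs[min(int(u * n), n - 1)]

    replicas = []
    for _ in range(n_rep):
        draw = sorted(inv_cdf(rng.random()) for _ in range(n))
        replicas.append(draw)
    return replicas


# Toy contrast values (illustrative only).
data = [0.2, 0.5, 1.0, 1.0, 3.0, 7.5]
replicas = resample_pte(data, n_rep=100, seed=1)
```

The spread of the replicas around the original curve gives a visual estimate of the statistical uncertainty of each dataset, which is largest in the sparsely populated tail.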
Since the data-taking periods are very heterogeneous (ranging from 3 days for the conf and hosp datasets, to 12 and 26 for the malawi and baboons ones respectively) we have split the data day by day and verified that no particular day affects the results (SI Appendix, S3-1). We have also randomly removed a fraction of the agents (up to 50%), i.e. we removed all relations involving those agents, which did not affect the contrast distributions in a sizable way (SI Appendix, S3-2). Both tests confirm the robustness of the result.
Another option for studying deviations from the mean is to use the z-score
$$z_i(r) = \frac{t_i(r) - \bar t(r)}{\sigma(r)}, \tag{3}$$
where $\sigma(r)$ represents the standard deviation of the duration values. The results obtained with this variable are very similar to the ones with the contrast (SI Appendix, S4) and we did not notice any difference on the tests that are presented later. Since the contrast variable is somewhat simpler (the z-score involving second order statistics) we only focus on it in the following.
We now consider the impact of applying the $N_{\mathrm{int}}(r) > 50$ cut. First, we note that similar results are obtained with a lower cut value such as $N_{\mathrm{int}} > 30$ (SI Appendix, S5). We then show that we can still reproduce the contrast distribution without any cut, using only the distributions with the cut (Fig. 4). To this purpose we perform Monte-Carlo simulations. For a given dataset, for each relation (without any cut), we draw $N_{\mathrm{int}}(r)$ random numbers following the Fig. 4 distribution to obtain contrast values $\{\delta_i\}_{i=1,\dots,N_{\mathrm{int}}(r)}$. Those samples are obtained from the distribution with the $N_{\mathrm{int}}(r) > 50$ cut, so with precise mean values that we call $\mu$. We may mimic the statistical fluctuations due to any $N_{\mathrm{int}}(r)$ value by using the ratio
$$\hat\delta_i = \frac{\mu\,\delta_i}{\frac{1}{N_{\mathrm{int}}(r)}\sum_{j=1}^{N_{\mathrm{int}}(r)} \mu\,\delta_j} = \frac{\delta_i}{\frac{1}{N_{\mathrm{int}}(r)}\sum_{j=1}^{N_{\mathrm{int}}(r)} \delta_j}, \tag{4}$$
since $\mu$ actually cancels out. We compare the contrast distribution measured on these simulations to the one observed on data, this time without any $N_{\mathrm{int}}(r)$ cut, in Fig. 6 for the conf dataset. We reproduce correctly the whole contrast distribution using only the Fig. 4 one obtained with 1% of the data ($N_{\mathrm{int}} > 50$). Similar results are obtained on the other datasets (SI Appendix, S6.1). This shows that the contrast distribution obtained from the large sample statistics is sufficient to reproduce any number of interactions, including small-sample ones. In other words, the $N_{\mathrm{int}}(r) > 50$ cut only cleans the data without affecting the underlying "true" contrast distribution.
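The small-sample simulation can be sketched as follows. For each relation, a number of contrast values equal to its number of interactions is drawn from a reference sample (standing in here for the high-statistics contrast distribution), and each draw is divided by the arithmetic mean of the draws of that relation. The function name `simulate_small_sample`, the resampling by `random.choice` and the toy inputs are assumptions of this illustration.

```python
import random


def simulate_small_sample(reference, n_int_list, seed=0):
    """Mimic the contrast measured on relations with few contacts.

    reference: positive contrast values from the high-statistics sample.
    n_int_list: number of interactions of each simulated relation.
    """
    rng = random.Random(seed)
    simulated = []
    for n_int in n_int_list:
        draws = [rng.choice(reference) for _ in range(n_int)]
        mean = sum(draws) / n_int
        # Re-normalize by the noisy sample mean, as done on real data.
        simulated.extend(d / mean for d in draws)
    return simulated


# Toy inputs (illustrative values only).
reference = [0.3, 0.6, 1.0, 1.4, 2.5]
sim = simulate_small_sample(reference, [1, 1, 3], seed=2)
```

A relation with a single interaction always yields a contrast of exactly 1 in this procedure, which is the origin of the dip at 1 mentioned in the caption of Fig. 6.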
To check that the contrast distribution is not artificially produced by the procedure of dividing the timelines by their mean value, we use the hosp dataset to retrieve the set of interacting agents and their corresponding characteristics $N_{\mathrm{int}}(r)$ and $\bar t(r)$. We then draw $N_{\mathrm{int}}(r)$ random numbers following a Poisson distribution of parameter $\bar t(r)$ and recompute the contrast. The result, shown in Fig. 7, is clearly different from the one observed on the data.
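This null test can be sketched in Python as below. The Poisson generator uses Knuth's multiplication method, chosen here only to stay within the standard library; the function names, the handling of the all-zero case and the toy relation list are assumptions of this illustration.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (moderate lam only)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def poisson_contrasts(relations, seed=0):
    """relations: list of (n_int, mean_duration in resolution steps)."""
    rng = random.Random(seed)
    out = []
    for n_int, mean_t in relations:
        draws = [poisson(rng, mean_t) for _ in range(n_int)]
        m = sum(draws) / n_int
        if m > 0:  # skip the degenerate case where every draw is zero
            out.extend(d / m for d in draws)
    return out


# Toy relations (illustrative values only).
null = poisson_contrasts([(60, 3.0), (100, 5.0)], seed=1)
```

Note that a Poisson draw can be zero whereas a recorded contact lasts at least one resolution step, so this sketch is only meant to show how a narrow reference distribution behaves under the same normalization.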
The shape of the observed contrast distribution (Fig. 4) is nontrivial. It is neither of exponential nor of power-law form. A stretched-exponential form is not satisfactory either. Empirically, we could obtain a reasonable fit in the $0.6 \lesssim \delta \lesssim 10$ region by combining a power-law and an exponential function
$$p(>\delta) = 0.3\,\frac{e^{-0.2\,\delta}}{\delta^{1.1}}. \tag{5}$$
The denominator is here to enhance short contrasts, while the exponential term describes the long ones. This could be an indication of the existence of two regimes, one for short times when communications are more informative and a longer one when real conversations form [27].
At this point, we have shown that the combined contrast duration (i.e. for all relations) follows a very similar distribution in all datasets. We now consider each relation separately and show in Fig. 8 a superposition of the contrast duration distributions with the $N_{\mathrm{int}}(r) > 50$ cut (similar results are observed without it but are, as expected, more noisy, see SI Appendix, S6.2).
They all follow rather closely the common contrast distribution. In other words, while the choice of individuals we meet (Fig. 1), the interaction rate (Fig. 3(a)) and the mean-time spent together (Fig. 3(b)) vary strongly with the environment, the propensity to spend more (or less) time than usual with a given individual is remarkably similar. This points to the idea that once a face-to-face contact is triggered it follows its own dynamics, outside of the sociological context.
For the sake of completeness, we note that we found no sizable correlations between the contact durations within the timelines (see SI Appendix, S7). This indicates that one can draw independent samples using Eq. (5).
We also considered the inter-contact (or "gap") time in the relations to see whether its contrast reveals features similar to the duration ones. This is not the case, as shown in Fig. 9.

Comparison with a model
The contrast distribution can be used as a new metric when studying face-to-face temporal graphs in order to test and improve existing agent-based models designed to reproduce the full evolution of a set of individuals. For instance, the "force directed motion" (FDM) model is successful in describing several key features of observed face-to-face interactions [6]. Based on the idea of attractiveness between some agents performing a random-walk within a bounded perimeter [4,28], the model further includes the concept of "similarity" between two individuals [29], known as homophily in social sciences. The similarity $s_{ij}$ influences the time two agents spend together and the way the random-walk is biased. The model assumes that the contact duration between two agents is exponentially distributed with a rate $s_{ij}/\mu_1$, where $\mu_1$ is adjusted on the data to reproduce the overall duration of contacts. We have run the code provided by the authors with their setup corresponding to the hosp dataset, to test the distribution of the contrast variable. Figure 10 shows that the model distribution falls too steeply. We have tried adapting the parameters and some parts of the code but could not find a configuration giving a better contrast distribution (see SI Appendix, S8).
Figure 10 Comparison of the contrast distributions obtained with the hosp dataset to the result of the "force-directed motion" (FDM) model [6]. We used the parameters provided by the authors and their dataset (slightly different from ours, due to a different selection). The FDM curve is the combined result from 10 simulations
Modeling correctly the tails of the contact duration is also essential in epidemiological studies since the spread of a disease happens mostly during long interactions. For a given mean-interaction time, Eq. (5) allows one to simulate a much more realistic duration of contacts than a Poissonian one. This can be used in SIR-like statistical inference, or in agent-based models, for the precise modeling of long interactions.
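As a hedged illustration of how the empirical fit could be used to simulate long contacts, the following Python sketch draws contrast values from the fitted probability to exceed, 0.3 exp(-0.2 δ)/δ^1.1, restricted to the 0.6 to 10 range where the fit was obtained, and scales them by a chosen mean-interaction time. The inversion by bisection, the function names and the numerical inputs are assumptions of this example; since contrasts below 0.6 are not described by the fit, the sketch only addresses the long-duration part of the distribution.

```python
import math
import random


def pte_fit(delta):
    """Empirical fit of the contrast probability to exceed."""
    return 0.3 * math.exp(-0.2 * delta) / delta ** 1.1


def sample_contrast(rng, lo=0.6, hi=10.0):
    """Draw a contrast in [lo, hi] by inverting pte_fit with bisection."""
    target = rng.uniform(pte_fit(hi), pte_fit(lo))
    a, b = lo, hi
    for _ in range(60):
        mid = 0.5 * (a + b)
        if pte_fit(mid) > target:  # pte_fit decreases with delta
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


rng = random.Random(3)
mean_time = 120.0  # seconds, hypothetical mean-interaction time of a relation
samples = [sample_contrast(rng) for _ in range(1000)]
long_durations = [mean_time * s for s in samples]
```

A complete generator would also need a description of the contrasts below 0.6, for instance by resampling the measured distribution directly.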

Conclusion
We have compared face-to-face interaction data taken in some very different environments; some were recorded in a European hospital and during a scientific conference, others in a small village in Africa.With the original intention to pinpoint differences with the results concerning humans, we have also included data on baboons' interactions in an enclosure.
Although the topological structures (who interacts with whom) and the mean-time spent together are clearly dependent on the sociological environment, it appears that the deviations from the mean-time for each pair (do we spend more/less time than usual with a given person) follow a very similar distribution, including for baboons.We (and baboons) tend to interact most often for much less time than "usual" with a given individual and sometimes, but rarely, much longer.What is striking is that the distribution for this quantity, which we call the "relation contrast" looks universal.It is the same for people at a scientific conference or farmers in a small Malawi village (and baboons in an enclosure), see Fig. 4 (also SI Appendix, S2 for the 7 datasets).
These results suggest that, once a face-to-face contact is triggered, it follows its own dynamics independently from the social context. This may not come as a big surprise to sociologists, in particular those working in the field of Conversation Analysis [27], where it is postulated that each conversation follows some rules independently from the social context. But to our knowledge, this was not noticed by physicists and may help in disentangling the topological and temporal aspects of face-to-face interactions.
The possible universality of the relation contrast must be challenged with more data. On the animal side, one should consider groups of animals with strong social interactions, that can be identified (labeled) and followed individually. Primates, such as baboons, are known to have social behaviors close to ours, which probably explains the similarity of the contrast distribution with the human one. Chimpanzee or bonobo data should show similar characteristics. Concerning other mammals, we could think of tracking individuals in elephant herds or wolf packs, but it is difficult to acquire precise data in the wild. The most promising approach concerns the study of social insect networks [31]. Collecting detailed data on ant interactions is probably the most feasible since recent techniques make it possible to tag and follow each individual separately [32]. On the human side, we need to check whether the contrast is influenced by age. Since children perceive time differently from adults, following the contact patterns of young children in a nursery could provide a valuable insight into this question.

Figure 2 Distribution (p.t.e) of the contact duration on the four datasets (all days used). There is one entry for each contact of each pair of individuals so that both aspects are inter-mixed

Figure 4 Distributions (p.t.e) of the duration contrast obtained for all relations within the same group satisfying $N_{\mathrm{int}}(r) > 50$ in logarithmic (a) and linear scales (b). The complete datasets have been used (i.e. all days). The interaction mean-time corresponds to a value of 1. We then see for instance that the probability for an interaction to last longer than its mean-time is around 30%, but, rarely, it can exceed 10 times the mean-time

Figure 5 Distributions (p.t.e) of the duration contrast obtained with the Monte-Carlo method described in the text to estimate numerically the statistical spread for each dataset. Each color represents a possible realization of the same dataset

Figure 6 Distributions (p.t.e) of the duration contrast obtained for all relations in the conf dataset and simulations produced using the corresponding Fig. 4 distribution (see text for details). The dip at 1 comes from the numerous cases (65%) where $N_{\mathrm{int}} = 1$, which always leads to $\delta = 1$

Figure 8 Distributions (p.t.e) of the contact duration contrast for each relation with at least 50 contacts. Each color represents a different relation. The black line is the combined p.t.e shown in Fig. 4

Figure 9 Distributions of the contrasts of the gap-time (inter-contact duration) on our datasets. To avoid the long night breaks, we show results for a single day

The contrast of the inter-contact time (Fig. 9) thus seems to be more dependent on the sociological context.
In Table 1, $\langle s \rangle$ is the mean strength, which represents the average total interaction time per individual.