For a given ego-alter pair we measure the inter-event time (‘gap’) between two successive calls in days, irrespective of the directionality of the calls. We then examine how the duration of the succeeding call (in seconds) varies with the length of the inter-event time. For analytical convenience, instead of considering individual calls we aggregate all calls in a given day and ignore the days in which the total calling time is less than 10 seconds. Thus a ‘call’ or an ‘event’ for a given pair on a particular day refers to the aggregated voice communication on that day. For a given ego *i* and its alter *j*, we construct the set of ordered pairs \(\{(\tau_{ij},T_{ij})\}\), such that \(\tau_{ij}\) is a gap between two calls and \(T_{ij}\) is the duration of the succeeding call. For a given pair *ij*, we define \(\langle\tau \rangle_{ij}\) and \(\langle T\rangle_{ij}\) as the average gap and the average duration of calls, respectively.
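The construction of the ordered pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a list of `(day, seconds)` records for one ego-alter pair), the function name `gap_duration_pairs` and the example dates are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import date

MIN_DAILY_SECONDS = 10  # days with less total calling time are ignored


def gap_duration_pairs(calls):
    """Build the (gap in days, duration in seconds) pairs for one ego-alter pair.

    `calls` is an iterable of (day, seconds) records covering both call
    directions (hypothetical input format).
    """
    # Aggregate all calls made on the same day into one 'event'.
    daily = defaultdict(float)
    for day, seconds in calls:
        daily[day] += seconds

    # Keep only days whose aggregated calling time reaches the threshold.
    days = sorted(d for d, total in daily.items() if total >= MIN_DAILY_SECONDS)

    # Pair each gap with the duration of the event that ends it.
    return [((cur - prev).days, daily[cur]) for prev, cur in zip(days, days[1:])]


# Made-up example: the 5-second day is dropped before the gaps are computed.
example_calls = [
    (date(2000, 1, 1), 60), (date(2000, 1, 1), 30),
    (date(2000, 1, 3), 5),
    (date(2000, 1, 8), 120),
    (date(2000, 1, 10), 200),
]
example_pairs = gap_duration_pairs(example_calls)
```

In this sketch a day below the threshold is removed before the gaps are measured, so the gap spans across it; whether the original analysis treats such days in exactly this way is an assumption.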

We concentrate on the ego-alter pairs for which the communication is sufficiently spaced over time by considering the set of pairs, \(\mathcal {S}\), for which the maximum number of calls in any calendar month does not exceed 4 and there is at least one call in each of the 7 months. The bound on the maximum number of calls results in a characteristic gap of just over a week. Higher values of this bound would allow for more calls per week and would result in the inclusion of the more frequently contacted alters for a given ego [34]; for such alters the probability of finding large gaps in communication would be comparatively lower (illustrated in Figure 1). Requiring at least one call per month allows us to focus on relationships which may be considered otherwise stable. Additionally, we consider only those pairs for which the distance between their most common locations (\(d_{ij}\)) is greater than zero, to reduce the likelihood of face-to-face interaction. The detailed criteria for selecting pairs are provided in the Materials and methods and the robustness of the results is discussed in Additional file 1.
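As a rough illustration, the three conditions that define \(\mathcal{S}\) can be written as a single filter. The sketch assumes that a ‘call’ means one aggregated daily event and that the observation window is supplied as a list of `(year, month)` tuples; the function name and the example dates are hypothetical, and the full criteria are those given in the Materials and methods.

```python
from collections import Counter
from datetime import date

MAX_CALLS_PER_MONTH = 4  # upper bound on events in any calendar month


def is_regular_contact(event_days, months, distance_km):
    """Return True if a pair satisfies the selection criteria sketched here.

    event_days  : dates on which the pair has an aggregated call
    months      : the (year, month) tuples of the observation window
    distance_km : distance between the most common locations of the pair
    """
    per_month = Counter((d.year, d.month) for d in event_days)
    # At least one and at most MAX_CALLS_PER_MONTH events in every month,
    # and a non-zero distance between the two individuals.
    return distance_km > 0 and all(
        1 <= per_month[m] <= MAX_CALLS_PER_MONTH for m in months
    )


# Made-up seven-month window with one event per month.
window = [(2000, m) for m in range(1, 8)]
one_per_month = [date(2000, m, 15) for m in range(1, 8)]
```

Dropping the upper bound in this check, while keeping the other two conditions, would correspond to a relaxed selection with no limit on monthly calls.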

First we show the probability distribution of the gaps and the call durations corresponding to pairs belonging to the set \(\mathcal{S}\). For comparison, we construct another set \(\mathcal{U}\) by relaxing the restriction on the maximum number of calls per month (other parameters being the same as for \(\mathcal{S}\)) to include the more frequently contacted alters, such that \(\mathcal{U}\supseteq\mathcal{S}\). We refer to \(\mathcal{S}\) as the set of ‘regular contacts’ and to \(\mathcal{U}\) as the set of ‘all frequent contacts’. In Figure 1 we plot the probability distribution functions (PDFs) of \(\tau_{ij}\) and \(T_{ij}\) for the pairs in \(\mathcal{S}\) and \(\mathcal{U}\), irrespective of the age and gender of the individuals. In general, the PDFs for \(\tau_{ij}\) and \(T_{ij}\) are fat tailed. The PDF for \(\tau_{ij}\) shows peaks at multiples of seven days, which indicates a high propensity to make calls during weekends. The PDFs for the averages for individual pairs (\(\langle\tau\rangle_{ij}\) and \(\langle T\rangle_{ij}\)) show well defined peaks. Figure 1(a) and (b) show that a typical average separation and a typical average call duration for alters in \(\mathcal{S}\) are around 12 days and 170 seconds, respectively. The PDF for the average gap in \(\mathcal{U}\) falls off exponentially and the typical average duration in \(\mathcal{U}\) is around 130 seconds.

In Figure 2 we plot the binned curves for the duration of the succeeding calls as a function of the gaps corresponding to communication between egos and the alters in the set \(\mathcal{S}\). Each curve corresponds to an age and sex cohort for the ego and the sex cohort for the alter. The curves indicate a logarithmic increase in the duration of calls with the increase in the gap. Although the behaviour is found across all the cohorts considered, the trend appears to be particularly well defined for egos aged 25-60 years. The cohorts within this range are shown in Figure 2. We use linear regression to fit \(T_{ij}=\beta\log\tau _{ij}+\alpha\) to the data. The larger the *β*, the stronger the dependence. The coefficient *α* provides a basal value for the duration of the calls. In Figure 3(a) we provide the values of *β* for the different ages and genders (filled symbols). Overall, the effect is strongest for egos aged 25-40 years and for same-sex pairs.
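A minimal sketch of such a fit, using ordinary least squares of the duration on the logarithm of the gap, is given below. Two points are assumptions made only for this illustration: the natural logarithm is used (the base of the logarithm is not specified here, and a different base only rescales *β*), and the fit is applied directly to a list of `(gap, duration)` pairs rather than to binned curves. The function name `fit_log_linear` is hypothetical.

```python
import math


def fit_log_linear(pairs):
    """Least-squares fit of T = beta * log(tau) + alpha.

    `pairs` is a list of (tau, T) with tau > 0. Returns (alpha, beta).
    """
    xs = [math.log(tau) for tau, _ in pairs]
    ys = [T for _, T in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form ordinary least squares for a single regressor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Synthetic, noise-free data generated with alpha = 100 and beta = 30.
synthetic = [(tau, 100.0 + 30.0 * math.log(tau)) for tau in (1, 2, 7, 14, 30)]
alpha_hat, beta_hat = fit_log_linear(synthetic)
```

On noise-free synthetic data the fit recovers the generating coefficients up to floating-point error, which makes it a convenient sanity check before it is applied to real sequences.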

The dependence of \(T_{ij}\) on \(\tau_{ij}\) as reflected in Figure 2 results from accumulating the data from multiple sequences belonging to different ego-alter pairs. However, different pairs are expected to have their own idiosyncrasies and, as is evident from Figure 1, the averages (gap and call duration) corresponding to different pairs follow unimodal distributions. Therefore, we first analyze the extent to which the properties of different pairs influence this dependence. For a set \(\{(\tau_{ij},T_{ij})\}\) belonging to a given pair *ij*, we construct an ensemble of artificial sets \(\{(\tau _{ij},T'_{ij})\}\), where the \(T'_{ij}\)’s are obtained by randomly shuffling the original sequence of \(T_{ij}\)’s. In Figure 4(a) we illustrate the behaviour of the artificial data (black circles) for a particular case. The shuffled durations show a much weaker increase when compared to the original (red squares). We show the *β*’s resulting from the regression on the randomized data in Figure 3(a) (unfilled symbols). In general, the slopes for the randomized data are much lower, although different from zero. This comparison suggests that the correlations are genuinely present in the real data.
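The shuffling step can be sketched as follows. The gaps of a pair are kept in their original order while the durations of the same pair are permuted at random, so that each pair retains its own set of gaps and durations and only their pairing is destroyed. The function names, the number of realizations and the seed are hypothetical choices made for this illustration.

```python
import random


def shuffled_pairs(pairs, rng):
    """Return one artificial set {(tau, T')} for a single ego-alter pair."""
    gaps = [tau for tau, _ in pairs]
    durations = [T for _, T in pairs]
    rng.shuffle(durations)  # in-place random permutation of the durations
    return list(zip(gaps, durations))


def shuffled_ensemble(pairs, n_realizations, seed=0):
    """Return an ensemble of independently shuffled copies of `pairs`."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [shuffled_pairs(pairs, rng) for _ in range(n_realizations)]


# Made-up sequence for one pair.
original = [(1, 50.0), (7, 120.0), (14, 200.0), (3, 80.0)]
ensemble = shuffled_ensemble(original, n_realizations=5, seed=1)
```

Because the shuffle is carried out within each pair, the pair averages of the gap and the duration are unchanged in every realization.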

To extract the behaviour in a form that is independent of the characteristics of the ego-alter pairs, we scale the variables for a given pair with their corresponding averages. The dependence of the scaled variable \(T_{ij}/\langle T\rangle_{ij}\) on \(\tau_{ij}/\langle \tau\rangle_{ij}\) is shown in Figure 4(b) (red squares). That the scaling extracts the correct nature of the correlations is evident when we scale the randomized data: the resulting curve (black circles) is flat and shows the absence of any correlation when the data are randomized. We employ a regression of the form \(T_{ij}/\langle T\rangle_{ij}=\beta'\log (\tau_{ij}/\langle\tau \rangle_{ij} )+\alpha'\) for the scaled data. In Figure 3(b) we plot the slopes \(\beta'\). The figure clearly illustrates that the relationship between the scaled variables is qualitatively the same as that of the unscaled variables. Also, the scaled variables exhibit clear correlations, whereas the slopes for the randomized data are not different from zero. The relationship shows that, for a given pair, when the length of the gap is larger than the average gap, the duration of the succeeding call is larger than the duration of the average call. Conversely, if the gap is less than the average, then the duration also falls below the average.
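The scaling amounts to dividing each gap and each duration of a pair by the averages of that same pair, as in the short sketch below. The function name `scale_by_pair_averages` and the example values are hypothetical.

```python
def scale_by_pair_averages(pairs):
    """Scale the (tau, T) values of one pair by that pair's own averages.

    Returns a list of (tau / <tau>, T / <T>).
    """
    n = len(pairs)
    mean_tau = sum(tau for tau, _ in pairs) / n
    mean_T = sum(T for _, T in pairs) / n
    return [(tau / mean_tau, T / mean_T) for tau, T in pairs]


# Made-up pair with an average gap of 12 days and an average duration of 200 s.
scaled = scale_by_pair_averages([(6, 100.0), (18, 300.0)])
```

In this made-up example the shorter-than-average gap is followed by a shorter-than-average call and the longer gap by a longer call, so both scaled coordinates lie on the same side of 1, which is the pattern the regression on the scaled data quantifies.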

To examine the nature of the ties more closely we construct the distribution of alter age-ego age for the pairs in \(\mathcal{S}\). The distributions (Additional file 1) show that \(\mathcal{S}\) consists predominantly of alters having the same sex as the ego and falling in the same age cohort. In general, this pattern holds for egos up to the age of 50 years. For egos aged above 50, a second peak appears at an age separated by one generation. The preferred alters (those called most often) are mainly age peers. Because these dyads were in different geographical locations, they are unlikely to be spouses (indeed, most are same-sex peers) and are more likely to be friends, siblings or distant similar-age kin (e.g. cousins). Above the age of 50 years, the double peak suggests that, in addition to peers, egos invest heavily in alters that are about a generation younger, most likely either children or nephews/nieces.

We further categorize the pairs in the set \(\mathcal{S}\) based on the distance and the frequency of communication. First we divide the set into two groups, one consisting of pairs with distances (\(d_{ij}\)) smaller than \(d_{c}\) (‘close’) and the other with distances larger than \(d_{c}\) (‘distant’). We consider the average gap as a proxy for the frequency of calling. Therefore, we again split each of these groups into two subsets based on whether the average gap (\(\langle\tau\rangle _{ij}\)) is less than \(\tau_{c}\) (‘frequent’) or greater than \(\tau_{c}\) (‘infrequent’). We choose \(d_{c}=50~\mbox{km}\), which is larger than the spatial extension of the largest cities in the concerned country. Note that, in general, distance to alters is distributed according to an inverse power law [35]. Also, age and gender preferences of egos for their alters have been found to correlate with their geographic proximity [36]. We choose \(\tau_{c}=12\) days, which is the most probable value of \(\langle\tau\rangle_{ij}\), as can be seen in Figure 1. (See Additional file 1 for the joint PDF of average gap and distance.)
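The resulting four-way split can be illustrated with a small helper. The thresholds are the values of \(d_{c}\) and \(\tau_{c}\) stated above; the function name `categorize` is hypothetical, and the handling of values exactly equal to a threshold (assigned here to ‘distant’ and ‘infrequent’) is an assumption, since the text defines the groups only by strict inequalities.

```python
D_C_KM = 50.0      # distance threshold d_c
TAU_C_DAYS = 12.0  # average-gap threshold tau_c


def categorize(distance_km, mean_gap_days):
    """Assign a pair to one of the four distance/frequency categories.

    Values exactly at a threshold go to 'distant' / 'infrequent'
    (an assumption of this sketch).
    """
    proximity = "close" if distance_km < D_C_KM else "distant"
    frequency = "frequent" if mean_gap_days < TAU_C_DAYS else "infrequent"
    return proximity, frequency


# Made-up pairs: (distance in km, average gap in days).
labels = [categorize(d, g) for d, g in [(10.0, 8.0), (10.0, 20.0), (300.0, 8.0), (300.0, 20.0)]]
```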

In Figure 5 we provide the values of the coefficients *α*, *β* and \(\beta'\) for the fits to the data with the categorization described in the previous paragraph. We obtain the coefficients with the data being further classified according to the gender of the individuals forming the pair. (See Additional file 1 for the coefficients when pairs are analyzed irrespective of gender.) The plot clearly indicates an increase in *α* with the increase in distance. However, the variation with average gap is not significant. It is also evident that females are involved in longer calls than males. For the *β*’s we observe a variation with distance very similar to that of *α*. However, there is a marginal dependence on the average gap. These facts suggest that the reinforcement effect is stronger when the calling frequency is low and the distance of separation is large. The values of \(\beta'\) show that the observation regarding the *β*’s persists when the data are scaled. They also show a strong gender homophily, as the \(\beta'\)’s for same-gender pairs appear to be larger than those for mixed-gender pairs. This observation is consistent with the results shown in Figure 3(b).