2.1 Data description
Our dataset consists of 24 consecutive days (18 weekdays and 6 weekend days) of Call Detail Record (CDR) data, comprising 11.4 billion outgoing mobile calls made by customers of TIM, one of the major Italian telecommunication companies (30.8% market share in Italy^{Footnote 1}).
CDRs are collected by mobile network operators for billing purposes. More specifically, a CDR is created every time a phone interacts with the network, recording (i) the type of event (incoming/outgoing call, transmission of a text message, consumption of a certain amount of data traffic), (ii) the pseudonyms of the users involved (the one producing the traffic and, where applicable, e.g., for voice traffic, the other party), (iii) the timestamp of the event, and (iv) the network cell antenna accessed for the event (i.e., the one to which the caller’s phone was connected), which serves as a coarse proxy for the location of the user [12, 13].
The CDRs of our dataset are limited to voice traffic and have been provided by TIM after some preprocessing steps. First of all, CDRs have been enriched with demographic data from the Customer Relations Management system, in order to represent users in terms of gender and age range. CDRs have then been filtered at the 99th percentile of the number of daily calls per user, in order to remove edge cases that are not representative of the general population (e.g., call centers). In particular, if the number of calls made by a user during a day exceeds the threshold, all the CDRs associated with that user for that day are removed from the dataset. Finally, data have been aggregated by city, hour, gender and age range, removing the identities of users (even though these were already pseudonymized). Thus, for each city and hour, the dataset contains: (i) the number of outgoing calls broken down by gender, (ii) the number of outgoing calls broken down by age range, and (iii) the total number of outgoing calls.
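The per-user daily filter described above can be illustrated with a minimal sketch. It is not TIM's actual preprocessing: the function name `filter_heavy_user_days`, the `(user, day)` record layout, and the nearest-rank percentile convention are all assumptions made for the example.

```python
from collections import Counter


def filter_heavy_user_days(records, pct=99):
    """Drop every call of a (user, day) whose daily call count exceeds the percentile.

    records: iterable of (user, day) tuples, one tuple per outgoing call
             (hypothetical layout, chosen for illustration).
    pct:     integer percentile; the nearest-rank convention is an assumption.
    Returns (kept_records, threshold).
    """
    records = list(records)
    calls_per_user_day = Counter(records)          # daily calls per user
    counts = sorted(calls_per_user_day.values())
    rank = (pct * len(counts) + 99) // 100         # ceil(pct * n / 100), integer arithmetic
    threshold = counts[rank - 1]
    # A user-day above the threshold loses all of its records for that day.
    kept = [r for r in records if calls_per_user_day[r] <= threshold]
    return kept, threshold


# 99 ordinary users with 2 calls each, plus one call-center-like user.
calls = [(f"u{i}", "day1") for i in range(99) for _ in range(2)]
calls += [("callcenter", "day1")] * 500
kept, threshold = filter_heavy_user_days(calls)
```

In this toy input the threshold equals 2 calls per day, so the 500 records of the heavy user are dropped while all ordinary users are kept.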
Regarding the identification of our cities, we have adopted the definition developed jointly, in 2012, by the European Commission and the Organisation for Economic Co-operation and Development (OECD) [14]: a city is a local administrative unit (LAU) where the majority of the population lives in an urban centre of at least 50 000 inhabitants. The definition also divides European cities into six size classes: S, M, L, XL, XXL and Global City. We have considered the 76 Italian cities that meet the OECD definition and grouped them into Small (S), Medium (M), and Large (L, XL, XXL) cities. Notice that, according to the OECD definition, no city in Italy can be categorized as a Global City, since no Italian city has more than 5 million inhabitants.
Hence, if we define \(\mathit{calls}_{h}(c,d)\) as the number of calls for a city c, during a day d and an hour h, the time series of the calls, or city’s activity pattern (A), is the time series of the values \(A_{h}(c,d)\), where
$$ A_{h}(c,d) = \frac{\mathit{calls}_{h}(c,d)}{\sum_{h \in[0,23]} \mathit{calls} _{h}(c,d)}. $$
It is worth highlighting that, for each city and hour, we consider the share of that day’s calls rather than the absolute number of calls. Thus, we can compare different cities independently of the absolute number of outgoing calls.
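As a small illustration of this normalization, the sketch below converts 24 hourly call counts into an activity pattern. The function name `activity_pattern` and the toy counts are assumptions, not part of the original pipeline.

```python
def activity_pattern(hourly_calls):
    """Return A_h = calls_h / sum over h of calls_h for one city and one day."""
    total = sum(hourly_calls)
    if total == 0:
        raise ValueError("no calls recorded for this city and day")
    return [calls / total for calls in hourly_calls]


# Two hypothetical cities with the same daily rhythm but different sizes.
small_city = [1, 0, 0, 0, 0, 1, 3, 8, 12, 14, 15, 14,
              12, 11, 12, 13, 14, 15, 13, 9, 6, 4, 2, 1]
large_city = [10 * calls for calls in small_city]

pattern_small = activity_pattern(small_city)
pattern_large = activity_pattern(large_city)
```

Both patterns sum to 1 and coincide, even though the larger city produces ten times as many calls.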
Finally, we identify the following socioeconomic indicators to investigate the economic role (i.e., city’s wealth), the attractiveness for foreigners (e.g., immigrants), and the incoming and outgoing commuting patterns of the highly synchronized Italian cities:

Resident population: The absolute number of the resident population in a city.^{Footnote 2}

Foreign population: The absolute number of the foreign-born population in a city.^{2}

Population density: The ratio between the resident population and the city surface^{2}.

Foreign percentage: The percentage of the foreign population over the resident population for a city^{2}.

Average income: The average yearly income per tax payer^{2}.

In-out commuters ratio: The ratio between commuters moving to a city X for work or study reasons and commuters moving out from city X for work or study reasons.^{Footnote 3}

Incoming commuter ratio: The ratio between commuters moving to a city X for work or study reasons and the resident population of that city.^{3}

Outgoing commuter ratio: The ratio between commuters moving out from a city X for work or study reasons and the resident population of that city.^{3}
2.2 Dynamic time warping
In order to measure the synchronization between the activity patterns of each pair of cities, we have used the Dynamic Time Warping (DTW) distance algorithm [15]. DTW has been extensively adopted in speech recognition [16], computer vision [17, 18], natural language processing [19, 20], and image matching and handwriting recognition [21] as a measure of similarity between time series. The algorithm provides an estimate of the optimal match between two time series, allowing for compression, expansion or lags in sections of the sequences. For example, DTW can capture similarities between two walking patterns even if one individual walks faster than the other. Thus, DTW can absorb the lag due to the circadian rhythms characterizing our time series [22, 23]. For this reason, it provides a more appropriate notion of similarity between cities’ activity patterns than an approach based on sliding-window correlation [24].
More specifically, given two time series \(X=(x_{1}, \ldots, x_{M})\) and \(Y=(y_{1}, \ldots, y_{N})\), a DTW path \(P=(p_{1},\ldots, p_{K})\) is a sequence of tuples of indices, where \(p_{k}=(m_{k}, n_{k}) \in[1, \ldots,M] \times[1,\ldots,N]\), subject to the following constraints:

1. \(p_{1} = (1,1)\) and \(p_{K}=(M,N)\)

2. \(m_{1} \leq m_{2} \leq\cdots\leq m_{K}\) and \(n_{1} \leq n_{2} \leq \cdots\leq n_{K}\)

3. \(p_{k+1} - p_{k} \in\{(1,0), (0,1), (1,1) \}\) for \(k\in[1:K-1]\).
Given a distance function d (e.g., Euclidean distance), the cost \(c_{p}\) of a path is defined as \(c_{p}(X,Y)=\sum_{k=1}^{K}d(x_{m_{k}}, y _{n_{k}})\). The DTW distance between X and Y is hence defined as the cost of the warping path \(p^{\star}\) having minimal total cost among all the possible warping paths.
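The minimal-cost warping path can be found with the standard dynamic-programming recursion. The sketch below is a minimal illustration rather than the implementation used in this study: the function name `dtw_distance` is an assumption, and the absolute difference is taken as the distance d (for scalar samples it coincides with the Euclidean distance).

```python
def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """DTW distance between sequences x and y under the three path constraints.

    cum[i][j] is the minimal cost of a warping path aligning x[:i] with y[:j].
    Runs in O(M * N) time and memory.
    """
    m, n = len(x), len(y)
    inf = float("inf")
    cum = [[inf] * (n + 1) for _ in range(m + 1)]
    cum[0][0] = 0.0                                # forces the path to start at (1, 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step_cost = dist(x[i - 1], y[j - 1])
            # Allowed steps: (1, 0), (0, 1) and (1, 1).
            cum[i][j] = step_cost + min(cum[i - 1][j],
                                        cum[i][j - 1],
                                        cum[i - 1][j - 1])
    return cum[m][n]                               # the path ends at (M, N)


peak = [0, 1, 2, 1, 0]
delayed_peak = [0, 0, 1, 2, 1, 0]                  # same shape, one step later
lag_distance = dtw_distance(peak, delayed_peak)
offset_distance = dtw_distance([0, 0], [1, 1])
```

The delayed copy of the peak is at distance 0 from the original, since the warping path absorbs the lag, whereas a constant vertical offset cannot be warped away.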
Using the activity pattern time series of each city, we have computed the DTW distance between the time series of every pair of cities for a given day. Hence, the higher the DTW distance between a pair of cities, the lower the synchronization of their activity patterns. Moreover, we have computed the mean and variance of the DTW distances, during weekdays and weekends, for each pair of cities. Mean and variance are estimated by using the jackknife resampling procedure [25].
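A minimal sketch of a leave-one-out jackknife for the mean follows. It assumes the resampled units are the daily DTW distances of one city pair, which the text does not state explicitly; the function name `jackknife_mean` and the toy distances are hypothetical.

```python
def jackknife_mean(values):
    """Leave-one-out jackknife estimate of the mean and of its variance.

    values: DTW distances of one city pair, one value per day (assumption).
    Returns (jackknife_mean, jackknife_variance_of_the_mean).
    """
    n = len(values)
    if n < 2:
        raise ValueError("the jackknife needs at least two observations")
    total = sum(values)
    # Mean recomputed n times, each time leaving one day out.
    leave_one_out = [(total - v) / (n - 1) for v in values]
    jk_mean = sum(leave_one_out) / n
    jk_var = (n - 1) / n * sum((t - jk_mean) ** 2 for t in leave_one_out)
    return jk_mean, jk_var


daily_distances = [1.0, 2.0, 3.0, 4.0, 5.0]        # toy values
pair_mean, pair_var = jackknife_mean(daily_distances)
```

For the sample mean, the jackknife variance reduces to the familiar s²/n, which gives 2.5 / 5 = 0.5 for the toy values.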
In order to investigate the association between the DTW distances and the socioeconomic indicators listed in Sect. 2.1, for each city we have considered the average of the means previously computed using the jackknife resampling method. Then, we have computed the variance-weighted average of the DTW distances for each city by using the inverse-variance weighting procedure [26]. This method aggregates two or more random variables (here, DTW distances) so as to minimize the variance of the weighted average.
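Inverse-variance weighting assigns each value a weight proportional to the reciprocal of its variance. The sketch below shows the computation for one city under the assumption that the inputs are the mean DTW distances to the other cities and their variances; the function name and the numbers are illustrative only.

```python
def inverse_variance_average(means, variances):
    """Inverse-variance weighted average and the variance of that average."""
    if len(means) != len(variances) or not means:
        raise ValueError("means and variances must be non-empty and of equal length")
    weights = [1.0 / v for v in variances]         # w_i = 1 / sigma_i^2
    weight_sum = sum(weights)
    average = sum(w * m for w, m in zip(weights, means)) / weight_sum
    return average, 1.0 / weight_sum


# Toy example: the second distance is noisier, so it counts less.
city_average, city_variance = inverse_variance_average([0.0, 10.0], [1.0, 4.0])
```

Here the weights are 1 and 0.25, so the average is 2.0 rather than the unweighted 5.0, and the variance of the average (0.8) is lower than either input variance.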
Finally, for each city the variance-weighted average of the DTW distances is related to each of the socioeconomic indicators by means of Spearman bivariate correlations. The Spearman bivariate correlation measures the strength and direction of the monotonic association between two variables. Specifically, the Spearman coefficient is a number between −1 and +1, where −1 indicates perfect negative correlation, +1 indicates perfect positive correlation and 0 indicates no correlation.
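The Spearman coefficient is the Pearson correlation computed on the ranks of the two variables. The sketch below is a dependency-free illustration of that definition with hypothetical helper names and toy data; in practice a library routine such as `scipy.stats.spearmanr` would typically be used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: a monotonic but non-linear relation still gives a coefficient of +1.
dtw_averages = [0.1, 0.2, 0.3, 0.4]
indicator = [10.0, 100.0, 1000.0, 10000.0]
rho = spearman(dtw_averages, indicator)
```

Because only ranks matter, the coefficient captures monotonic association even when the relation between an indicator and the DTW distances is far from linear.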
2.3 Bootstrap procedure
To obtain an accurate estimate of the variance of the parameters of our fits, we performed a bootstrap procedure. Bootstrap resampling is a widely used technique to infer properties of an estimator by repeatedly resampling the original data. Specifically, we have performed a group bootstrap: whenever a city is extracted, all the pairs containing that city are added to the bootstrap sample. This procedure preserves, at each bootstrap iteration, all the correlations that a city has with other cities, since no pairs that include the selected city are left out. Our bootstrap procedure follows three steps:

(i) For each group of n cities of the same size (Large, Medium and Small), extract n cities with replacement;

(ii) Create the dataset of pairs of cities for the bootstrap iteration using all the possible combinations of extracted cities (excluding the pairs made of the same city);

(iii) Perform a Weighted Least-Squares (WLS) regression using as weights the variance previously computed using the jackknife resampling method.
For each bootstrap iteration we perform a Weighted Least-Squares regression and collect the values of the slope m and the intercept q of the fit. Finally, the results were evaluated by performing a t-test to assess whether the slope differs from zero.
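The three steps can be sketched as follows. This is a rough illustration under several assumptions that the text does not settle: pairs are formed across all extracted cities regardless of size class, the regression weights are taken as the inverse of the jackknife variances (a common WLS convention), and the names `bootstrap_pairs`, `wls_fit`, `bootstrap_fits` and the `pair_data` layout are hypothetical.

```python
import random
from itertools import combinations


def bootstrap_pairs(groups, rng):
    """Steps (i)-(ii): draw n cities per size class, then pair the drawn cities."""
    drawn = []
    for cities in groups.values():
        drawn.extend(rng.choices(cities, k=len(cities)))   # with replacement
    # A city drawn twice contributes its pairs twice; same-city pairs are dropped.
    return [(a, b) for a, b in combinations(drawn, 2) if a != b]


def wls_fit(x, y, w):
    """Step (iii): weighted least-squares line y = m * x + q, closed form."""
    w_sum = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def bootstrap_fits(groups, pair_data, n_iter=1000, seed=0):
    """Collect (slope, intercept) over bootstrap iterations.

    pair_data maps frozenset({city_a, city_b}) to (x, y, variance); which
    quantities play the roles of x and y is left open here (assumption).
    """
    rng = random.Random(seed)
    fits = []
    for _ in range(n_iter):
        rows = [pair_data[frozenset(p)] for p in bootstrap_pairs(groups, rng)]
        if len({row[0] for row in rows}) < 2:
            continue                                       # slope undefined, skip
        x, y, var = zip(*rows)
        fits.append(wls_fit(x, y, [1.0 / v for v in var])) # inverse-variance weights
    return fits
```

The spread of the collected slopes across iterations then provides the variance estimate on which the test of a non-zero slope is based.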