We now describe the datasets, the definition of community and the method used to compare partitions in our empirical approach. Both datasets cover all or part of the Belgian territory. See Section SA.1 of Additional file 1 for a visualisation and description of the territory.

### 5.1 Twitter networks

Our first dataset consists of 291,552 tweets geolocated on the Belgian territory and exchanged between 18,327 Twitter users, obtained as described in Additional file 1, Section SA.2. From this dataset we build a network \(N_{0}\) as follows. The nodes are the users, and each weighted edge counts the reply-to tweets exchanged between its two users (ignoring directionality, so that the graph is undirected). Each node is associated with a position, obtained as the barycentre of the positions recorded in the tweets sent by the user. In this way we see \(N_{0}\) as a network linking positions together. Because of the way the dataset was collected, those positions are spread over the Belgian territory.
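The construction of \(N_{0}\) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the function name `build_reply_network` and the input formats are hypothetical, and positions are assumed to be planar (projected) coordinates so that the barycentre is a plain coordinate average.

```python
from collections import defaultdict

def build_reply_network(replies, tweet_positions):
    """Build an undirected weighted reply network with one position per user.

    replies: iterable of (sender, recipient) pairs, one per reply-to tweet
             (hypothetical input format).
    tweet_positions: iterable of (user, x, y), one per geolocated tweet sent,
             with x, y assumed to be planar coordinates.
    """
    weights = defaultdict(int)
    for sender, recipient in replies:
        # Sorting the pair drops directionality: (a, b) and (b, a) share one edge.
        edge = tuple(sorted((sender, recipient)))
        weights[edge] += 1

    # Accumulate coordinate sums and tweet counts to get each user's barycentre.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for user, x, y in tweet_positions:
        acc = sums[user]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    positions = {u: (sx / n, sy / n) for u, (sx, sy, n) in sums.items()}
    return dict(weights), positions


weights, positions = build_reply_network(
    [("a", "b"), ("b", "a"), ("a", "c")],
    [("a", 0.0, 0.0), ("a", 2.0, 4.0), ("b", 1.0, 1.0)],
)
```

In this toy input, the two replies exchanged between `a` and `b` end up on a single edge of weight 2, and the position of `a` is the average of its two tweet locations.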

A series of aggregated networks was then created from \(N_{0}\). The territory of Belgium is divided into 589 municipalities, and used to be divided into 2,675 smaller municipalities until a merger took place in 1979. We first build two aggregated versions, where nodes represent former (\(N_{fm}\)) and current (\(N_{m}\)) municipalities, respectively, by merging all nodes of \(N_{0}\) positioned in the same (former or current) municipality. Edges are merged accordingly, each receiving a weight that aggregates the weights of all corresponding edges of \(N_{0}\).

We also applied a regular grid of 125 m square cells onto the Belgian territory, and merged into a single node all nodes of \(N_{0}\) positioned in the same cell, creating the aggregated network \(N_{125}\). Increasingly coarse square grids, with cell sizes from 250 m to 32 km, were used in the same way to create the aggregated networks \(N_{250}\) to \(N_{32k}\), respectively. The numbers of nodes and edges are given in Table S1 of Additional file 1 (Section SA.3).
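A minimal sketch of the grid aggregation follows. It rests on assumptions not stated in the text: positions are planar coordinates in metres, merged edge weights are summed, and edges whose endpoints fall in the same cell are kept as self-loops. The function name `aggregate_on_grid` is hypothetical.

```python
from collections import defaultdict
from math import floor

def aggregate_on_grid(weights, positions, cell_size):
    """Merge all nodes lying in the same square cell into one node.

    weights: dict {(u, v): weight} of an undirected network.
    positions: dict {node: (x, y)} in metres (assumed planar coordinates).
    cell_size: side of the square cells, in metres.
    Returns the aggregated weights and the map node -> cell (aggregation class).
    """
    cell_of = {
        u: (floor(x / cell_size), floor(y / cell_size))
        for u, (x, y) in positions.items()
    }
    aggregated = defaultdict(float)
    for (u, v), w in weights.items():
        cu, cv = cell_of[u], cell_of[v]
        # Order the two cells so that each undirected edge has a single key.
        edge = (cu, cv) if cu <= cv else (cv, cu)
        aggregated[edge] += w  # assumption: merged weights are summed
    return dict(aggregated), cell_of


positions = {"a": (10, 10), "b": (100, 20), "c": (130, 10), "d": (300, 300)}
weights = {("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 3, ("c", "d"): 5}
aggregated, cell_of = aggregate_on_grid(weights, positions, cell_size=125)
```

Here `a` and `b` share the cell `(0, 0)`, so their mutual edge becomes a self-loop of weight 2 and their edges to `c` merge into one edge of weight 4. Calling the same function with doubled cell sizes would produce the coarser networks of the series.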

### 5.2 Phone networks

Our second dataset records the number of phone calls between towers in the territory of Brabant, a former administrative unit (province) of 111 municipalities including and surrounding Brussels, the capital of Belgium. The derived undirected network, called \(M_{0}\), is composed of 1,168 nodes (towers). A weighted edge between two towers counts the number of communications between the towers in either direction, for a total of 13M communications over the network. As each tower is associated with a precise position, one may again consider \(M_{0}\) as a network between places. We may aggregate those places into municipalities, thus forming the network \(M_{m}\), or into regular cells of size 125 m to 32 km, creating the networks \(M_{125}\) to \(M_{32k}\), as for the Twitter dataset. See Table S2 of Additional file 1, Section SA.3, for the number of nodes and edges of each network.

### 5.3 Linearised stability maximisation

Communities are intuitively meant here as sets of strongly interconnected nodes with comparatively few connections between the communities. Among the many formalisations of this concept, one of the most popular is modularity [23], quantifying the goodness of a given partition \(\mathcal{C}\) of nodes as

$$ Q_{\mathcal{C}}=\frac{1}{2m} \sum_{C \in \mathcal{C}} \sum _{i,j \in C} \biggl(A_{ij} - \frac{k_{i} k_{j}}{2m} \biggr), \tag{7} $$

where *m* is the sum of the weights of all edges of the network, \(k_{i}\) is the (weighted) degree of node *i*, \(A_{ij}\) is the entry of the weighted adjacency matrix for nodes *i* and *j*, and *C* (\(\in \mathcal{C}\)) is a community of the partition.
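As a worked illustration of the modularity formula (on a toy graph, not one of the datasets), consider two unweighted triangles joined by a single edge, partitioned into the two triangles. Then \(m=7\). Each triangle *C* has \(\sum_{i,j \in C} A_{ij} = 6\) (three internal edges, each counted in both orders) and degrees 2, 2 and 3, so that \(\sum_{i,j \in C} k_{i} k_{j} = (2+2+3)^{2} = 49\). Hence

$$ Q_{\mathcal{C}} = \frac{1}{14} \cdot 2 \cdot \biggl( 6 - \frac{49}{14} \biggr) = \frac{5}{14} \approx 0.357. $$

By comparison, placing all six nodes in a single community gives \(\frac{1}{14} ( 14 - \frac{196}{14} ) = 0\), so the two-triangle partition is preferred.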

We use a generalisation, called linearised partition stability [12], or equivalently the Potts model [17], which introduces a resolution parameter *ρ* varying from 0 to ∞ as follows:

$$ r_{\mathrm{lin}}(\rho ,\mathcal{C}) = (1-\rho ) + \rho \frac{1}{2m} \sum _{C \in \mathcal{C}} \sum_{i,j \in C} \biggl( A_{ij} - \frac{1}{\rho } \frac{k_{i} k_{j}}{2m}\biggr). \tag{8} $$

At \(\rho =0\), single nodes are optimal as communities, while partitions with larger communities emerge for increasing values of *ρ*, until a single community is optimal at \(\rho \to \infty \). For \(\rho =1\), the linearised stability is the modularity, \(r_{\mathrm{lin}}(1,\mathcal{C})=Q_{\mathcal{C}}\). The resolution parameter *ρ* is hereafter called timescale, because linearised stability is formally derived in [12] as capturing the ability of incumbent communities to retain the flow of a diffusion of random walkers across the network for a timescale of the order of *ρ*. The original Potts model [17] uses the parameter \(\gamma = 1/\rho \). Like most community detection criteria, linearised stability is NP-hard to optimise except for extreme values of *ρ*, and we use the Louvain method [24, 25] as a heuristic.
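The behaviour of linearised stability at the two ends of the timescale range can be checked numerically. The sketch below only evaluates the criterion for a given partition; it does not optimise it and is not the Louvain method. The function names and the dict-of-dicts graph format are illustrative choices, and the toy graph (two triangles joined by one edge, no self-loops) is an assumption of this example. The sum is expanded so that the factor \(1/\rho\) cancels, which keeps the expression defined at \(\rho = 0\).

```python
from collections import defaultdict

def adjacency_from_edges(edges):
    """Symmetric dict-of-dicts adjacency from (u, v, weight) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    return dict(adj)

def linearised_stability(adj, partition, rho):
    """Evaluate r_lin(rho, C) for a partition given as {node: community}.

    Uses the expanded form
    (1 - rho) + sum_C [ rho * sum_{i,j in C} A_ij / 2m - (sum_{i in C} k_i / 2m)^2 ].
    """
    degree = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    two_m = sum(degree.values())
    internal = defaultdict(float)  # sum of A_ij over ordered pairs inside C
    volume = defaultdict(float)    # sum of degrees inside C
    for i, nbrs in adj.items():
        volume[partition[i]] += degree[i]
        for j, w in nbrs.items():
            if partition[j] == partition[i]:
                internal[partition[i]] += w
    quality = sum(
        rho * internal[c] / two_m - (volume[c] / two_m) ** 2 for c in volume
    )
    return (1 - rho) + quality


edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
adj = adjacency_from_edges(edges)
triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
singletons = {i: i for i in adj}
one_block = {i: "all" for i in adj}

for rho in (0.0, 1.0, 10.0):
    scores = {
        name: linearised_stability(adj, part, rho)
        for name, part in [("singletons", singletons),
                           ("triangles", triangles),
                           ("one_block", one_block)]
    }
    print(rho, max(scores, key=scores.get))
```

On this toy graph the best of the three candidate partitions is the singleton partition at \(\rho = 0\), the two triangles at \(\rho = 1\) (where the value equals the modularity), and the single block at \(\rho = 10\).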

Whenever appropriate, we will use the linearised stability method to detect communities, because it is an edge-counting criterion, because it includes an extremely popular criterion (modularity, for \(\rho =1\)) as a special case, and because it allows adapting the timescale parameter *ρ* in order to create partitions of different networks with the same or a similar number of communities. There are certainly many methods of merit sharing the same properties. Our goal in the Results section is not to find the most sociologically relevant Twitter or phone call communities in Belgium, but to illustrate how partitions found with an edge-counting criterion are modified in the presence of aggregation. Therefore, the various arguments for or against the practical significance of the communities delivered by one method or another are not relevant here.

### 5.4 Normalised mutual information for comparing partitions

To evaluate how similar two partitions \(\mathcal{C}\) and \(\mathcal{D}\) of the same set of nodes are, we compute their normalised mutual information [26], defined as

$$ \operatorname{NMI}(\mathcal{C},\mathcal{D})= \frac{I(\mathcal{C};\mathcal{D})}{( H(\mathcal{C})+H(\mathcal{D}) )/2}, \tag{9} $$

where \(I(\mathcal{C};\mathcal{D})\) denotes the mutual information between the two partitions, i.e. between the set in \(\mathcal{C}\) and the set in \(\mathcal{D}\) containing a randomly picked node of the graph. Note that in this article, the sets of nodes belonging to a partition are called either ‘communities’ (if found by a community detection algorithm) or ‘aggregation classes’ (if defining a way to aggregate the network).

Similarly, \(H(\mathcal{C})\) and \(H(\mathcal{D})\) denote the Shannon entropy of each partition, i.e. the Shannon entropy of the set containing a randomly picked node of the graph. The NMI takes values between 0, for independent (thus maximally dissimilar) partitions, and 1, for identical partitions.
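The NMI with arithmetic-mean normalisation can be computed directly from label counts. The following is a minimal sketch rather than the code used for the results: the function names are illustrative, partitions are given as dictionaries from node to set label, and returning 1 when both partitions consist of a single set (zero entropies) is a convention chosen here because the ratio is then undefined.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of the set label of a uniformly picked node."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(part_c, part_d):
    """Normalised mutual information of two partitions of the same nodes.

    part_c, part_d: dicts {node: set label}; label values need not match.
    """
    nodes = list(part_c)
    n = len(nodes)
    a = [part_c[v] for v in nodes]
    b = [part_d[v] for v in nodes]
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # I(C;D) = sum_xy p(x,y) * log( p(x,y) / (p(x) p(y)) )
    mutual = sum(
        (n_xy / n) * log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )
    denom = (entropy(a) + entropy(b)) / 2
    # Convention (assumption): two trivial partitions are treated as identical.
    return mutual / denom if denom > 0 else 1.0


same = nmi({0: "x", 1: "x", 2: "y", 3: "y"}, {0: 1, 1: 1, 2: 2, 3: 2})
independent = nmi({0: "x", 1: "x", 2: "y", 3: "y"}, {0: 1, 1: 2, 2: 1, 3: 2})
```

In the usage lines, `same` compares a partition with a relabelled copy of itself, and `independent` compares two partitions that cross each other evenly, which correspond to the two extreme values of the NMI.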

In our case, we also want to be able to compare community partitions at different levels of aggregation, for example the optimal partitions \(\mathcal{C}\) and \(\mathcal{D}\) of the networks \(N_{0}\) and \(N_{125}\), respectively. In this case, we lift the communities of \(N_{125}\) into communities of \(N_{0}\), replacing each node of \(N_{125}\) by its aggregation class, i.e. the set of nodes of \(N_{0}\) that were merged into it. We call \(\mathcal{D}'\) the resulting partition of the nodes of \(N_{0}\). We then compare the two partitions \(\mathcal{C}\) and \(\mathcal{D}'\) with the quantity \(\operatorname{NMI}(\mathcal{C},\mathcal{D}')\), which we will also sometimes denote \(\operatorname{NMI}(\mathcal{C},\mathcal{D})\) by abuse of notation.
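The lifting step amounts to composing two maps: from each node of \(N_{0}\) to its aggregation class, and from each aggregated node to its community. A minimal sketch, with a hypothetical function name and toy labels:

```python
def lift_partition(coarse_partition, class_of):
    """Lift a partition of the aggregated network to the original nodes.

    coarse_partition: dict {aggregated node: community label}.
    class_of: dict {original node: aggregated node it was merged into}.
    Returns a dict {original node: community label}, i.e. the lifted partition.
    """
    return {v: coarse_partition[class_of[v]] for v in class_of}


# Toy example: users a and b fall in the same cell, user c in another one.
class_of = {"a": "cell_1", "b": "cell_1", "c": "cell_2"}
coarse_partition = {"cell_1": 0, "cell_2": 1}
lifted = lift_partition(coarse_partition, class_of)
```

By construction, nodes that share an aggregation class always share a community in the lifted partition, so it can never separate two nodes that were merged together. The lifted partition is defined on the same nodes as \(\mathcal{C}\) and can therefore be compared with it directly.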