Link transmission centrality in large-scale social networks

Understanding the importance of links in transmitting information in a network can provide ways to hinder or postpone ongoing dynamical phenomena like the spreading of epidemics or the diffusion of information. In this work, we propose a new measure based on stochastic diffusion processes, the \textit{transmission centrality}, that captures the importance of links by estimating the average number of nodes to whom they transfer information during a global spreading process. We propose a simple algorithmic solution to compute transmission centrality and to approximate it in very large networks at low computational cost. Finally, we apply transmission centrality to the identification of weak ties in three large empirical social networks, showing that this metric outperforms other centrality measures in identifying links that drive spreading processes in a social network.


I. INTRODUCTION
The importance of nodes and links in networks is commonly measured through centrality measures. Their definitions generally rely on local and/or global structural information. Centrality measures using local information, like the node degree or link overlap, are computed efficiently as they only require knowledge about the neighbors of a given node or link. On the other hand, these measures cannot indicate which nodes or links play global roles in the network structure. In contrast, centrality measures based on global information about the network structure, like betweenness and closeness centrality [1,2], Katz centrality [3], the k-shell index [4,5], subgraph centrality [6], and induced centrality measures [7], may better characterize the overall importance of a node or link. Unfortunately, although effective algorithms for approximating these quantities have recently been proposed [8,9], estimating these measures in large-scale networks is still computationally challenging.
While global centrality measures have been very successful in identifying structurally important nodes or links in networks, it has been argued [10] that they do not necessarily identify nodes or links with a key role in dynamical processes. Other centrality metrics, which directly use dynamical processes to assign importance, were found to be more successful in this sense. The best examples are metrics based on random walkers like PageRank [11], eigenvector centrality [12], or accessibility [13]. Other examples are local metrics like the expected force [14], or percolation centrality [15]. These measures are based on random diffusion processes, but do not fully capture the basic mechanisms behind contagion-mediated spreading phenomena. Here we define a new link centrality measure, transmission centrality, tailored to identify the role of nodes and links in controlling contagion phenomena. Transmission centrality measures the average number of nodes that are reached by the contagion through each link during the unfolding of a stochastic contagion process. This provides a direct measure of the centrality of a link in hindering or facilitating the contagion process. For very large-scale networks, we propose a heuristic calculation of transmission centrality, which is both computationally efficient and can easily be extended to weighted, directed, or temporal networks, or even to nodes. Furthermore, to demonstrate the usefulness of transmission centrality we present a case study where we use this metric to identify weak ties [16,17] in social networks and characterize their role in contagion processes.
In what follows, after a brief discussion of related works and the utilized datasets, we formally introduce transmission centrality and discuss a heuristic method for its approximate calculation. Then we discuss its properties and correlations with local centrality measures in three large-scale real-world social networks. Finally, we present simulation results of SIR spreading processes to demonstrate the capacity of combined local measures and transmission centrality in designing effective strategies to enhance or hinder information diffusion in social networks.

II. RELATED WORKS
Node centralities have been widely studied, from classical static centralities like degree, closeness, betweenness, and eigenvector centrality [18] to centrality measures based on dynamical processes, such as random walks (e.g. PageRank [11]). Among these, betweenness centrality is one of the most popular measures as it quantifies the importance of a node by considering the global structure of a network instead of local information. Unfortunately, calculating betweenness centrality remains challenging in the case of large-scale social networks, as its best computation method has O(|V||E|) complexity for unweighted networks and O(|V||E| + |V|^2 log|V|) for weighted networks [8]. While many variants and approximation algorithms have been proposed to improve its algorithmic efficiency [19][20][21][22][23][24], researchers have also proposed alternative measures to quantify the importance of nodes in terms of dynamical processes on top of a network, such as K-path centrality [25] and percolation centrality [15]. K-path centrality [25] applies self-avoiding random walks of length k and takes as centrality the probability that a message originating from a given source traverses a node. Percolation centrality [15] measures the relative importance of a node based on both the network structure and its percolated states. Single-node-influence centrality and Shapley centrality assess the importance of a node in isolation and in a group, respectively, in social influence propagation processes [26]. The study in [27] simulates epidemic models (SIS and SIR) to estimate node centralities on top of temporal social networks. Interestingly, this study shows that spreading processes fail to characterize centrality measures like the degree and core numbers of infected nodes.
Dynamics-sensitive centrality [28], which counts the outbreak size in an epidemic model to quantify spreading influence of nodes, can better capture the importance of nodes particularly in epidemic spreading processes.
Most node centrality algorithms have also been generalized to link centrality measures, such as edge betweenness centrality, spanning edge betweenness centrality [29,30], and K-path edge centrality [31]. Just as node centralities aim to characterize the importance of nodes in a network, edge centralities provide quantitative perspectives on the importance of links in a network structure [32,33].

A. Network Data Descriptions
In the following study, we will discuss centrality algorithms by using three distinct sets of data recording communications between thousands or millions of individuals. For each dataset, first we aggregate the sequence of interactions to a static social network, excluding possible commercial communications. To do so, we only draw links between individuals who had at least one pair of mutual interactions during the observation period. In addition, to avoid leaf links we extract the k-core (k = 2) structure [34,35] of each network and use their largest connected component (LCC).
The first dataset we investigate is collected from the mobile phone call (MPC) communication sequences of 4,256,137 individuals during 4 weeks with 1 second resolution [36,37]. Individuals are anonymous users of a single operator with 20% market share in a European country. The static social network contains 5,279,169 mutual links. The final k-core (k = 2) structure of the LCC includes 1,926,787 nodes and 3,269,634 edges.
The second social network is aggregated from the sequence of wall posts of Facebook users (FB) [38][39][40]. The data records interactions from September 2004 to January 2009 between 31,720 users connected by 80,592 mutual links. The k-core (k = 2) structure of the LCC of this network contains 20,244 nodes and 70,132 edges.
The last social network, a Twitter conversation network (TW), is constructed from tweets from October 2010 to November 2013, which were collected through the Twitter Gardenhose [41]. We restrict our dataset to tweets with live GPS coordinates, providing us over 420 million communication events, which represent 1-2% of the entire volume. We construct a social network based on mutual conversational tweets (@mentions) between 4,155,700 users connected by 6,506,519 links. The k-core (k = 2) structure of the LCC of the Twitter conversation network consists of 966,779 nodes linked by 2,779,524 edges.

B. Transmission centrality
Transmission centrality aims to measure, for each link in a network, its influence in disseminating some globally spreading information. More precisely, it measures the number of nodes that received information through a given link during a diffusion process. Its definition intrinsically assumes a diffusion process unfolding on a network structure. In our definition we use the simplest possible information spreading process, the Susceptible-Infected (SI) model [42]; however, this can be replaced by any other diffusion process. The SI process is defined on a connected network G = (V, E), where nodes u ∈ G.V can be in two mutually exclusive states, either susceptible (S) or infected (I). Initially each node is susceptible (S) except a randomly selected seed node, which is set to be in state I. In each iteration step an infected node can infect its susceptible neighbors with rate β, until every node in the network becomes infected. Note that the parameter β scales with the speed of information spreading, with β = 1 corresponding to the fastest possible information diffusion process, which determines the shortest diffusion routes between the seed and any other node in the network. This diffusion process can be simulated with a modified breadth-first-search algorithm [43], as written in Alg.1. There, during the unfolding of the diffusion we keep infected nodes with susceptible neighbors in a queue Q and record the branching tree G_BT = (V_BT, E_BT) of the process by keeping track of the direct ascendant of each node, i.e., the node from which it received the information. Exploiting the structure of the actual branching tree, transmission centrality is formally defined as

C_tr((u, v)) = |desc(v)|,

where v is the node that received the information through the link (u, v), and |desc(v)| denotes the number of nodes in the subtree rooted at v, i.e., v together with its descendant nodes in the branching tree of the actual spreading.
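As an illustration, the SI spreading and branching-tree recording described above can be sketched in Python as follows. This is a minimal sketch with illustrative names, not the authors' Alg.1 implementation:

```python
import random
from collections import deque

def si_branching_tree(adj, seed, beta=1.0, rng=random):
    """Run an SI process on an undirected graph given as an adjacency
    dict {node: set(neighbours)} and record the branching tree as a
    dict mapping each node to its direct ascendant (seed maps to None)."""
    state = {u: "S" for u in adj}
    state[seed] = "I"
    asc = {seed: None}
    queue = deque([seed])              # infected nodes with susceptible neighbours
    while queue:
        u = queue.popleft()
        has_susceptible = False
        for v in adj[u]:
            if state[v] == "S":
                if rng.random() <= beta:   # infection attempt with rate beta
                    state[v] = "I"
                    asc[v] = u             # v received the information from u
                    queue.append(v)
                else:
                    has_susceptible = True
        if has_susceptible:
            queue.append(u)            # u keeps trying in later iterations
    return asc
```

With β = 1 the process reduces to a breadth-first search, so the recorded branching tree follows shortest paths from the seed.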
The branching tree G_BT, which is a subgraph of G, encodes the diffusion paths that the information takes to reach the vertices of the network. Using its structure we can easily measure the actual C_tr value of each link by performing a second step of calculation based on the river-basin algorithm [44]. In practice, taking the initial seed s as the root of G_BT and starting from the leaves of the branching tree, we can count the number of descendant nodes of each link, i.e., the nodes that received the information via the actual link. The algorithm is summarized in Alg.2, illustrated in Fig.1, and works as follows. First we define a dictionary C_tr, which associates a counter with each link (i, j) ∈ G.E, initially set to zero (lines 1-3 in Alg.2). Then we recursively do the following for every node v ∈ G_BT.V_BT, which appears with degree deg(v) = 1 in G_BT: (a) take its single remaining edge e_f = (v, p) connecting v to its parent p; (b) increase the counter C_tr(e_f) by one plus the counters already accumulated on the removed descendant edges of v; (c) remove v from G_BT.V_BT and e_f from G_BT.E_BT (lines 12 and 13 in Alg.2). The final transmission centrality value of the actual link e_f = (v, p) is stored in C_tr((v, p)).
By repeating steps (a)-(c) recursively for each appearing leaf edge we assign a non-zero value to each link in the branching tree, as demonstrated in Fig.1.c-f. The transmission centrality of a link can take values between 0 (for links that are not in the branching tree) and N − 1 (e.g., when the seed propagates information via a single link). Its actual value depends on the choice of the seed node and on the structure of the branching tree determined by the stochastic diffusion process. In this way centrality values of the same link may vary from one realization to another. To eliminate the effects of such fluctuations, the final definition of the transmission centrality of a link is taken as its average centrality value computed over processes initiated from every node in the network (for an algorithmic definition see Alg.3). Note that from now on C_tr always denotes this average quantity if not stated otherwise.
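The descendant counting on the branching tree can be sketched as below: an iterative post-order traversal that is equivalent to the leaf-pruning of Alg.2, with hypothetical names.

```python
def transmission_centrality(asc):
    """Given a branching tree as {node: ascendant} (seed maps to None),
    return C_tr for every tree edge (parent, child): the number of
    nodes that received the information through that edge."""
    children = {}
    for v, p in asc.items():
        if p is not None:
            children.setdefault(p, []).append(v)
    seed = next(v for v, p in asc.items() if p is None)
    c_tr = {}
    subtree = {}                       # subtree size = node + its descendants
    # iterative post-order traversal: process a node after its children
    stack = [(seed, False)]
    while stack:
        v, done = stack.pop()
        if done:
            size = 1
            for w in children.get(v, []):
                size += subtree[w]
                c_tr[(v, w)] = subtree[w]   # nodes reached via edge (v, w)
            subtree[v] = size
        else:
            stack.append((v, True))
            for w in children.get(v, []):
                stack.append((w, False))
    return c_tr
```

On a tree with seed 0 and edges 0→1, 1→2, 1→3, the link (0, 1) obtains the maximal value N − 1 = 3, as all other nodes receive the information through it.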

A. Heuristic calculation of transmission centrality
One iteration of the C_tr measurement runs with O(|E|) time complexity; in the case where we initiate the calculation from every node v ∈ V, its overall complexity is O(|V||E|). It is, however, possible to define a heuristic estimate of transmission centrality at a considerably smaller computational cost. As the branching trees of different realizations may largely overlap, a relatively small number of independent realizations, initiated from a reduced set of randomly selected seeds, can provide a good approximation to transmission centrality. Link transmission centrality initiated from a single node provides a locally biased measure, as it assigns higher values to links closer to the actual seed. This bias is averaged out if we initiate the spreading process from every node in the network, but in the case of a limited number of seeds it has residual effects. One way to eliminate this residual bias is to assign zero centrality values to links connecting nodes closer than a distance d to the actual seed. The best value of d depends on the network; however, it can be estimated by parameter scanning, as demonstrated in SI Fig. S1.
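Combining the pieces, the heuristic estimate can be sketched as follows. For brevity the sketch fixes β = 1, so each SI realization is a deterministic breadth-first search; function and variable names are illustrative, not taken from Alg.3:

```python
from collections import deque

def approx_transmission_centrality(adj, seeds, d_max=0):
    """Average transmission centrality over a reduced seed set,
    zeroing out links closer than d_max to each seed to remove
    the local bias (adj: {node: set(neighbours)})."""
    avg = {}
    for s in seeds:
        parent, depth, order = {s: None}, {s: 0}, [s]
        q = deque([s])
        while q:                        # BFS = SI spreading with beta = 1
            u = q.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v], depth[v] = u, depth[u] + 1
                    order.append(v)
                    q.append(v)
        size = {v: 1 for v in parent}   # subtree sizes, accumulated bottom-up
        for v in reversed(order[1:]):
            size[parent[v]] += size[v]
        for v in order[1:]:
            e = (parent[v], v)
            c = 0 if depth[parent[v]] < d_max else size[v]
            avg[e] = avg.get(e, 0) + c  # accumulate over realizations
    return {e: c / len(seeds) for e, c in avg.items()}
```

On a path graph 0-1-2-3 seeded at node 0, the estimate gives 3, 2, 1 for the successive links; with d_max = 1 the first link, adjacent to the seed, is zeroed out.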
To illustrate the computation of the heuristic estimate, we use the FB network with 20,244 nodes (for more details see Section III A) and compute the average transmission centrality of each link via the exact method, by initiating an SI process from each node, and via the heuristic method, where we initiate processes from 5000 random seeds (i.e., ∼ 25% of all nodes) and eliminate biases within distance d = 3 around each seed (for more on the selection of this value see SI Fig. S1). In Fig.2a we present a heat-map of the correlation between the exact (denoted C^o_tr here) and the approximated (denoted C_tr) centrality values of each link. There is evidently a strong correlation between the exact and approximated centrality values, quantified by an r = 0.96 (p < 10^−6) Pearson correlation coefficient. Consequently, this unbiased sampling method can provide very close approximations to the exact transmission centrality values, while considerably reducing the computational cost (to ∼ 25% of the exact computation in this case). Note that this correlation analysis was not repeated for the other two empirical networks, as the computation of the exact method would take extremely long on such large networks due to its computational complexity.
Subsequently, we applied the approximate method to compute transmission centrality in the MPC network (with 2000 seeds and d = 8) and in the TW network (with 5000 seeds and d = 7) as well. We consistently found that the average unbiased transmission centrality of links, measured in the three empirical systems, is broadly distributed (see Fig.2b-d for the MPC, FB and TW networks respectively), with power-law tails with exponents α = 3.08, 3.39 and 2.44 for the MPC, FB and TW networks respectively. This demonstrates the high variance in the importance of links in transmitting information, which is likely a consequence of the community-rich structure of the three investigated social networks.
Transmission centrality can be generalized in various ways. First, it can easily be defined as a node centrality metric by counting, for each node, the number of its descendant nodes in the branching tree. Moreover, it can be extended to directed and/or weighted networks by restricting the SI process to respect the direction of links during spreading, or by scaling the transmission rate with the normalized weight of links. In addition, one can explore central links in the case when the SI process does not diffuse along the shortest paths: by taking β < 1, longer spreading paths become plausible, allowing the identification of links that are central in any scenario. Transmission centrality can easily be defined for temporal networks [45] as well. Contrary to static networks, in temporal structures information can be transmitted between nodes only at the time of their interactions. As a result, information can travel only along time-respecting paths in the network, which drastically restricts the final outcome of any global contagion process [46] and has evident consequences on the measured centrality values. Links that appeared unimportant in the static structure may be central in the temporal network, as they may lie on several time-respecting paths due to their specific interaction dynamics.
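The node variant mentioned above reduces to counting descendants per node. A minimal sketch on the same branching-tree representation (node → ascendant, seed → None), with an illustrative function name:

```python
def node_transmission_centrality(asc):
    """For each node, count its descendants in the branching tree,
    i.e. the nodes that received the information through it."""
    desc = {v: 0 for v in asc}
    for v, p in asc.items():
        while p is not None:      # credit every ancestor of v once
            desc[p] += 1
            p = asc[p]
    return desc
```

Leaves of the branching tree obtain zero, while the seed is always credited with N − 1.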
Finally, note that although transmission centrality is not equivalent to betweenness centrality (or to other centrality measures based on counts of shortest paths between nodes), it naturally relates to it. This relation is explained in more detail in SI Fig. S2.

B. Case study: weak tie identification to control contagion processes in social networks
To demonstrate the potential of transmission centrality, here we present a case study where we use our new metric to identify ties in social networks in order to efficiently control contagion processes. Ties in social networks are associated with various strengths [47][48][49] and are commonly categorized into two mutually exclusive groups: weak and strong ties. Following the terminology introduced by Granovetter [16,17], weak ties are maintained via sparse interactions, bridge between tightly connected communities to keep the network connected [47], and play an important role in disseminating information globally [36,[50][51][52][53][54][55][56]. On the other hand, strong ties, sustained by frequent communications, are crucial in shaping the local connectivity of social networks: they are responsible for the emerging clustered topology [54,55,57] and for keeping information locally [36,51,52,56]. A precise measure of tie strength would allow the efficient differentiation between these types and the identification of weak ties in social networks in order to control globally spreading contagion processes.
One common way to characterize the strength of a social tie is via the link overlap

O_ij = n_ij / (k_i − 1 + k_j − 1 − n_ij),

capturing the fraction of common friends in the neighborhood of connected nodes i and j [16,47,50]. Here, k_i and k_j denote the degrees of nodes i and j respectively, and n_ij is the number of their common neighbors. Weak ties are associated with small overlap values, while the contrary is not always true: leaf links, structural holes, or merely the fact that networks are sparse may induce links with small overlap, which leads to some ambiguity when identifying weak ties in this way. Another way to assign the strength of social ties is via the intensity of dyadic communication [42,50,58]. It can be measured as the frequency, total duration, or absolute number of interactions between connected peers. In this study, assuming discrete communication events, we define the dyadic tie strength as the number of interactions between individuals i and j,

w_ij = Σ_{t∈T} δ(t, i, j),

where the sum runs over the observation period T, and δ(t, i, j) = 1 if an event appears between i and j at time t regardless of its direction, otherwise it is 0 [50]. Dyadic tie strength may capture mutual commitment or emotional closeness between people; however, as a local measure, it is sensitive to individual characteristics like communication capacity or egocentric network size. In this way, it is unable to indicate the role of a link in the global structure in the context of the emergence of any collective phenomena. In addition, its broadly distributed values prohibit an evident categorization of social ties. As shown in Fig.3d-f and in other studies [47,50], dyadic tie strength and link overlap are positively correlated, in accordance with Granovetter's theorem [16]. At the same time, transmission centrality and overlap show strong negative correlations (see Fig.3.a-c), as weak links, with small overlap values, are commonly situated between communities and thus transmit information to a large set of nodes.
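The two local tie-strength measures above follow directly from their definitions; a minimal sketch, where the event-list format is an assumption:

```python
def link_overlap(adj, i, j):
    """O_ij = n_ij / (k_i - 1 + k_j - 1 - n_ij) for a link (i, j),
    with adj a dict {node: set(neighbours)}; returns 0 when the
    denominator vanishes (e.g. leaf links)."""
    n_ij = len(adj[i] & adj[j])
    denom = len(adj[i]) - 1 + len(adj[j]) - 1 - n_ij
    return n_ij / denom if denom > 0 else 0.0

def dyadic_strength(events, i, j):
    """w_ij = number of interaction events between i and j in either
    direction; events is a list of (t, a, b) tuples."""
    return sum(1 for t, a, b in events if {a, b} == {i, j})
```

Note how a leaf link such as (2, 3) in a triangle-plus-leaf graph obtains O = 0, illustrating the ambiguity of zero-overlap links discussed above.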
More interestingly, dyadic tie strength and transmission centrality values do not show strong correlations (see Fig.3.g-i). Although both are correlated with link overlap, they capture notably different and seemingly independent features of social ties. For the precise Pearson correlation coefficients (and p-values) see Table I. While overlap has been shown to identify weak ties efficiently [47,50], this measure has a major limitation: it assigns a zero overlap value to an unrealistically large fraction of links, including weak ties but also leaf links, links surrounded by structural holes, and links situated at sparsely connected parts of the network. This is indeed the case in the investigated systems, where 48.2%, 49.8%, and 45.2% of social ties appear with O = 0 (in the MPC, FB, and TW networks respectively). Relying merely on the link overlap one cannot differentiate between these links, thus they are treated equivalently. On the other hand, the Granovetterian criteria suggest that weak ties are not only characterized by small overlap, but also exhibit small dyadic tie strengths and high transmission centrality. Based on these conditions we design two combined strategies where we differentiate between zero-overlap links using their w or C_tr values. We first rank ties in increasing order of overlap, and then sort links of the same overlap value again, increasingly by their dyadic tie strength (denoted (O, w)) or by their inverse transmission centrality values (denoted (O, C_tr^−1)).
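The two combined ranking strategies can be sketched as below, assuming precomputed per-link dictionaries of overlap, dyadic strength, and transmission centrality (all names illustrative):

```python
def rank_weak_ties(links, overlap, strength=None, c_tr=None):
    """Rank links by increasing overlap; break ties among equal-overlap
    links either by increasing dyadic strength, the (O, w) strategy,
    or by decreasing transmission centrality, the (O, C_tr^-1) strategy.
    Weakest ties come first."""
    if strength is not None:
        key = lambda e: (overlap[e], strength[e])        # (O, w)
    else:
        key = lambda e: (overlap[e], -c_tr[e])           # (O, C_tr^-1)
    return sorted(links, key=key)
```

Under (O, C_tr^−1), a zero-overlap link with high transmission centrality is ranked as weaker (i.e., a better control target) than a zero-overlap leaf link with low centrality.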

Controlled SIR spreading.
The precise identification of the weakest weak ties is important because, by suppressing interactions on this limited set of links, we may effectively control globally spreading processes in the network. To model such scenarios we take a network structure and introduce a weight ω_ij for each link (with values defined later). To select the weakest links to control, we consider one of the two candidate sorting strategies, (O, w) or (O, C_tr^−1). After sorting links by one of these metrics, we select the weakest fraction f of them to control, by linearly rescaling their weights as Ω_ij = δω_ij, with the parameter 0 ≤ δ ≤ 1. In this way we weaken interactions on the selected ties, and thus we can exert further control on dynamical processes like the Susceptible-Infected-Recovered (SIR) model. The SIR process [42] is a well-known model of epidemics and rumor spreading [59,60], defined on a network where nodes can be in the exclusive states susceptible (S), infected (I), or recovered (R) [42]. In each iteration connected nodes are updated as S + I → 2I (with rate β) or I → R (with rate µ), where β and µ are the infection and recovery rates respectively. In this scenario we fix µ = 0.1 and β = 0.25, and re-scale the transmission probability for each controlled link as β_ij = Ω_ij β (for a sensitivity analysis regarding this choice see SI Fig. S3). After initiating the process from a randomly selected seed, we simulate it until full recovery and monitor R, the final number of recovered nodes, which gives the maximum number of nodes ever infected during the process.
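The controlled SIR experiment described above can be sketched as follows: a simplified synchronous-update sketch under the stated parameters, where the data structures and the function name are illustrative rather than the authors' implementation:

```python
import random

def controlled_sir(adj, controlled, delta, omega=None,
                   beta=0.25, mu=0.1, seed_node=0, rng=random):
    """SIR spreading where the transmission probability on each link is
    beta_ij = Omega_ij * beta, with Omega_ij = delta * omega_ij on the
    controlled links (given as a set of frozenset({u, v}) keys).
    Returns R, the final number of recovered nodes."""
    omega = omega or {}
    state = {u: "S" for u in adj}
    state[seed_node] = "I"
    infected, recovered = {seed_node}, 0
    while infected:
        new_inf, new_rec = set(), set()
        for u in infected:
            for v in adj[u]:
                if state[v] == "S":
                    e = frozenset((u, v))   # undirected link key
                    w = omega.get(e, 1.0)
                    if e in controlled:
                        w *= delta          # weakened, controlled interaction
                    if rng.random() < w * beta:
                        new_inf.add(v)
            if rng.random() < mu:           # recovery with rate mu
                new_rec.add(u)
        for v in new_inf:
            state[v] = "I"
        for u in new_rec:
            state[u] = "R"
        recovered += len(new_rec)
        infected = (infected - new_rec) | new_inf
    return recovered
```

Setting δ = 0 on every link completely suppresses transmission, so only the seed ever recovers; intermediate δ values interpolate between full control and the uncontrolled process.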
In our first experiment we assign ω_ij = 1 to each link, assuming that the network is unweighted at the outset. To study the effects of link control, after sorting links by (O, C_tr^−1) or (O, w), we choose the weakest 12%, 24%, 36%, or 48% of links (see Fig.4.a, b, and c). In addition, as a reference we use a network where the same fraction of randomly selected links is controlled in the same way, i.e., by re-scaling their weights with δ. Finally, to quantify the effects of increasing control, we measure the ratios Φ_{C_tr^−1,r}(δ) = R_{O,C_tr^−1}(δ)/R_rand(δ) and Φ_{w,r}(δ) = R_{O,w}(δ)/R_rand(δ) of recovered nodes in the targeted and random control scenarios for various δ values. If a targeted strategy performs comparably to the random one, these ratios are equal to one; otherwise, the stronger the control a targeted strategy enforces, the smaller the corresponding ratio becomes.
When we set δ = 1, the ratios of endemic population sizes are trivially one, as no control is applied (see Fig.4.a, b, and c). However, by decreasing δ, and thus increasing control, large differences appear between the targeted and random cases. Effects are stronger when a larger fraction of the weakest links is re-scaled with smaller and smaller δ factors. The differences between the (O, C_tr^−1) (solid lines) and (O, w) (dashed lines) strategies are maximal when we control an intermediate 24% or 36% of links, while they perform similarly when the controlled fraction is small (12%) or large (48%). It is also evident that the (O, C_tr^−1) strategy outperforms (O, w) and provides remarkably better control in reducing the final infected population, especially for smaller δ values.
To bring our experiments closer to reality, we repeat our measurements on weighted networks where we define link weights as ω_ij = w_ij/⟨w⟩, i.e., the number of interactions between nodes i and j normalized by ⟨w⟩, the average number of interactions per link calculated over the whole network. In the case where w_ij > ⟨w⟩ we set the corresponding weight to ω_ij = 1.0. This choice is necessary as weights are heterogeneously distributed in this case and would otherwise severely slow down the simulated spreading in reaching full prevalence. On the other hand, since controlled links with small overlap values tend to have small weights, a negligible effect of this approximation is expected. The different control strategies provide qualitatively the same results on the weighted FB and TW networks (Fig.4.e, f); however, their effects are considerably stronger on the MPC structure (Fig.4.d). There, the (O, C_tr^−1) strategy appears to be the more efficient one even after controlling only 12% of the ties. Moreover, this strategy can lead to a 90% reduction of the infected population in the case of re-scaling 36% of links with δ = 0.01. Note that the observed differences between strategies cannot be the result of the limited communication on zero-overlap links only, as we observed qualitatively the same effects in weighted and unweighted networks.
To directly highlight the differences between the targeted strategies, we further investigate the most strongly controlled case. We set δ = 0.01 and repeat our experiments by controlling various fractions f of links, measuring the ratio of the two performance functions, Φ_{C_tr^−1,w}(f) = R_{O,C_tr^−1}(δ)/R_{O,w}(δ), i.e., the fraction of endemic recovered population sizes. Results in Fig.5.a, b, and c clearly show that the (O, C_tr^−1) strategy almost always outperforms the (O, w) strategy, especially when we consider weights. In addition, the minimum points of the Φ_{C_tr^−1,w}(f) curves in Fig.5 indicate the best pay-off between the controlled fraction f of links and the effectiveness of contamination control using the (O, C_tr^−1) strategy. This minimum point indicates that controlling ∼ 30% of the weakest ties is enough to most efficiently hinder the spreading processes on the investigated social networks.

V. DISCUSSION
In this study we introduced a new link centrality measure, called transmission centrality, which quantifies the importance of links in global diffusion processes. We defined an algorithm to compute transmission centrality, demonstrated its general properties on three large-scale networks, and discussed possible ways in which this measure can be generalized to directed, weighted, or temporal networks, or even as a node centrality measure. Finally, in a case study, we showed that the combined information of overlap and transmission centrality serves as the best way to identify weak links in order to gain maximum control over spreading processes. Although here we demonstrated the effectiveness of transmission centrality specifically in identifying weak ties in social networks, the same metric can be applied in any other type of network to identify links with a specific structural role and importance in controlling the emergence of various collective phenomena.
We discussed that the main limitation of this new centrality measure is rooted in its computational complexity, which scales as that of the best known algorithm for betweenness centrality. However, we proposed a way around this limitation by defining a heuristic method to approximate transmission centrality values in very large networks at a considerably lower cost.
Several extensions of this method are possible by considering probing processes other than SI, arbitrary weight definitions, directed links, temporal interactions, or node transmission centrality. Furthermore, several straightforward applications can be foreseen, for example in viral marketing, rumor contamination, or intervention design; their exploration can be the subject of future studies. Our aim here was to ground a new metric of link centrality and to contribute to the design of effective methods to identify ties that play an indisputably important role in the structure and dynamics of social networks.

S1 Local bias of link transmission centrality
In the main text, we have discussed that link transmission centrality is a locally biased measure, as it assigns higher values to links closer to the seed node. To understand this bias on "close-to-seed" links, we directly analyze the actual branching trees with the actual seed node as root. We compute the transmission centrality C_tr of links as a function of their distance d from the actual root. Here the distance of a link is defined as d = min(ℓ_b, ℓ_e), i.e., the minimum of the shortest path lengths ℓ_b and ℓ_e of the beginning and ending nodes (respectively) of the actual link.
One can expect two different effects for links in the vicinity of the seed node. First, if the seed has low degree, the corresponding C_tr values of its immediate links will be larger compared to the case where the seed has many neighbors. Second, we expect that this bias decreases with the distance d measured from the seed. These two effects can be identified from the scaling of the C_tr(d) functions in Fig.S1.a-c. Here, for each network, we randomly select seeds from different degree groups (100 seeds for each group) and measure the average transmission centrality C_tr (Fig.S1.a-c) at distances smaller than or equal to d relative to the actual seed.
These results can help us estimate the induced bias and select an appropriate threshold d_max to eliminate its effect. This choice has to balance two competing factors: d_max should be large enough that the local bias of the seed becomes negligible, yet small enough not to remove too many links from the tree. The number of links E within distance d from the seed increases exponentially, as shown in Fig.S1.d-f. To remove the actual bias we set C_tr = 0 for those links within the determined distance d_max from the actual seed in the network. Based on Fig.S1 we choose d_max = 8, d_max = 3, and d_max = 7 for the MPC, FB and TW datasets respectively. These choices fulfill both conditions, as they are large enough to reduce the bias considerably, while excluding only 4.3%, 1.1% and 3.7% of links for the MPC, FB and TW datasets respectively. Naturally, removing links may decrease the overall heterogeneity of C_tr. However, as demonstrated in Fig.S1.g-i, even after excluding the biased links the distribution P(C_tr) remains fat-tailed.

S2 Relation to betweenness centrality
Transmission centrality can easily be associated with the concept of link betweenness centrality, commonly defined as C_b(i, j) = \sum_{s \neq t} g_{s,t}(i, j)/g_{s,t}, where g_{s,t} denotes the number of shortest paths between nodes s and t, and g_{s,t}(i, j) the number of those paths that pass through the link (i, j). Although the definitions of C_b and C_tr are not equivalent, they are strongly related. Their differences are rooted in the deterministic definition of C_b, which considers all shortest paths between pairs of nodes. On the other hand, C_tr is defined by an SI process, which is stochastic even in the case of β = 1, as it considers the links of an infected node in random order when transmitting the infection to susceptible neighbours. In this way it may never explore all possible shortest paths, but over several realizations it gives credit to the most plausible ones. Despite these fundamental differences, both measures capture similar quantities, proportional to the number of shortest paths running through a given link. Due to these underlying similarities they appear closely correlated in all investigated empirical structures, as seen in Fig. S2. These strong correlations demonstrate the relationship between transmission centrality and betweenness centrality, with the advantage that the approximate calculation of the former scales considerably better with system size. As discussed earlier, the exact computation of transmission centrality scales with O(|V||E|) complexity, which equals that of the best known algorithm [?] to measure C_b. On the other hand, one can radically reduce the computational cost by considering a relatively small number of seeds to compute the average C_tr, and thereby obtain surprisingly good approximations of link betweenness values.
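A single realization of the β = 1 process can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the SI dynamics at β = 1 is approximated by a breadth-first search with randomly shuffled neighbour order, and each link of the resulting transmission tree is credited with the size of the subtree it delivered the infection to.

```python
import random
from collections import deque

def si_transmission_counts(adj, seed, rng=random):
    """One β = 1 SI realization from `seed`: build the transmission tree
    (random neighbour order supplies the stochastic tie-breaking), then
    count, for each tree link, how many nodes were infected through it."""
    parent = {seed: None}
    order = [seed]
    q = deque([seed])
    while q:
        u = q.popleft()
        nbrs = list(adj[u])
        rng.shuffle(nbrs)          # stochastic even at β = 1
        for v in nbrs:
            if v not in parent:
                parent[v] = u
                order.append(v)
                q.append(v)
    # subtree sizes, accumulated bottom-up in reverse infection order
    size = {v: 1 for v in order}
    counts = {}
    for v in reversed(order):
        p = parent[v]
        if p is not None:
            counts[(min(p, v), max(p, v))] = size[v]
            size[p] += size[v]
    return counts
```

Averaging these counts over many seeds (and realizations) yields the C_tr estimate; by construction each link's count grows with the number of shortest-path-like routes passing through it, which is the source of the correlation with C_b.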
To demonstrate this scaling, for all three empirical networks we measured the correlation between C_b and C_tr while successively increasing the number of seeds used for the latter. The results, summarized in Table S1, show that the average transmission centrality computed from 0.1% of the MPC network nodes (10% in the case of FB and 0.2% for TW) approximates the actual betweenness centrality values well, with correlations R_MPC ≈ 0.83 (resp. R_FB ≈ 0.96 and R_TW ≈ 0.87). This demonstrates once more the close relationship between these metrics and the accuracy of the proposed approximation method for transmission centrality.

S3 Parameter dependence of controlled SIR spreading
In the main text we argued that the combined overlap-transmission centrality metric (O, C_tr^{-1}) is the most efficient in hindering epidemic outbreaks modeled by an SIR process. However, all presented simulation results used a single parameter set with a constant basic reproduction number R_0 = β/µ = 2.5. Here, to demonstrate that our simulation results are largely independent of the choice of R_0, we fixed µ = 0.1 and repeated our experiments for different values of β. We selected a fraction f of links either by the (O, C_tr^{-1}) strategy or at random, and re-scaled their weights by δ = 0.01. We measured the average final size of the recovered population over 100 realizations for the targeted and random strategies. Depicting the ratio Φ_{O,C_tr^{-1}}(f) = R_{O,C_tr^{-1}}(f)/R_{random}(f) of the corresponding measures in Fig. S3, we show that the effect of targeted control increases as a higher fraction of links is controlled (just as seen in the main text); however, this behaviour depends only weakly on the choice of β. This suggests that the observed behaviour is qualitatively the same for a wide range of R_0 values of the SIR model, and is thus not a consequence of the specific parametrization of the spreading process.

Table S1: Pearson correlation coefficients r measured between the betweenness centrality and transmission centrality values of the same links, for increasing numbers of seeds, in the three empirical networks. Each corresponding p-value is smaller than 0.001.
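The simulation protocol above can be sketched in a few lines. This is a hedged, minimal stand-in for the paper's experiment, assuming a discrete-time SIR variant in which an infected node transmits to each susceptible neighbour with probability β·w per step and recovers with probability µ; the weighted network is a dictionary-of-dictionaries, and the control intervention multiplies the selected links' weights by δ.

```python
import random

def sir_final_size(adj_w, beta, mu, seed, rng=random):
    """Discrete-time SIR on a weighted network: infected node u infects
    susceptible neighbour v with prob. beta * w(u, v) per step and
    recovers with prob. mu. Returns the final recovered fraction."""
    S = set(adj_w) - {seed}
    I, R = {seed}, set()
    while I:
        new_I, new_R = set(), set()
        for u in I:
            for v, w in adj_w[u].items():
                if v in S and rng.random() < beta * w:
                    new_I.add(v)
            if rng.random() < mu:
                new_R.add(u)
        S -= new_I
        I = (I | new_I) - new_R
        R |= new_R
    return len(R) / len(adj_w)

def rescale(adj_w, links, delta=0.01):
    """Copy of the weighted adjacency with the chosen links' weights
    multiplied by delta (the control intervention)."""
    out = {u: dict(nb) for u, nb in adj_w.items()}
    for u, v in links:
        out[u][v] *= delta
        out[v][u] *= delta
    return out
```

Averaging `sir_final_size` over many realizations, once on `rescale(net, targeted_links)` and once on `rescale(net, random_links)`, gives the numerator and denominator of the ratio Φ.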
