Effective strategies for targeted attacks to the network of Cosa Nostra affiliates

Network dismantling has recently gained interest among intelligence agencies, anti-corruption analysts and criminal investigators due to its efficiency in disrupting the activity of malicious agents. Here, we apply this approach to detect effective strategies for targeted attacks on Cosa Nostra by analysing the collaboration network of affiliates that participate in the same crimes. We preliminarily detect statistically significant homophily patterns induced by membership in the same mafia syndicate. We also find that links between members belonging to different mafia syndicates play a crucial role in connecting the network into a single component, confirming the relevance of weak ties. Inspired by this result, we investigate the resilience properties of the network under random and targeted attacks with a percolation-based toy model. Random removal of nodes turns out to be quite inefficient in dismantling the network. Conversely, targeted attacks where nodes are removed according to ranked network centralities are significantly more effective. A strategy that removes nodes according to how much a member collaborates with different mafia syndicates has an efficiency similar to the one where nodes are removed according to their degree. The advantage of such a strategy is that it does not require complete knowledge of the underlying network to be operationally effective.


Introduction
Understanding the networked structure of criminal organizations is paramount to reduce their impact on society. In the last two decades a growing body of literature based on network science applications [1] has shed light on the characteristics of different criminal enterprises, ranging from terrorist activity [2][3][4] to organized crime [5][6][7][8][9][10][11] and corruption cartels [12][13][14]. The main advantages of a network approach are (i) gaining a better picture of the underlying operational structure of such systems, which is usually hidden and strongly secretive, and (ii) understanding the dynamical behavior of these malicious agents in order to build effective strategies to limit their impact. In the case of Cosa Nostra, previous studies have revealed that contact networks obtained through conventional investigation strategies are not always efficient in mapping the actual functional structure of the [...]
To this extent, we preliminarily investigate the homophily [30] properties of the network and find that membership in a mafia syndicate induces statistically significant homophily. However, the fact that the network shows a single connected component is due, in a statistically significant way, to the links that connect affiliates belonging to different mafia syndicates. Despite homophily, this result again calls for a deeper investigation of the links between different mafia syndicates. Such contraposition between homophily and heterogeneous syndicate affiliation mimics the classical contraposition between strong ties and weak ties [31]: links amongst mafia affiliates belonging to different syndicates are weak. However, they are crucial in keeping the network connected.
As a next step, we investigate the resilience properties of the network to random and targeted attacks. We find that random removal of network nodes is quite ineffective in dismantling the network. Conversely, targeted attacks where nodes are removed according to the highest values of betweenness and degree perform well on our data. These results are in agreement with classical results [17,19] in the field of network resilience. However, inspired by the preliminary results on homophily, we tried a dismantling strategy where nodes are removed from the network according to their number of connections with affiliates belonging to other syndicates. What is important here is not the mere degree, i.e. the number of connections with other mafia affiliates, but the fact that the other mafia affiliates are distributed across different mafia syndicates. It turns out that such a strategy is also effective and comparable with a degree-based dismantling strategy. From an operational point of view, the syndicate-based strategy is extremely relevant, as information about syndicate affiliation is usually easier for law enforcement agencies to obtain.
The manuscript is organized as follows: in Sect. 2 we illustrate the data used in the present investigation. In Sect. 3, we introduce the collaboration network of mafia affiliates and analyze its homophily properties. In Sect. 4, we investigate the resilience properties of the network both to random and targeted attacks. Finally, in Sect. 5, we draw our conclusions.

Data
For our analysis we aim to construct a network whose nodes are Cosa Nostra members and whose weighted links represent the number of joint tracked crimes. Moreover, as an additional node attribute we aim to add the syndicate to which each member is affiliated and the area he/she is resident in. In order to construct such a network, we started from a set of 125 judgements issued by the Judiciary of Palermo (judge for preliminary investigations) in the interval 2000-2014. For confidentiality reasons, the list of judgements is available only upon request. From the judgements we extracted 976 subjects, for each of whom we obtained name, date and place of birth. Such information allowed us to reconstruct the fiscal code (equivalent to the social security number in other countries) of all subjects. Other information, such as the name of judges, public prosecutors and of defense counsel, was not used in the present study.
From the fiscal code we obtained the penal records (Certificati Penali) of all the 976 subjects by means of the facility CERPA [32], a software tool for querying a database containing the penal records. Amongst all queried fiscal codes, 631 were associated with subjects that could be used in the analysis. The remaining ones were related to subjects who were either over 80 years old, already dead, or had never been convicted. Indeed, although the information about the subjects has been extracted from judgements issued by the Tribunal of Palermo, such judgements are not definitive. Thus, it is possible that some of the subjects convicted in a judgement are acquitted at some later stage of the trial. By using the penal records we are thus focusing on Cosa Nostra affiliates whose criminal activity has been formally and definitively recognised by the Italian state. In statistical terms, we are minimizing false positive links, possibly at the expense of some false negative ones.
The penal record of each subject contains information on his/her convictions. Such information consists of:
• the date when the judgement was issued,
• the court that issued the sentence,
• the date when the conviction became definitive,
• the crime committed according to the Italian penal code, the Code of Criminal Procedure, Road Code, Special Legislation, ...
• the judgement details (duration of imprisonment, whether there are fines and/or other forms of economic penalties, ...)
Although in principle the penal records also report the time when the crime was committed, we did not include this information in our analysis because it is inconsistent and only partially present in the data. Using the penal records we were able to generate a bipartite network of 631 subjects convicted for Mafia crimes, 723 committed crimes and 5408 links between subjects and crimes. In order to have better statistics we aggregated all the crimes that made reference to different commas of the same article of the penal code. By projecting the bipartite network on the set of Cosa Nostra members, we obtained a network $G(N, L)$ with $N = 631$ nodes and $L = 1265$ links. In this network we therefore have a link between two mafia affiliates whenever they have been sentenced in the same trial for committing the same crime.
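The projection step described above can be illustrated with a minimal sketch. The function name `project_on_subjects` and the toy identifiers are hypothetical and are not taken from the original scripts; the weight of a projected link is assumed to be the number of crimes shared by the two subjects.

```python
from collections import defaultdict
from itertools import combinations

def project_on_subjects(subject_crime_links):
    """Project a bipartite subject-crime edge list onto the subjects.

    Returns a dict mapping an (ordered) pair of subjects to the number
    of crimes they share, i.e. the weight of the projected link.
    """
    subjects_by_crime = defaultdict(set)
    for subject, crime in subject_crime_links:
        subjects_by_crime[crime].add(subject)

    weights = defaultdict(int)
    for subjects in subjects_by_crime.values():
        # Every pair of subjects tied to the same crime gets one more unit.
        for u, v in combinations(sorted(subjects), 2):
            weights[(u, v)] += 1
    return dict(weights)

# Hypothetical toy data: (subject id, crime id) pairs.
toy_links = [("s1", "c1"), ("s2", "c1"), ("s3", "c1"),
             ("s1", "c2"), ("s2", "c2"), ("s4", "c3")]
projected = project_on_subjects(toy_links)
```

In this toy example subject `s4` has no co-offender and therefore remains an isolated node, which is the situation of the disconnected affiliates mentioned later in the text.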
In order to add information on the syndicates we had access to data provided by the Prosecutor Office. The information is updated until October 2018. According to this information, a set of 418 out of 631 subjects convicted for Mafia crimes were divided into 62 mafia syndicates that are essentially based in different neighbourhoods of the Palermo municipality as well as in a few villages close to Palermo. After projecting the subset of the original bipartite network limited to the 418 subjects, we obtained a network $G_{\mathrm{synd}}(N_{\mathrm{synd}}, L_{\mathrm{synd}})$ with $N_{\mathrm{synd}} = 418$ nodes and $L_{\mathrm{synd}} = 857$ links.
Furthermore, we obtained the full residence address of a subset of subjects (350 out of 632) from the Registry Office of the Palermo municipality. The residence addresses of the remaining 277 subjects were not available because such subjects were not resident in Palermo. We then extracted the postal code from the residence addresses and added it as a node attribute, obtaining a network $G_{\mathrm{pc}}(N_{\mathrm{pc}}, L_{\mathrm{pc}})$ with $N_{\mathrm{pc}} = 350$ nodes and $L_{\mathrm{pc}} = 399$ links. This allowed us to study the impact of geographical co-location in shaping criminal collaborations.
When analysing dismantling strategies for both the syndicate and the postal code networks, we looked only at their largest connected components, in order to study the effectiveness of the different strategies in breaking them into smaller pieces. Thus we focused on the subnetworks $G^{\mathrm{lcc}}_{\mathrm{synd}}(N^{\mathrm{lcc}}_{\mathrm{synd}}, L^{\mathrm{lcc}}_{\mathrm{synd}})$ with $N^{\mathrm{lcc}}_{\mathrm{synd}} = 193$, $L^{\mathrm{lcc}}_{\mathrm{synd}} = 751$ and $G^{\mathrm{lcc}}_{\mathrm{pc}}(N^{\mathrm{lcc}}_{\mathrm{pc}}, L^{\mathrm{lcc}}_{\mathrm{pc}})$ with $N^{\mathrm{lcc}}_{\mathrm{pc}} = 64$, $L^{\mathrm{lcc}}_{\mathrm{pc}} = 253$. All the activities that required direct knowledge of the subjects' identities were performed by personnel of the Palermo Prosecutor's Office. We wrote the scripts to extract the relevant information from judgements and penal records, and they ran them. We only had access to anonymized data.

The networks of Cosa Nostra affiliates
In this section we analyse the two networks $G_{\mathrm{synd}}$ and $G_{\mathrm{pc}}$ in order to characterize the role of syndicates and geography in shaping criminal collaborations. As both networks contain disconnected nodes, i.e. mafia members that were convicted but committed crimes alone, we restricted our analysis to the networks formed by mafia members that have at least one link to others. Thus $G_{\mathrm{synd}}(N_{\mathrm{synd}}, L_{\mathrm{synd}})$ is reduced to $N_{\mathrm{synd}} = 276$ mafia affiliates with $L_{\mathrm{synd}} = 857$ links and $G_{\mathrm{pc}}(N_{\mathrm{pc}}, L_{\mathrm{pc}})$ is reduced to $N_{\mathrm{pc}} = 159$ mafia affiliates with $L_{\mathrm{pc}} = 399$ links.

Homophily induced by syndicate affiliation
Let us consider the network $G_{\mathrm{synd}}$ described above.
First, we check if there is a homophily effect induced by mafia syndicates. Homophily in this context implies that a link is more likely to be observed between two members of the same mafia syndicate than between two mafia members coming from different mafia syndicates. Out of the $L_{\mathrm{synd}} = 857$ links in our network, 198 are between members of the same mafia syndicate (23%). In order to test if this percentage is significantly higher (presence of homophily) or lower (avoidance of homophily) than what would happen if connections were random, we create 10,000 replicas of the network with a configuration model that preserves the degree of each node while randomizing its connections, and count for each replica the number of links between members of the same mafia syndicate. The results are shown in Fig. 1. The number of links within the same mafia syndicate in the real network is marked by the green dashed vertical line, while the distribution of values in the random replicas is shown by the blue bars. The evidence of a homophily pattern is statistically significant, with the p-value associated with the z-score being approximately null, i.e. smaller than the numerical precision achievable by our computers. As we assume the normality of the distribution of random replicas when extracting the p-value from the z-score, we add a Gaussian fit to show that the assumption is verified.
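The test above can be sketched in a few lines of Python. This is a simplified illustration and not the authors' code: the configuration model is implemented here by plain stub matching, which tolerates self-loops and repeated links (an assumption of this sketch), and the names `same_group_links`, `configuration_replica` and `homophily_test` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev

def same_group_links(edges, group):
    """Count links whose two distinct endpoints belong to the same group."""
    return sum(1 for u, v in edges if u != v and group[u] == group[v])

def configuration_replica(edges, rng):
    """Rewire by stub matching, so that every node keeps its degree.

    Assumption of this sketch: self-loops and repeated links are tolerated.
    """
    stubs = [node for edge in edges for node in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

def homophily_test(edges, group, n_replicas=1000, seed=0):
    """Return (observed count, z-score, one-sided Gaussian p-value)."""
    rng = random.Random(seed)
    observed = same_group_links(edges, group)
    null = [same_group_links(configuration_replica(edges, rng), group)
            for _ in range(n_replicas)]
    z = (observed - mean(null)) / pstdev(null)
    # Upper-tail probability under the normality assumption.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return observed, z, p_value

# Hypothetical toy network: two triangles joined by one bridge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_group = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
observed, z, p_value = homophily_test(toy_edges, toy_group)
```

A large positive z-score signals more intra-syndicate links than expected from the degree sequence alone; a large negative one would signal avoidance of homophily.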

Network connectivity and syndicate affiliation
In order to test the relevance of collaborations across mafia syndicates in making the system connected, we perform the following analysis. We start from the links between members of the same mafia syndicate, which are 198 as illustrated above. If we keep all these links in the network and remove the ones between members of different mafia syndicates, we obtain a network divided into 161 components (the original network has 28 components). In order to test whether the disruption induced by removing the links between members of different syndicates is significantly strong (or weak), we keep the same number of randomly selected links (and remove all the others) and count how many components we have in the resulting network. As this procedure is random, we repeat it 10,000 times to have an ensemble of results. In Fig. 2 we plot the number of components obtained after removing links between different mafia syndicates (green dashed vertical line) together with the distribution of results obtained by removing the same number of randomly selected links (blue bars). The result is that links between mafia syndicates contribute significantly more to keeping the system connected than what is expected by chance (the p-value of the associated z-score on the number of components is $1.2 \times 10^{-11}$). This evidence has a strong analogy with the weak ties theory [31]: members of different syndicates can be seen as weakly connected, as Cosa Nostra syndicates are socially closed circles. However, these weak ties have a relevant role in making the whole system connected. Indeed, we will see in Sect. 4 that prioritizing the removal of links between different mafia syndicates is an effective strategy to disrupt the network of collaborations.
Figure 2. The green dashed vertical line shows the number of connected components obtained after removing links between affiliates of different mafia syndicates in the original network. The blue bars show the distribution of the same quantity obtained by removing the same number of randomly selected links. The distribution is obtained over 10,000 iterations.
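The comparison described above (keeping only intra-syndicate links versus keeping the same number of randomly selected links) can be sketched as follows. The union-find helper and the function names are illustrative choices and are not taken from the original analysis.

```python
import random

def count_components(nodes, edges):
    """Number of connected components, computed with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components

def components_after_pruning(nodes, edges, group, n_random=1000, seed=0):
    """Keep only intra-group links, then compare with random link subsets.

    Returns the observed number of components and the list of values
    obtained when the same number of links is kept at random.
    """
    rng = random.Random(seed)
    intra = [(u, v) for u, v in edges if group[u] == group[v]]
    observed = count_components(nodes, intra)
    null = [count_components(nodes, rng.sample(edges, len(intra)))
            for _ in range(n_random)]
    return observed, null

# Hypothetical toy network: two triangles joined by one bridge.
toy_nodes = [0, 1, 2, 3, 4, 5]
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_group = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
observed, null = components_after_pruning(toy_nodes, toy_edges, toy_group)
```

A z-score for `observed` against the `null` values can then be computed in the usual way, by subtracting the mean of `null` and dividing by its standard deviation.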

The role of postal code
Let us consider the network $G_{\mathrm{pc}}$ described above. We now want to repeat the same analysis as above in order to test the role of the place where the mafia affiliates live. We map geographical residence through postal codes.
We first focus on homophily. Specifically, we count the number of links between two mafia affiliates with the same postal code. Out of the $L_{\mathrm{pc}} = 399$ links there are 63 such links (16%). In order to test if this percentage is statistically significant with respect to a random null model, we proceed as above; the results are shown in Fig. 3. The number of links within the same postal code in the real network is marked by the green dashed vertical line, while the distribution of values in the random replicas is shown by the blue bars. The evidence of a homophily pattern is statistically significant, with the p-value associated with the z-score being $7.1 \times 10^{-14}$. Also in this case we add a Gaussian fit to show that the assumption of normality, needed to extract a p-value from the z-score, is verified. As in the previous case, we now want to investigate the role of postal code in keeping the network connected. If we keep all the links between affiliates with the same postal code and remove the ones between mafia members with different postal codes, we obtain a network divided into 118 components. The right panel of Fig. 3 shows the distribution of the number of connected components obtained in 10,000 numerical experiments where 336 links are removed randomly. Again we find a very small p-value of the associated z-score, $3.0 \times 10^{-14}$, thus indicating that links between mafia affiliates with different postal codes are crucial in ensuring the network connectivity.
The results of Fig. 3 show that we have homophily with respect to postal codes and, at the same time, links between mafia affiliates with different postal codes help in keeping the network connected. This result replicates the one relative to syndicate affiliation and is somewhat expected. In fact, we know from Ref. [10] that mafia syndicates are characterized by a strong territoriality, to the extent that each syndicate is associated with specific postal codes. The above results therefore indicate that, from the point of view of network connectivity, one can consider postal codes as a proxy of syndicate affiliation.

Targeted attacks
Let us now consider the network $G^{\mathrm{lcc}}_{\mathrm{synd}}$. The above results show that we have homophily with respect to mafia syndicates and, at the same time, links between different mafia syndicates help in keeping the network connected. We want to elaborate further on this point by investigating the role of syndicate affiliation with respect to the resilience properties of such a collaboration network. The resilience of a network has been thoroughly investigated in the literature by considering both random and targeted attacks. In both scenarios, in order to dismantle a network one should know the fraction of nodes $p_c$ that must be removed.
In the case of random attacks it has been shown [19] that

$$p_c = 1 - \frac{1}{\frac{\langle k^2 \rangle}{\langle k \rangle} - 1} \tag{1}$$

is the percolation threshold after which no giant component is observed in the network, where $\langle k \rangle$ and $\langle k^2 \rangle$ are the first and second moments of the degree distribution. This result is general and applicable to all randomly connected networks, regardless of the specific form of the connectivity distribution (and provided that loops may be neglected). Its derivation stems from the Molloy and Reed criterion [33], according to which a giant component exists when $\langle k^2 \rangle / \langle k \rangle > 2$, the transition occurring at $\langle k^2 \rangle / \langle k \rangle = 2$, and from the fact that when removing nodes from a network the original degree distribution $P(k)$ changes to

$$P'(k) = \sum_{k_0 = k}^{\infty} P(k_0) \binom{k_0}{k} (1 - p)^{k} \, p^{\,k_0 - k},$$

where $p$ is the fraction of removed nodes. Indeed, after applying the Molloy and Reed condition to $P'(k)$ one obtains $p_c$ as formulated in Eq. (1). But how is percolation connected to network dismantling? Suppose we have a graph $G$ with node set $V$ and edge set $E$. We make all the nodes independently accessible with probability $p$ and inaccessible with probability $(1 - p)$. When a node becomes inaccessible (i.e. it is removed from the network) the possibility that any two nodes of the network are connected degrades. Thus, there is a percolation threshold above which the network gets fragmented, i.e. it is impossible to find a path that connects all possible pairs of nodes. This corresponds to having the largest connected component of the network broken down into small parts [1]. Thus, the main idea behind Eq. (1) is that $p_c$ represents a critical value below which any node $i$ which is connected to a node $j$ in the largest connected component is also connected to at least one other node. When the fraction of removed nodes is [...]
When applying Eq. (1) to the network $G^{\mathrm{lcc}}_{\mathrm{synd}}$ we get $p_c = 0.91$. However, Eq. (1) applies to random networks, and we do not expect our network of criminal collaborations to be random. Indeed, when we apply the percolation approach described in Ref. [19] to the original network $G^{\mathrm{lcc}}_{\mathrm{synd}}$ we get $p_c^{(\mathrm{real})}$ [...] can be seen as an indirect, quantitative measure of the non-randomness of our network. The value of $p_c^{(\mathrm{real})}$ implies that a strategy based on random attacks to the network is highly ineffective, given that the network breakdown would be achieved after removing about 50% of the nodes. Therefore one must devise a different strategy.
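As a numerical illustration of Eq. (1), the sketch below computes the random-failure threshold from a degree sequence. It assumes that Eq. (1) is the standard expression from the random-failure literature, $p_c = 1 - 1/(\langle k^2 \rangle / \langle k \rangle - 1)$; the function name and the toy degree sequences are hypothetical.

```python
def random_failure_threshold(degrees):
    """Critical fraction of randomly removed nodes for an uncorrelated network.

    Assumes p_c = 1 - 1 / (<k^2>/<k> - 1), with <k> and <k^2> the first
    and second moments of the degree sequence.
    """
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    kappa = k2 / k1
    if kappa <= 2:
        # Molloy-Reed: no giant component even before any removal.
        return 0.0
    return 1.0 - 1.0 / (kappa - 1.0)

# Hypothetical degree sequences.
regular = [3] * 10                 # every node has degree 3
hub_and_leaves = [1, 1, 1, 1, 10]  # one hub, a few low-degree nodes
p_regular = random_failure_threshold(regular)
p_hub = random_failure_threshold(hub_and_leaves)
```

For the regular sequence the ratio of moments is 3, so the threshold is 0.5. A broader degree distribution raises the second moment and pushes the threshold towards 1, which is why heterogeneous random networks are hard to dismantle by random removal.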
The fact that specific networks, due to their topological properties, are quite resistant to random attacks was already highlighted in Ref. [17], where new strategies, based on the centrality properties of the network nodes, were proposed. Inspired by such results, in the following we consider a strategy based on targeted attacks obtained by removing the most central nodes. In our first attempt we remove nodes according to their degree, from the largest to the smallest. In the left panel of Fig. 4 we show the size of the largest ($\mathrm{LC}_q$) and second largest ($\mathrm{SLC}_q$) connected component as a function of the percentage $q$ of removed nodes. One can see that the difference between $\mathrm{LC}_q$ and $\mathrm{SLC}_q$ gets smaller at a certain critical value of $q$. Specifically, when $q$ increases we observe the emergence of a number of smaller connected components that increase their size at the expense of the first component, making the network fragmented. However, as soon as $q$ exceeds a certain critical value we observe an increase in the number and a decrease in the size of these smaller components. The percolation threshold $p_c$ can be estimated as

$$p_c = \underset{q}{\arg\max} \; \mathrm{SLC}_q ,$$

i.e. by selecting the value of $q$ that maximizes the size of the second largest component. With a dismantling strategy based on degree we get $p_c^{(\mathrm{deg})} = 0.19$. In the right panel of Fig. 4 we show the same information in the case when nodes are removed according to their betweenness. In this case the percolation threshold is $p_c^{(\mathrm{betw})} = 0.09$, which is much smaller than $p$ [...]
Inspired by the results of Sect. 3.2, another possibility is to remove nodes starting from those that have the largest number of connections with other mafia affiliates belonging to different mafia syndicates. Thus, here we are not considering the mere degree, i.e. the connection with other mafia affiliates, but whether the accomplices of each member are affiliated with different mafia syndicates. With this strategy, we get $p_c^{(\mathrm{syndicate})} = 0.12$ (Fig. 5).
When comparing this strategy with those based on degree and betweenness, one can notice that $p_c^{(\mathrm{syndicate})}$ is smaller than $p_c^{(\mathrm{deg})}$ and close to $p_c^{(\mathrm{betw})}$. In Fig. 6 we compare the strategy based on degree with the one based on syndicates after the removal of a fraction $p$ of nodes using the dismantling order based on degree and syndicates, respectively. One can see that the largest connected component has lost 21 and 30 nodes, respectively, thus indicating a superior dismantling strength for the latter strategy, with the same number of nodes removed.
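The attack procedure and the threshold estimate discussed above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: node scores are computed once on the intact network (no recomputation after each removal), ties are broken by node label, and all function names are hypothetical.

```python
from collections import defaultdict

def component_sizes(nodes, adjacency):
    """Sizes of the connected components among `nodes`, largest first."""
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adjacency[node]:
                if nb in nodes and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def attack(edges, score):
    """Remove nodes by decreasing `score` and track LC_q and SLC_q.

    Assumption: scores are computed once, on the intact network.
    Returns (estimated p_c, list of (q, LC_q, SLC_q)).
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    order = sorted(adjacency, key=lambda n: (-score(n, adjacency), str(n)))
    alive, total, curve = set(adjacency), len(adjacency), []
    for removed, node in enumerate(order, start=1):
        alive.discard(node)
        sizes = component_sizes(alive, adjacency) + [0, 0]
        curve.append((removed / total, sizes[0], sizes[1]))
    # p_c: first value of q at which the second largest component peaks.
    p_c = max(curve, key=lambda row: row[2])[0]
    return p_c, curve

def degree(node, adjacency):
    """Degree-based score."""
    return len(adjacency[node])

def make_inter_syndicate_score(group):
    """Score = number of neighbours affiliated with a different syndicate."""
    def score(node, adjacency):
        return sum(1 for nb in adjacency[node] if group[nb] != group[node])
    return score

# Hypothetical toy network: two triangles joined by one bridge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_group = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
p_deg, curve_deg = attack(toy_edges, degree)
p_synd, curve_synd = attack(toy_edges, make_inter_syndicate_score(toy_group))
```

The inter-syndicate score only needs each node's own neighbours and their affiliations, so it can be evaluated without knowing the rest of the network, whereas a betweenness-based score would require the full set of shortest paths.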
It is worth noticing that the ordering in terms of degree and the ordering in terms of connection with other mafia affiliates are not the same. In Fig. 7 we show a scatter-plot where for each network node we show the ordering in terms of degree (vertical axis) versus the ordering in terms of connection with other mafia affiliates (horizontal axis). We also added (red line) a linear fit $y = ax + b$ showing a slope $a = 0.86$. Indeed, differences between the two orderings are clearly visible, which explains why $p_c^{(\mathrm{syndicate})}$ and $p_c^{(\mathrm{deg})}$ are different from each other. However, points in the scatter-plot are generally aligned along the diagonal, which tells us that the connection with other mafia affiliates is a good proxy of the network degree, as also indicated by the fact that the slope $a$ is close to, although smaller than, unity.
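The comparison between the two orderings can be reproduced with a rank transformation followed by an ordinary least-squares fit. The sketch below is only illustrative: the tie-breaking rule (by node label) and the toy scores are assumptions, and the function names are hypothetical.

```python
def ranks(scores):
    """Map each node to its position (1 = highest score); ties broken by label."""
    ordered = sorted(scores, key=lambda n: (-scores[n], str(n)))
    return {node: position for position, node in enumerate(ordered, start=1)}

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical scores for four nodes.
degree_score = {"a": 5, "b": 3, "c": 2, "d": 1}
inter_syndicate_score = {"a": 4, "b": 1, "c": 2, "d": 0}

rank_deg = ranks(degree_score)
rank_synd = ranks(inter_syndicate_score)
nodes = sorted(degree_score)
# Horizontal axis: inter-syndicate ordering; vertical axis: degree ordering.
a, b = linear_fit([rank_synd[n] for n in nodes], [rank_deg[n] for n in nodes])
```

A slope equal to one with zero intercept would mean that the two orderings coincide; the further the slope is from one, the more the two removal sequences differ.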
To sum up, the best strategy we found for targeted attacks aiming at dismantling the network $G^{\mathrm{lcc}}_{\mathrm{synd}}$ of Cosa Nostra affiliates would be to remove the nodes with the highest betweenness. However, this approach requires exact knowledge of the whole network of criminal collaborations, which is not available to law enforcement agencies from the beginning. In fact, in our case the network is the result of 15 years of investigative efforts. The same consideration applies to an attack strategy based on the knowledge of the degree. Our results provide a way to circumvent this problem, since we have found a comparably effective strategy that is based on mapping the collaborations of each member with other mafia syndicates, which is a local property of each node and does not require knowledge of the complete network. This is information that law enforcement agencies can obtain much more easily, as it only presupposes knowledge of the mafia substrate of a certain territory and not the precise details of the criminal activities of each mafia affiliate.
Starting from the $G^{\mathrm{lcc}}_{\mathrm{pc}}$ network we obtain results that are qualitatively similar to those illustrated above. Also in this case, this is an important result from an operational point of view, because it is relatively easy to retrieve the residence of mafia suspects and investigate their relationships with other individuals.

Discussion and conclusions
Although our results have been obtained with a very simple toy-model, their importance is twofold.
On one side, we find evidence that a winning strategy for dismantling the network of Cosa Nostra affiliates is to look for those individuals that have many connections with different mafia syndicates. This is the kind of information that law enforcement agencies can access with traditional investigation techniques. This result can already be used to implement real operational strategies. On a more technical ground, one might think of implementing a decision support tool for law enforcement agencies and the judiciary that, on the basis of the available information, provides a set of possible strategies. The present work can thus be understood as a first step in this direction, whose maturity level can be assessed as TRL-1 (Basic Principles Observed and Reported) [34].
On more theoretical grounds, our results highlight the contraposition between homophily and heterogeneous syndicate affiliation. This somehow mimics the classical comparison between strong ties and weak ties [31]: links amongst mafia affiliates belonging to different syndicates are weak. However, they are crucial in maintaining the connectedness of the network. Notwithstanding their central role, weak ties, i.e. links between mafia affiliates belonging to different syndicates, bring fragility into the network [35,36]. The homophily properties we investigated above tell us that Mafia syndicates are strongly connected entities that can be regarded as sub-networks with their own specificities (that we did not investigate in this work). Fragility is hidden in the links that interconnect such subnetworks, and this is where an optimal strategy by law enforcement agencies can dismantle the whole network. The issue of fragility addressed in this work opens up the way to new empirical investigations. In fact, one might investigate the issue of fragility in connection with Cosa Nostra repentants.
A successful strategy operated by judiciary and law enforcement agencies, during the 80s and 90s of the past century, was based on the information that such repentants provided in exchange for penalty discounts. These people were clearly a source of fragility within the Cosa Nostra affiliates network. It would be interesting to investigate their role within the different syndicates, with the aim of understanding (i) whether they disrupted the network at the level of single syndicates or at the level of links amongst syndicates and (ii) in case of single syndicates disruption, whether and how this disruption propagated all over the network.
There are several limitations in the present study. The first one is that the networks of $N_{\mathrm{synd}} = 256$ nodes and $L_{\mathrm{synd}} = 857$ links or $N_{\mathrm{pc}} = 159$ nodes and $L_{\mathrm{pc}} = 399$ links are relatively small, thus covering only part of the more general network of convicted Cosa Nostra affiliates. This mainly depends on the fact that registry data on mafia members are not easily available. In our experience, this is highly confidential information that municipalities are quite reluctant to release, even for academic purposes. This is not a simple issue to overcome, as it involves governmental policies on sharing data among branches of the state, which is out of the scope of an academic publication. Another issue regards the way the original network of Cosa Nostra affiliates was built [10], i.e. by looking only at the criminal activities that were uncovered through investigation. We estimate that our network covers about 10% of the more general network of Cosa Nostra affiliates. However, we believe that this is not a true limitation, because the incomplete knowledge of the "true" Cosa Nostra network is exactly the motivation for using our approach rather than a strategy based on network metrics. There is another limitation in our data that we have not been able to overcome: Cosa Nostra individuals that have many relationships with different mafia syndicates are at the moment selected from the bulk of Cosa Nostra affiliates that have been sentenced by the Palermo Judiciary. However, there might be individuals that are totally unknown to the Judiciary and still have important relationships with syndicates. In this respect it would be interesting to complement our analyses by using information available from law enforcement agencies, e.g. police reports that are not necessarily used later in judicial sentences.
Another interesting extension of our work regards the analysis of directed networks of Cosa Nostra affiliates. In this framework, directed links might represent hierarchical relationships within syndicates and/or hierarchical relationships amongst syndicates. That would call for the development of dismantling techniques for directed networks, which is an interesting research field per se. However, at the moment we do not have data about either the internal structure of syndicates or their mutual relationships, and we leave this idea for future work.
In summary, we have devised a percolation-based effective procedure for network dismantling. The protocol assumes that network nodes are removed starting from those who have the highest number of links with different syndicates. We show that our procedure has effectiveness comparable to an attack based on the network centrality of nodes. The advantage of our protocol is that we use information easily available to law enforcement agencies, as it only presupposes knowledge of the mafia substrate of a certain territory and not the precise details of the criminal activities of each mafia affiliate.