Router-level community structure of the Internet Autonomous Systems
 Mariano G Beiró^{1, 2},
 Sebastián P Grynberg^{2} and
 J Ignacio Alvarez-Hamelin^{2, 3, 4}
Received: 4 June 2015
Accepted: 17 July 2015
Published: 15 August 2015
Abstract
The Internet is composed of routing devices connected to one another and organized into independent administrative entities: the Autonomous Systems. The existence of different types of Autonomous Systems (like large connectivity providers, Internet Service Providers or universities), together with geographical and economic constraints, turns the Internet into a complex modular and hierarchical network. This organization is reflected in many properties of the Internet topology, like its high degree of clustering and its robustness.
In this work we study the modular structure of the Internet router-level graph in order to assess to what extent the Autonomous Systems satisfy some of the known notions of community structure. We observe that most of the classical community detection methods fail to detect the Autonomous Systems as communities, mainly because the modular structure of the Internet (like that of many complex networks) is much richer than what can be captured by optimizing a global functional: Autonomous Systems have largely variable sizes, structures and functions. Classical methods are severely affected by resolution limits and by the heterogeneity of the communities; even when using multiresolution methods, there is no single resolution at which most of the communities can be captured.
However, we show that multiresolution methods do find the community structure of the Autonomous Systems, but each of them has to be observed at the correct resolution level. Then we develop a low-complexity multiresolution modularity optimization algorithm that finds communities at different resolution levels on a continuous scale, in one single run. Using this method, we show that with scarce knowledge of the node affiliations, multiresolution methods can be adjusted to retrieve the Autonomous Systems, significantly improving the results of classical single-resolution methods. Finally, in the light of our results, we discuss recent work concerning the use of a priori information to find community structure in complex networks.
Keywords
1 Introduction
The Internet is a complex network composed of routing devices (routers) which are organized into administrative entities, the Autonomous Systems (ASes). Each AS has its own routing policies and design criteria, which are based on technological, geographical and economic constraints and aim at maximizing the performance in terms of bandwidth, delay and resilience. ASes are nowadays identified with large carriers, ISPs (Internet Service Providers), IXPs (Internet Exchange Points), CDNs (Content Delivery Networks) and universities, among others, and they are interconnected as the result of commercial agreements between them, either as peers or in provider-customer relationships.
ASes can also be classified into core ASes (once called Tier-1's) and peripheral ASes. The core ASes are typically the large carriers and some CDNs, and are densely connected to one another. Most ISPs, instead, lie in the periphery and are connected to one or more core ASes. This core-periphery division provides the Internet with an interesting hierarchy. In fact, many large-scale properties such as the resilience, scalability and small-worldness of the Internet arise largely as a consequence of its hierarchical structure and self-organization [1, 2]. Understanding them might help to develop better models of the Internet and to improve its performance, by optimizing routing or reducing congestion, for example.
In this work we approach the modular structure of the Internet at the router level by trying to identify the ASes as communities of nodes. We explore different notions of community structure, and we shall see that: (a) The goodness of the communities is in some way related to the core-periphery hierarchy of the ASes. (b) Though most of the ASes satisfy some notion of community, their largely variable sizes represent a hard problem for many classical community detection algorithms. (c) Using multiresolution methods instead, the ASes can be retrieved if each of them is observed at the right resolution level. Finally, we shall propose a method for identifying ASes using samples of nodes.
Our work falls within the study of Internet topology. Some of the main results in this area can be found in [3]. In particular, the study of community structure of the Internet at the Autonomous Systems level has been previously approached in [4–7] and, to the best of our knowledge, the community structure at the router level has only been addressed in [8] in the context of traffic queuing analysis. For the community detection problem in general, we refer the reader to the survey by Fortunato [9].
The paper is organized as follows: In Section 2 we describe the Internet exploration that we used. In Section 3 we apply several community detection algorithms to it, matching the obtained communities to the ASes; we also introduce a hierarchical partitioning algorithm based on multiresolution modularity maximization, determine the best resolution for each AS, and evaluate other multiresolution algorithms. In Section 4 we evaluate the results from the Internet topology perspective, explaining their variability: why some ASes satisfy our notion of community while others do not. We also discuss the methodological problems of community detection in heterogeneous networks and some possible solutions. Finally, in Section 5 we summarize the main results of this work.
2 Dataset
The dataset used throughout this paper was provided by CAIDA [10]. The CAIDA association performs daily explorations of the Internet by sending periodic IP packets (probes) towards random destinations from a set of sources called monitors (which currently number around 100). These probes are sent with a traceroute tool which exploits the ICMP functionality: by means of the ICMP protocol, the intermediate routers in a datagram transmission can provide the source with some information on the traversed path. Using this information, the CAIDA infrastructure builds a map of the Internet at the router level. Several problems must be resolved through this process, such as IP aliasing (the association of several IP addresses with the same router) [11] and the existence of anonymous routers (routers which do not answer probes). Traceroute-based sampling is not a perfect sampling and has several biases [12]: some links are more frequently visited than others, and some links are probably not visited at all. This, together with the fact that the Internet is a fast-evolving network, turns the map into a partial, approximate view of the Internet.
However, router-level maps of the Internet have several applications, like improving routing algorithms by taking into account the network topology, understanding information propagation, and studying the robustness and scale-free properties of networks [1, 13]. They can also be used for constructing maps of the Internet at the Autonomous Systems level [14, 15].
2.1 Graph preprocessing
The graph contains some low-degree structures which make it difficult to analyse its community structure without some preprocessing. Regarding the AS affiliation, for example, the dataset contains 21,108 different ASes, of which 8,023 have fewer than 5 nodes, and 865 ASes contain a single node of degree 1. We detected the presence of many tendrils (sequences of degree-2 nodes finishing with a degree-1 node) involving 1,393,846 nodes. These nodes do not take part in any triangle or clustered structure of the network, which is probably due to a limitation of the exploration. However, they can be considered to belong to the community of the first node with higher degree they are attached to, so we removed these nodes by computing the 2-core of the graph [19].
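As an illustration of the pruning step, the following minimal sketch computes the 2-core of an undirected graph by repeatedly peeling off nodes of degree below 2. It is not the implementation used for the dataset: the function name `two_core` and the edge-list input format are assumptions made here for the example, and the fast algorithm of [19] is not reproduced.

```python
from collections import defaultdict


def two_core(edges):
    """Return the set of nodes in the 2-core of an undirected graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Nodes of degree < 2 cannot belong to the 2-core: peel them off,
    # and keep peeling any neighbour whose degree drops below 2.
    stack = [n for n in adj if len(adj[n]) < 2]
    removed = set()
    while stack:
        n = stack.pop()
        if n in removed:
            continue
        removed.add(n)
        for nb in adj[n]:
            adj[nb].discard(n)
            if nb not in removed and len(adj[nb]) < 2:
                stack.append(nb)
        adj[n].clear()
    return set(adj) - removed


# A triangle with a tendril (c - d - e) attached: the tendril is removed.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
core_nodes = two_core(toy)
```

In this toy graph the tendril nodes `d` and `e` are discarded and only the triangle remains, which mirrors how the tendrils described above drop out of the router-level graph.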
Correlation between having degree 2 and having AS affiliation
degree 2 | AS affiliation: True | AS affiliation: False
True | 0.12 | 0.16
False | 0.70 | 0.02
3 Analysis methods and results

Infomap, by Rosvall and Bergstrom, based on the description length [21].

LPM, the Label Propagation Method by Raghavan et al. [22], which performs a diffusion process on the graph.

Deltacom, an efficient greedy algorithm for modularity optimization introduced here.

Louvain, a fast modularity optimization algorithm [23].

CommUGP, a local community detection method [24].
In the following subsection we shall introduce Deltacom, an algorithm for hierarchical partitioning based on modularity maximization. Later on, we shall evaluate the results using a similarity metric.
3.1 A hierarchical partitioning method based on multiresolution modularity
Deltacom is based on the optimization of modularity. We recall that modularity was introduced by Newman in [25, 26] and, since then, several methods for its maximization have been proposed, like the one by Guimerà and Amaral based on simulated annealing [27], the extremal optimization method by Duch and Arenas [28], the fast greedy algorithm by Blondel et al. [23], or the multilevel algorithm by Noack and Rotta [29].
The following two results about modularity are fundamental to our present work: in [30] Fortunato and Barthélemy showed that modularity has a resolution limit, i.e., its maximization tends to merge small communities when they are connected to each other; in [31] Reichardt and Bornholdt observed that modularity can be understood as the Hamiltonian of a ferromagnetic Potts model. Thus, maximizing the modularity implies finding the ground state of this model, and the authors developed a procedure based on simulated annealing for doing so.
Deltacom considers the optimization of the modularity Q as a particular case of the optimization of a more general functional \(Q_{t}\), with a resolution parameter t, in which modularity corresponds to a normalized resolution, i.e., \(t=1\). The source of our idea can be traced back to the γ resolution parameter in [31], and has also been followed by Lancichinetti and Fortunato in [32]. In [31], the maximization is based on simulated annealing, it must run at one single resolution each time, and its computational complexity is very high. The advantage of our method is that the resolution evolves dynamically, so that all the partitions at different resolutions can be produced during one single run, and with a low computational complexity.
Deltacom is based on an agglomerative greedy algorithm; each step of the agglomerative process is a local maximum for \(Q_{t}\), i.e., the generalized modularity, at some particular resolution t. In other words, for each t-value a community partition is found, which is locally t-optimal in some sense. As these communities are joined, the resolution shrinks and t gets smaller. For \(t=1\), the structure is a local maximum for the classical modularity. In other words, we observe the graph as if through a magnifying glass, obtaining partitions at every resolution level. The result is a hierarchical structure in which partitions at higher resolution levels are always refinements (in a mathematical sense) of those at lower ones.
Here we present a brief description of how the algorithm works. Further mathematical details on the properties of \(Q_{t}\) and of our maximization algorithm can be found in [33].
3.1.1 Newman’s modularity
3.1.2 Introducing a resolution parameter
It is clear that a positive \(\Delta Q_{t}\) value for a particular t also implies a positive (and even higher) \(\Delta Q_{t'}\) for any resolution \(t'< t\). In other words, any agglomerative process which monotonically increases \(Q_{t}\) also monotonically increases \(Q_{t'}\) for any \(t'< t\).
It is also immediate that a very large t value discourages any join, because \(\Delta Q_{t}\) is negative for every pair of communities. It can also be shown that for t large enough the global maximum for \(Q_{t}\) would have each node isolated in its own community.
When a local maximum is found for some t (i.e., no pair of communities can be joined without decreasing \(Q_{t}\)), we decrease the resolution to \(t'\) such that \(\Delta Q_{t'}\) becomes positive for some pair of communities, following the previous formula. The agglomerative process continues until there are as many communities as connected components of the graph, and the result is a set of locally optimal partitions \({\mathscr{C}}_{t}\), one for every resolution t, such that the finer partitions are refinements of the coarser-grained ones. Moreover, these partitions are what we call weakly optimal partitions, in the sense that not only would the join of any pair of communities C, \(C'\) decrease \(Q_{t}\), but any coarser partition (i.e., one obtained by joining its communities in any way) would decrease it as well.
The code for the Deltacom algorithm is freely available at SourceForge [34]; given a graph as a list of edges, it produces a set of community partitions for every resolution value.
While \(\mathcal{O}(nm)\) is a theoretical upper bound for a general graph, the practical running time for sparse graphs can be greatly reduced with some considerations: step (c) gives all the information for the next step (a), because the community pair which maximized \(t'\) in line 1.11 is the same one that will have \(\Delta Q_{t}=0\) in the next iteration; also, by keeping a search structure like an ordered tree with all the pairs of connected communities \(C,C'\) sorted in decreasing order of \(\frac{e(C,C')\cdot2m}{k_{C}\cdot k_{C'}}\), there is no need to traverse all the lists in each iteration: we just update the modified values in this tree and choose the community pair at its head as the one maximizing \(t'\). For sparse graphs, this process has an upper complexity bound of \(\mathcal{O}(m\cdot \log(m)\cdot s_{C}^{\max})\), where \(s_{C}^{\max}\) is the maximum number of neighbour communities that a community may have at any time of the process.
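The agglomerative process of this subsection can be illustrated with the following minimal sketch, which is not the released Deltacom code. It rests on several assumptions made here: the graph is a simple undirected edge list without self-loops or repeated edges; joining two communities changes the functional by \(\Delta Q_{t} = e(C,C')/m - t\,k_{C}k_{C'}/(2m^{2})\), a form chosen only because its zero is the ordering key quoted above; and the pair with the largest key is found by a plain scan instead of an ordered tree. The names `multires_agglomerate` and `partition_at` are hypothetical.

```python
from collections import defaultdict


def multires_agglomerate(edges):
    """Greedy joins in decreasing resolution.

    Returns a list of (t, merged_community) steps, where t is the
    resolution at which the join stops being unfavourable.
    """
    m = len(edges)
    # between[a][b] = number of edges between communities a and b
    between = defaultdict(lambda: defaultdict(int))
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        between[u][v] += 1
        between[v][u] += 1
    members = {n: {n} for n in degree}

    t = float("inf")
    steps = []
    while True:
        # Connected pair with the largest key e(C,C') * 2m / (k_C * k_C').
        best = None
        for a in between:
            for b, e_ab in between[a].items():
                key = e_ab * 2 * m / (degree[a] * degree[b])
                if best is None or key > best[0]:
                    best = (key, a, b)
        if best is None:  # one community per connected component
            break
        key, a, b = best
        t = min(t, key)  # the resolution never increases

        # Merge community b into community a.
        for c, e_bc in between.pop(b).items():
            del between[c][b]
            if c != a:
                between[a][c] += e_bc
                between[c][a] += e_bc
        degree[a] += degree.pop(b)
        members[a] |= members.pop(b)
        steps.append((t, frozenset(members[a])))
    return steps


def partition_at(steps, nodes, t):
    """Partition obtained by applying every join recorded at resolution >= t."""
    comm = {n: frozenset([n]) for n in nodes}
    for step_t, merged in steps:
        if step_t < t:
            break
        for n in merged:
            comm[n] = merged
    return set(comm.values())


# Two triangles joined by a single edge.
toy = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
steps = multires_agglomerate(toy)
at_one = partition_at(steps, range(6), 1.0)
```

Because every join is recorded together with its resolution, a single run yields the partition at any t, and the partitions at higher resolution are refinements of those at lower resolution. The scan makes each iteration linear in the number of connected community pairs; the ordered tree described above is what removes that cost.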
3.2 Finding ASes through similarity maximization
We can clearly see that the five methods fail to reproduce the AS structure. Even in the best-performing one, Infomap, only 50% of the large ASes reach a recall of 0.4, which is still low. A premature conclusion might state that the Internet graph does not have communities at the router level, or at least that they are not related to the Autonomous Systems. But we will show that this is not the case, and that the failure of the methods is in part due to the largely variable sizes, internal structures and functions of the Internet ASes. However, the structure of the Internet graph at the router level can be better captured by using multiresolution methods.
In the case of Deltacom, \(\Theta=[t_{\min}, t_{\max}]\) and t plays the role of θ. In this way, the Jaccard similarity allows us to capture the moment in which each AS is formed. With this adjustment, the results clearly improve. The recall score \(R_{2}(AS)\) (Eq. 8) over all the ASes is 0.87 (and 0.67 over the ASes of at least 100 nodes). The new cumulative distribution is presented as the black curve in Figure 2, and we shall call it Deltacom Multiresolution, as we are exploring all the communities at all possible resolutions.

The Infomap algorithm has a multiresolution variant [36], which generates a community tree with several levels \(k\in\{1,\ldots,k_{\max}\}\). We matched each of the ASes into this structure in the same way as we did with Multiresolution Deltacom: i.e., we find the best level for each AS, and then we compute the recall score (now \(\Theta=\{1,\ldots,k_{\max}\}\) and \(\theta=k\)). Here we found a clear improvement too, which is shown in the orange curve of Figure 2, pointing out that it is important to find the correct resolution for each AS.

From the general scheme described by Reichardt and Bornholdt in [31] several multiresolution approaches have been proposed. We used the Constant Potts Model by Traag et al. [37], which removes the null model in the Hamiltonian to cancel global dependences. For this algorithm we tested different values of \(\gamma\in[10^{-8}, 1]\). We observed that this interval covers all the interesting situations, as for \(\gamma=10^{-8}\) we obtain a partition with 10 communities, and for \(\gamma=1\) all the nodes are in different communities. We sampled the interval by dividing it into 4, 8, 16 and 32 pieces, equidistant in the logarithmic scale. When dividing it into \(2^{k}\) pieces we obtain \(2^{k} + 1\) values of γ which we use as the parameter space \(\Theta=\{\gamma_{0}, \gamma _{1}, \ldots, \gamma_{2^{k}}\}\) in order to match each AS against its most similar community (according to the recall score \(R_{2}(AS)\)) among the \(2^{k} + 1\) different partitions. We chose to stop the exploration at \(k=5\) (33 values of γ) because the observed convergence suggested that refining the parameter space would not produce much better results. The curve for \(k=5\) representing CPM is shown in purple in Figure 2. Finally, the central picture in Figure 3 shows the best γ value (from among the 33 considered values) for each AS, as a function of its size. Compared to Deltacom, in CPM the correlation between resolution and size is not so strong.
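The sampling of the γ interval can be sketched as follows. This is only an illustration: it assumes the interval runs from \(10^{-8}\) to 1 (the reading under which the smallest γ gives the fewest communities, as stated in the text), and the function name `log_spaced_gammas` is hypothetical.

```python
import math


def log_spaced_gammas(gamma_min, gamma_max, k):
    """Split [gamma_min, gamma_max] into 2**k pieces of equal width on a
    logarithmic scale and return the 2**k + 1 endpoints."""
    pieces = 2 ** k
    lo, hi = math.log10(gamma_min), math.log10(gamma_max)
    return [10 ** (lo + (hi - lo) * i / pieces) for i in range(pieces + 1)]


# k = 5 gives the 33 resolution values used as the parameter space.
gammas = log_spaced_gammas(1e-8, 1.0, 5)
```

Each refinement from k to k + 1 keeps the previous values and adds the logarithmic midpoints, so the partitions computed for a coarser grid can be reused when the grid is refined.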
From this analysis we conclude that there is no single resolution at which we can find most of the ASes. This had already been observed in [32] for modularity, and is partly due to the fact that modularity is affected by resolution limit issues [38]. Similar theoretical analyses have been provided for Infomap’s map equation [39] and for some methods based on Hamiltonians of Potts models [40]. Despite these difficulties, here we show that the ASes do exist as communities at different levels of the hierarchical structure.
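The matching of each AS against the communities found at every value of the resolution parameter can be sketched as below. This is a minimal illustration that uses the plain Jaccard similarity; the recall score \(R_{2}(AS)\) is not reproduced here, and the function names and the dictionary layout of `partitions` (parameter value mapped to a list of node sets) are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two node sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_match(group, partitions):
    """Find the community most similar to `group` over all resolutions.

    `partitions` maps each parameter value θ to an iterable of communities.
    Returns (similarity, θ, community); θ and community are None when no
    community overlaps the group.
    """
    best = (0.0, None, None)
    for theta, communities in partitions.items():
        for community in communities:
            score = jaccard(group, community)
            if score > best[0]:
                best = (score, theta, frozenset(community))
    return best


# Two resolution levels of a toy hierarchy.
toy_partitions = {
    1.0: [{1, 2, 3, 4}, {5, 6}],
    2.0: [{1, 2}, {3, 4}, {5, 6}],
}
small = best_match({1, 2}, toy_partitions)
large = best_match({1, 2, 3}, toy_partitions)
```

In the toy hierarchy the group `{1, 2}` is recovered exactly at the finer level, while `{1, 2, 3}` is best matched at the coarser one, which is the behaviour summarized above: each group has its own best resolution.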
3.3 Detecting ASes from samples
The previous results show us that different ASes have different resolutions, and that we cannot expect to get all the ASes with classical community detection methods or at some particular resolution value. However, our results with the multiresolution methods are ideal, in the sense that we used the information of the AS structure in order to know at which resolution to stop for each AS. Now we wonder whether we might retrieve the ASes using a minimal amount of information: their size, and a small sample of their nodes. We observed that there is a strong dependence between the AS size and the AS resolution at which the AS is best identified by our algorithm, so that we shall use the linear regression in Figure 3 to approximate the best resolution for each AS, t̃. If we look for the most similar community to each AS restricted to the partition produced by Deltacom at that approximate resolution, \({\mathscr{C}}_{\tilde{t}}\), then we obtain the brown curve in dashed lines in Figure 2.
The closeness between the brown and light blue curves shows that we do not need to know the whole AS in order to identify it as a community at a certain step of the modularity optimization process: 15% of the nodes is enough to get as good a result as if we knew all the AS nodes. However, the distance between the brown and black curves tells us that the linear regression between size and resolution is just a rough approximation. We also point out that we tested sample sizes between 5% and 25% and the performance was similar (average recalls between 0.57 and 0.59), mainly because ideally the probability of misclassifying n random nodes of an AS decreases exponentially with n, so that even for a small sample of nodes the majority of them should be correctly classified if the AS exists as a community at that resolution (e.g., with a recall of at least 0.5). In conclusion, the sample-based method seems to be very successful, and is only limited by the error of the linear regression. We can thus identify many of the ASes just knowing their size and a small sample of their nodes. This method can also be extended to other networks, even if the relation between resolution and size is not known exactly. In that case, one should match the most similar community at each resolution, and then choose the resolution for which the most similar community has the closest size to the AS size, for example.
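The extension suggested at the end of the paragraph above (matching at each resolution and keeping the match whose size is closest to the known size) can be sketched as follows. It is an illustration under assumptions made here: "most similar community" is taken to be the one holding the most sampled nodes, ties are broken by list order, and the name `detect_from_sample` and the layout of `partitions` are hypothetical. The regression-based selection of the resolution is not reproduced.

```python
def detect_from_sample(sample, group_size, partitions):
    """Pick a community for a group known only by a node sample and its size.

    `partitions` maps each resolution value to a list of communities (sets).
    Returns (resolution, community), or (None, None) if `partitions` is empty.
    """
    sample = set(sample)
    best = None
    for theta, communities in partitions.items():
        # Community holding the largest share of the sampled nodes.
        match = max(communities, key=lambda c: len(sample & set(c)))
        gap = abs(len(match) - group_size)
        if best is None or gap < best[0]:
            best = (gap, theta, frozenset(match))
    if best is None:
        return None, None
    return best[1], best[2]


# Three nested resolution levels; the target group has 4 nodes.
toy_partitions = {
    2.0: [{1, 2}, {3, 4}, {5, 6, 7, 8}],
    1.0: [{1, 2, 3, 4}, {5, 6, 7, 8}],
    0.5: [{1, 2, 3, 4, 5, 6, 7, 8}],
}
resolution, community = detect_from_sample({1, 3}, 4, toy_partitions)
```

With two sampled nodes and the size 4, the intermediate level is selected: the finer level splits the sample across communities and the coarser one returns a community twice as large as expected.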
4 Discussion
Average recall scores of the methods
Method | avg R(AS), ASes ≥ 100 nodes | avg R(AS), all ASes
LPM | 0.05 | 0.18
Deltacom (t = 1) | 0.08 | 0.00
Louvain | 0.13 | 0.01
CommUGP | 0.16 | 0.37
Infomap | 0.41 | 0.35
Deltacom Multiresolution + Regression line + 5% sample | 0.57 | 0.75
Deltacom Multiresolution + Regression line | 0.59 | 0.77
Infomap Multiresolution (Best Resolution) | 0.61 | 0.49
Deltacom Multiresolution (Best Resolution) | 0.67 | 0.87
CPM (Best Resolution from 33 values of γ) | 0.70 | 0.83
In the central pictures of Figure 5 we plot the number of internal edges as a function of node degree for some ASes which were not well detected. The diagonal dashed line corresponds to \(k_{\mathrm{in}}/k = 1/2\); the nodes under this line have more external connections than internal ones. The colour indicates whether the node was classified inside the community which retrieved the best Jaccard similarity (black) or not (red). We observe that all the ASes in Figure 5 have high-degree nodes with more external than internal connections; these nodes are not recognized as part of their ASes and thus lead many of the low-degree nodes to be misclassified. In this group of ASes we found carrier networks and IXCs (e.g., AT&T and Sprint), IXPs (e.g., the London Internet Exchange) and some CDNs (e.g., Akamai and Amazon).
These types of ASes lie in the backbone of the Internet and are densely connected to one another, so that their nodes may not respect the classical notions of community. It is possible that we should think of an overlapping community structure for the core of the Internet, following the ideas in [42] regarding the core-periphery structure of networks and its relation with community structure. In this network, highly overlapping communities might be related to the presence of IXPs or nodes which do not satisfy the strong community definition, for example.
It is interesting to contrast the general results against some classical notions of community. Looking at the central pictures of Figures 5, 6 and 7 again, we can see that all of them have nodes under the dashed diagonal line which represents the limit of the notion of strong community by Radicchi et al. [43], i.e., having more internal connections than external ones. This notion is quite strict indeed, and neither the real ASes nor the communities found by modularity maximization follow it. In [44], Hu et al. introduced a relaxed version of this notion, in which each node has more internal connections than towards each of the other communities separately; we found that 4% of the nodes (spread through 70% of the ASes) do not follow this definition, either.
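The two notions contrasted above can be checked node by node with the following minimal sketch. It assumes an undirected edge list and a `membership` dictionary mapping each node to its group; the function name `community_violations` and the treatment of ties (a node with as many external as internal links counts as a violation) are choices made here for illustration.

```python
from collections import defaultdict


def community_violations(edges, membership):
    """Return (strong_violators, relaxed_violators) as sets of nodes.

    strong:  the node does not have more internal than external links.
    relaxed: the node has some other single community receiving at least
             as many of its links as its own community does.
    """
    # links[node][community] = number of neighbours of node in community
    links = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        links[u][membership[v]] += 1
        links[v][membership[u]] += 1

    strong, relaxed = set(), set()
    for node, per_comm in links.items():
        own = membership[node]
        k_in = per_comm.get(own, 0)
        k_out = sum(n for c, n in per_comm.items() if c != own)
        if k_in <= k_out:
            strong.add(node)
        if any(n >= k_in for c, n in per_comm.items() if c != own):
            relaxed.add(node)
    return strong, relaxed


# Node x has 2 internal links and 1 link to each of three other groups.
toy_edges = [("x", "a1"), ("x", "a2"), ("a1", "a2"),
             ("x", "b"), ("x", "c"), ("x", "d")]
toy_membership = {"x": "A", "a1": "A", "a2": "A", "b": "B", "c": "C", "d": "D"}
strong, relaxed = community_violations(toy_edges, toy_membership)
```

Node `x` fails the strong condition (2 internal links against 3 external ones) but satisfies the relaxed one, since no single other group receives as many of its links as its own group does.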
The limitations of our algorithm for correctly recovering some of the ASes might also be related to the detectability threshold of community detection [45–47] and to the community detection paradox [48]: even when there are on average more internal connections than external ones, algorithms might not succeed in retrieving a community if the difference between these quantities is not above a certain threshold. This limitation not only affects modularity-based methods but is quite general.
Hric et al. [35] have recently used the Jaccard similarity for identifying communities with ground-truth groups as we did. They also observed that community detection algorithms perform very poorly according to this measure and they proposed some conjectures, such as the need to change the notion of community or to introduce some non-topological information. In [49], for example, Darst et al. analyse how a priori knowledge of the number of communities improves the performance by helping to set the resolution level. But in many complex networks like the Internet there might not be one single resolution at which all the communities can be discovered. Here we have seen that, knowing the community sizes and some of their nodes, multiresolution algorithms might increase their performance significantly.
The use of some additional information for the identification of communities may be a trade-off solution considering the complexity and limitations of the community detection problem. Our last experiment, in which we use a sample of nodes for matching the community in the hierarchical structure, is related to the seed set expansion problem (i.e., extracting a community by expanding a local seed set of nodes), which has been recently explored in [50–52].
Thus, our work may shed some light on the problem of improving the performance of community detection algorithms: in the Internet router-level graph, we have shown that the coexistence of communities with multiple resolutions is the main difficulty for the detection algorithms. So, when proposing new conceptions of community structure, we should take into account that very heterogeneous communities must coexist. Regarding the detection of ground-truth communities in real networks, the following question remains open: Can the ground-truth communities be retrieved without additional information, e.g., as significant communities at the right resolution level?
5 Conclusions
We analysed the internal community structure of the Internet Autonomous Systems by using Deltacom, a multiresolution algorithm. Deltacom performs hierarchical partitioning through modularity optimization at different resolution levels in one single run, and with a low computational cost. The algorithm was made available to the scientific community as open-source software through SourceForge.
We applied Deltacom to the router-level graph of the Internet obtained from CAIDA, and we observed that most of the ASes which are not in the Internet backbone can be identified as communities if the correct resolution level is used in the community discovery process. However, we showed that several optimization algorithms fail to detect these Autonomous Systems due to their largely variable sizes and structures. With our results, we show that the use of proper additional information can significantly improve the performance of community identification.
Many of the Autonomous Systems in the Internet backbone (usually identified by a very high AS rank) could not be identified as communities, and we observed that this was due to the presence of high-degree nodes with more external connections than internal ones, violating one of the main notions of community structure. Instead, stub ASes and ISPs have high-degree nodes which act as hubs and provide internal connectivity, while the external connectivity is usually provided by lower-degree nodes. These ASes were well detected as communities.
By correlating the resolution of each AS with its size and using a small fraction of the AS affiliation data, we showed that most of the ASes which do have a community structure can be retrieved. In this way we provide a method for identifying Autonomous Systems with scarce information: a small sample of their nodes and their size.
Finally, we discussed the limitations of the community detection methods for detecting ground-truth communities. These difficulties had been previously observed for many large networks and we also observed them here for the Internet Autonomous Systems. Based on our results, we consider that the use of some local information might arise as a trade-off solution for identifying communities in these cases.
We think that these results on the Internet structure at the router level can also be used for improving Internet topology models and the estimation of properties such as node centrality or clustering. The work should also lead to new discussions regarding the hierarchical structure of the Internet and the detection of significant communities in large heterogeneous networks.
Declarations
Acknowledgements
The authors thank Jorge R. Busch for his discussions on the Deltacom algorithm, and the anonymous reviewers who contributed to the paper with their suggestions. MGB acknowledges support from the ‘Lagrange Project’ of the ISI Foundation funded by the Fondazione CRT and from the ‘S3 Project’ funded by the Compagnia di San Paolo. This work was also funded by a UBACyT 2012-2015 grant (20020110200181) and a PICT-Bicentenario 2010-01108. MGB acknowledges a CONICET fellowship.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Caldarelli G, Vespignani A (2007) Large scale structure and dynamics of complex networks: from information technology to finance and natural science. World Scientific, River Edge View ArticleGoogle Scholar
 Ravasz E, Barabási AL (2003) Hierarchical organization in complex networks. Phys Rev E 67(2):026112 View ArticleGoogle Scholar
 PastorSatorras R, Vespignani A (2007) Evolution and structure of the Internet: a statistical physics approach. Cambridge University Press, Cambridge Google Scholar
 Gregori E, Lenzini L, Orsini C (2013) kDense communities in the Internet ASlevel topology graph. Comput Netw 57(1):213227 View ArticleGoogle Scholar
 Gregori E, Lenzini L, Orsini C (2011) kClique communities in the Internet ASlevel topology graph. In: 31st international conference on distributed computing systems workshops (ICDCSW 2011), pp 134139 View ArticleGoogle Scholar
 Ge X, Wang H (2012) Community overlays upon realworld complex networks. Eur Phys J B 85(1):110 View ArticleGoogle Scholar
 Rossi RA, Fahmy S, Talukder N (2013) A multilevel approach for evaluating Internet topology generators. In: IFIP networking conference, pp 19 Google Scholar
 Hirayama T, Arakawa S, Arai K, Murata M (2012) Modularity structure and traffic dynamics of ISP routerlevel topologies. In: The 2012 international symposium on nonlinear theory and its applications (NOLTA 2012), Palma, Mallorca, pp 856859 Google Scholar
 Fortunato S (2010) Community detection in graphs. Phys Rep 486(35):75174 MathSciNetView ArticleGoogle Scholar
 CAIDA: the Cooperative Association for Internet Data Analysis. http://www.caida.org/
 Keys K, Hyun Y, Luckie M, Claffy K (2013) Internetscale IPv4 alias resolution with MIDAR. IEEE/ACM Trans Netw 21(2):383399 View ArticleGoogle Scholar
 Dall’Asta L, AlvarezHamelin JI, Barrat A, Vázquez A, Vespignani A (2006) Exploring networks with traceroutelike probes: theory and simulations. Theor Comput Sci 355(1):624 MATHView ArticleGoogle Scholar
 PastorSatorras R, Vázquez A, Vespignani A (2004) Topology, hierarchy, and correlations in Internet graphs. In: Complex networks. Lecture notes in physics, vol 650. Springer, Berlin, pp 425440 View ArticleGoogle Scholar
 Chang H, Jamin S, Willinger W (2001) Inferring ASlevel Internet topology from routerlevel path traces. In: Proceedings of SPIE ITCom Google Scholar
 Tangmunarunkit H, Govindan R, Shenker S, Estrin D (2001) The impact of routing policy on Internet paths. In: Proceedings of the 20th IEEE international conference on computer communications (INFOCOM), pp 736742 Google Scholar
 The CAIDA UCSD IPv4 routed /24 topology dataset  201110. http://www.caida.org/data/active/ipv4_routed_24_topology_dataset.xml
 Huffaker B, Dhamdhere A, Fomenkov M, Claffy K (2010) Toward topology dualism: improving the accuracy of AS annotations for routers. In: 9th passive and active measurement conference (PAM 2010), Zurich, Switzerland Google Scholar
 The CAIDA macroscopic Internet Topology Data Kit (ITDK)  201110. http://www.caida.org/data/internettopologydatakit/release201110.xml
 Batagelj V, Zaveršnik M (2011) Fast algorithms for determining (generalized) core groups in social networks. Adv Data Anal Classif 5(2):129-145. doi:10.1007/s11634-010-0079-y
 Donnet B, Luckie M, Mérindol P, Pansiot JJ (2012) Revealing MPLS tunnels obscured from traceroute. Comput Commun Rev 42(2):87-93
 Rosvall M, Bergstrom CT (2007) An information-theoretic framework for resolving community structure in complex networks. Proc Natl Acad Sci USA 104(18):7327-7331
 Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106
 Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008
 Beiró MG, Busch JR, Grynberg SP, Alvarez-Hamelin JI (2013) Obtaining communities with a fitness growth process. Physica A 392(9):2278-2293
 Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
 Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111
 Guimerà R, Amaral LAN (2005) Cartography of complex networks: modules and universal roles. J Stat Mech Theory Exp 2:02001
 Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72(2):027104
 Noack A, Rotta R (2009) Multilevel algorithms for modularity clustering. In: Proceedings of the 8th international symposium on experimental algorithms (SEA 2009). Springer, Berlin, pp 257-268
 Fortunato S, Barthélemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci USA 104(1):36-41
 Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Phys Rev E 74(1):016110
 Lancichinetti A, Fortunato S (2011) Limits of modularity maximization in community detection. Phys Rev E 84(6):066122. http://pre.aps.org/abstract/PRE/v84/i6/e066122
 Busch JR, Beiró MG, Alvarez-Hamelin JI (2010) On weakly optimal partitions in modular networks. arXiv:1008.3443
 DeltaCom (2015) http://sourceforge.net/projects/deltacom/
 Hric D, Darst RK, Fortunato S (2014) Community detection in networks: structural communities versus ground truth. Phys Rev E 90:062805
 Rosvall M, Bergstrom CT (2011) Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE 6(4):e18209
 Traag VA, Van Dooren P, Nesterov Y (2011) Narrow scope for resolution-limit-free community detection. Phys Rev E 84(1):016114
 Kumpula JM, Saramaki J, Kaski K, Kertész J (2007) Limited resolution in complex network community detection with Potts model approach. Eur Phys J B 56(1):41-45
 Kawamoto T, Rosvall M (2015) Estimating the resolution limit of the map equation in community detection. Phys Rev E 91(1):012809
 Xiang J, Hu K (2012) Limitation of multiresolution methods in community detection. Physica A 391(20):4995-5003
 Luckie M, Huffaker B, Dhamdhere A, Giotsas V, Claffy K (2013) AS relationships, customer cones, and validation. In: Proceedings of the 2013 Internet measurement conference (IMC’13). ACM, New York, pp 243-256
 Yang J, Leskovec J (2014) Overlapping communities explain core-periphery organization of networks. Proc IEEE 102(12):1892-1902
 Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D (2004) Defining and identifying communities in networks. Proc Natl Acad Sci USA 101(9):2658
 Hu Y, Chen H, Zhang P, Li M, Di Z, Fan Y (2008) Comparative definition of community and corresponding identifying algorithm. Phys Rev E 78(2):026121
 Decelle A, Krzakala F, Moore C, Zdeborová L (2011) Inference and phase transitions in the detection of modules in sparse networks. Phys Rev Lett 107:065701
 Nadakuditi RR, Newman MEJ (2012) Graph spectra and the detectability of community structure in networks. Phys Rev Lett 108:188701
 Radicchi F (2013) Detectability of communities in heterogeneous networks. Phys Rev E 88:010801
 Radicchi F (2014) A paradox in community detection. Europhys Lett 106(3):38001
 Darst RK, Nussinov Z, Fortunato S (2014) Improving the performance of algorithms to find communities in networks. Phys Rev E 89:032809
 Whang JJ, Gleich DF, Dhillon IS (2013) Overlapping community detection using seed set expansion. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, New York, pp 2099-2108
 Kloumann IM, Kleinberg JM (2014) Community membership identification from small seed sets. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 1366-1375
 Li Y, He K, Bindel D, Hopcroft JE (2015) Uncovering the small community structure in large networks: a local spectral approach. In: Proceedings of the 24th international conference on world wide web, pp 658-668