Link creation and information spreading over social and communication ties in an interest-based online social network
© Aiello et al.; licensee Springer. 2012
Received: 1 October 2012
Accepted: 14 November 2012
Published: 5 December 2012
Complex dynamics of social media emerge from the interaction between the patterns of social connectivity of users and the information exchanged along such social ties. Unveiling the underlying mechanisms that drive the evolution of online social systems requires a deep understanding of the interplay between these two aspects. Based on the case of the aNobii social network, an online service for book readers, we investigate the dynamics of link creation and the social influence phenomenon that may trigger information diffusion in the social graph. Having confirmed that social partner selection is strongly driven by structural, geographical, and topical proximity, we develop a machine-learning social link recommender for individual users, trained on the most predictive subset of a broad set of candidate features, and we test it on the still largely unexplored domain of a network of interest. We also analyze the influence process from the two distinct perspectives of users and items. We show that link creation has an immediate effect on the alignment of user profiles and that the established social ties are a good substrate for social influence. We quantitatively measure influence by tracking the diffusion patterns of specific pieces of information and comparing them with appropriate null models. We detect an appreciable signal of social influence even though item consumption is a very slow process in this context. All the detected patterns of social attachment and influence are stronger when considering the social subgraph on which communication effectively occurs. Based on our study of the dynamics of the aNobii social network, we investigate the possibility of predicting the evolution of such a complex social system.
Global dynamics of online social media emerge from the aggregation of the behavioral footprints generated by the activity of the users and their interactions. Such complex information ecosystems are characterized by two fundamental components, namely the creation of social connections between individuals and the exchange of information between them. Mining the static and evolutionary patterns of these phenomena is key to understanding and predicting the micro- and macroscopic dynamics of the whole system.
So far, much effort has been devoted to investigating the causes that determine the creation of social links and the process of information diffusion along these links. While some results of previous work are supported by well-known sociological theories, many dynamics characterizing online social systems are not intuitive, difficult to model accurately, and still largely unknown.
Among others, the microscopic dimension of the process of link creation and the influence phenomenon that triggers the diffusion of a piece of information or the spreading of a behavioral norm across the social network still have many unexplored aspects. In the first case, even though many studies have addressed the problem of predicting the global evolution of social graphs, only a few investigations have taken the individual perspective, namely trying to predict the future social connections of a single social agent. Similarly, even though several models of information spreading on social networks have been proposed, it is still not clear to what extent information can spread quickly and effectively in the network, and whether the factors that determine influence between peers generalize across different social systems.
We help shed light on these questions through the analysis of aNobii, an online social network for book lovers. Unlike mainstream, general-purpose social networks (e.g., Facebook, Twitter, Google+), aNobii is a network of interest, where social aggregation is determined by the topical interests of the readers. Moreover, the contact network can be crawled without restrictions, which allows the analysis of all the nodes reachable via crawling. The specificity of the domain and the richness of the features publicly exposed by the users allow us to explore the social dimensions from an unusual angle. Our analysis is driven by two main goals:
1. Designing an effective strategy for recommending new links to individual users and verifying its effectiveness in an interest-based network. Unlike link prediction, link recommendation is a largely unexplored task, and it has been addressed only for general-purpose networks. We survey a large number of structural and topical features and determine the best ones for recommendation purposes. We verify that recommendation in the considered domain is harder than in general-purpose networks, and we provide insights on the origin of this difference.
2. Providing a quantitative measure of influence in an interest-driven domain by investigating the perspectives of both users (pairs of individuals interacting and exerting influence on one another) and items (books spreading in the network by a word-of-mouth process). In the analysis of book spreading, we provide a novel comparison of diffusion traces with null models, we detect a clear signal of influence in a domain in which the consumption of items is a slow process, and we highlight some factors that foster the adoption of books by individuals.
A number of results emerge from the present study, including:
• The analysis of static properties of the aNobii online social system, including geographical and topical bias in link connectivity;
• The discovery of a relation between an item's popularity and its potential for viral diffusion;
• The finding that communication patterns matter more than mere social links in the processes of social link creation and influence;
• The introduction of a metric of structural node similarity inspired by graph-centrality metrics.
Overall, we present a comprehensive study of the structure and dynamics of a social system that can serve as a valuable reference in online social network analysis, as it proceeds all the way from data collection to the study of the complex dynamics of the system.
In the following, we give an overview of related work in the field (Section 2), introduce the details of the aNobii social network, and discuss its static and dynamic structural properties (Sections 3 and 4). The dynamics of link connectivity and social influence are discussed in Section 5, and the link recommendation algorithm is presented in Section 6.
2 Related work
Several studies have described online social systems from the point of view of their static network properties  and of the dynamics of their overall evolution [2, 3]. Temporal fluctuations of network topological features such as diameter, clustering coefficient and mixing patterns  and the dynamics of link creation in social networks  have been explored in depth through the analysis of large-scale real-world datasets. Previous work on link characterization, focusing on the patterns that describe the creation of links and on how the features of social ties evolve in time, reveals that link creation is driven by proximity, triangle closure, reciprocation and homophily [5–9].
Among the topics related to the analysis of multi-agent systems, in this paper we focus on three broad areas that have attracted strong interest, namely the study of communication patterns between users, the influence phenomenon, and the link prediction problem.
2.1 Communication networks
Recently, findings from social network analysis have been corroborated and expanded by the study of communication networks - also denoted as activity networks  or interaction networks  - that often coexist with social networks. The comparison of the graph of user-to-user interactions with the social network reveals similar connectivity patterns driven by reciprocity and triangle closure .
Activity networks are more dynamic than social networks and reflect changing trends in user interaction and information flow. Communication graphs have been shown to be strongly clustered and to change over many time scales, even though the structural features of the activity network remain stable over time . It has been observed that the average level of interaction with neighbors in the social network is very low  and often decreases with time ; in agreement with this, studies on the Facebook interaction graph  reveal that only a minority of social links are actually used for user-to-user communication. Moreover, recent studies on Twitter revealed that users can maintain no more than 100-200 stable relationships among all their social contacts . Such results confirm the intuition that online social ties are not always good proxies for information exchange patterns.
Although the importance of communication links has been assessed in the past, many social phenomena such as homophily and influence have been studied using the graph of conventional social ties as reference. The effectiveness of modeling and predicting social phenomena using communication networks instead of social networks has not been explored thoroughly, and has not been considered in the case of networks of interest like aNobii. We compare social and interaction ties in the context of link creation and information diffusion, finding that the information they carry has a different potential for predicting the formation of new links or the diffusion of information. For the link recommendation task, we introduce a metric that combines the information from both social and interaction networks to enhance the prediction accuracy.
2.2 Influence and diffusion
The task of capturing the dynamics of information spreading and influence in networked environments has recently received much attention. Diffusion models of word-of-mouth processes were developed in the past to enhance viral marketing strategies ; more recently, with the widespread adoption of social media, the detection of influence patterns and of influential individuals has become important for capturing the interaction dynamics in social networks and in real-time information networks.
Analysis of information propagation in Flickr  showed that diffusion is limited to individuals who reside in the close neighborhood of the seed user and that the spreading process is very slow. Analysis of message cascades on Twitter has been used to estimate the degree of influence of users ; the more influential user in a pair is determined from the difference in some activity metric, such as the number of followers or the number of tweet replies. In partial disagreement with this study, it has been shown that a high number of followers (or of social contacts in general) does not imply a high degree of influence .
A crucial task in the analysis of influence patterns is to discern real influence from unobserved factors, like homophily or confounding variables, that can induce statistical correlation between the behaviors or the profiles of connected users even without one being influenced by the other. Shuffling or randomization tests on user features are commonly used to detect a signal of influence within noisy patterns of correlation between pairs of users . The interplay between the homophily-driven creation of social connections and the influence that neighbors exert on each other’s behavior has been investigated by Crandall et al.  on the Wikipedia collaboration network. Bakshy et al.  reported a large-scale experiment on the Facebook social network in which the exposure of users to the items published by their friends was randomized, in order to expose the role of social links in the propagation of information and to show the existence of a genuine influence phenomenon between Facebook friends.
Instead of representing influence as an infection phenomenon between connected individuals, Yang and Leskovec  recently proposed a linear influence model that is agnostic to the network structure and relies only on the time of contagion. These observations imply the presence of a hidden contagion web that differs from the observed social network . Based on similar observations, other probabilistic models that represent influence effects between peers regardless of the social link structure have been proposed .
To complement previous studies, we focus here on the influence phenomenon in a social context where item consumption (reading books) is a much slower process than in general-purpose online social and news media. We explore the influence process from the point of view of both users and items, exposing strong signals of influence at the moment of social link creation and in the generation of information fluxes over existing links. We characterize the spreading traces (i.e., graphs of item adopters expanding in time), compare them to null models, and provide some insights into the still open question of whether the fraction or the number of influencing neighbors has a stronger impact on the diffusion probability.
2.3 Link prediction
Predicting the presence of a link between two entities in a network is one of the major challenges in the area of link mining . This edge-related mining task is usually called link detection  when it aims to disclose unobserved or unknown links in a static network, and link prediction when it aims to foresee whether a connection will arise in the future between nodes that are currently unlinked. Finally, link recommendation aims to provide a target user with a list of contacts with whom he or she is likely to want to form a social link .
Seminal work on link prediction was presented by Liben-Nowell and Kleinberg [29, 30]. They identify structural properties of the graph which can be used to build a ranking of the node pairs based on their structural similarity, which is in turn exploited to predict future interactions. Several slight variants of this approach have been adopted . Another early work by Popescul et al.  focused instead on link detection using a classifier trained on the feature vectors that describe the nodes of the graph.
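To make the ranking idea concrete, the sketch below scores unlinked pairs with two classic proximity measures of the kind compared in that line of work: the number of common neighbors and the Adamic-Adar index, which weights each common neighbor $w$ by $1/\log k_w$. It is a minimal illustration assuming an undirected graph stored as a Python dict of neighbor sets; the names `proximity_scores` and `rank_pairs` are hypothetical and do not come from the cited papers.

```python
import math
from itertools import combinations


def proximity_scores(adj):
    """Score unlinked pairs (u, v) that share at least one neighbor.

    adj: dict mapping each node to the set of its neighbors (undirected).
    Returns {(u, v): (common_neighbors, adamic_adar)} with u < v.
    """
    scores = {}
    for w, nbrs in adj.items():
        k = len(nbrs)  # a node closing a pair has k >= 2, so log(k) > 0
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                continue  # already linked: nothing to predict
            cn, aa = scores.get((u, v), (0, 0.0))
            scores[(u, v)] = (cn + 1, aa + 1.0 / math.log(k))
    return scores


def rank_pairs(scores, by="adamic_adar"):
    """Return candidate pairs sorted from most to least likely to link."""
    index = 1 if by == "adamic_adar" else 0
    return sorted(scores, key=lambda pair: scores[pair][index], reverse=True)


if __name__ == "__main__":
    adj = {
        "a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d", "e"},
        "d": {"b", "c"}, "e": {"c"},
    }
    print(rank_pairs(proximity_scores(adj))[:2])
```

Pairs sharing low-degree neighbors rank higher under Adamic-Adar than under a raw common-neighbor count, which is the usual motivation for the logarithmic weighting.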
Combining structural graph similarity measures and simple node-based features in a supervised learning approach to link prediction has also been tried [9, 33], showing an improvement in prediction performance compared to predictors based solely on topological features. Geographical proximity between nodes  and group affiliation  have been effectively used as node-based features as well. Recently, the predictive power of some network clustering algorithms in link prediction tasks has also been tested .
The best-known topological measures of structural similarity between pairs of nodes are reviewed and refined by Zhou et al.  and Lü et al. . The authors compare several structural similarity metrics for link prediction in terms of accuracy and computational efficiency. Novel local proximity measures are also proposed and shown to be efficient and accurate in link detection. Efficiency of structural proximity metrics on graphs is addressed also by Song et al. .
Detection of links based only on the information extracted from folksonomies is performed by Schifanella et al. . Similarity measures explicitly designed for the folksonomic space are used to compute a lexical proximity between users. A similar context is considered by Leroy et al. , who leverage the group membership information from Flickr to build a probabilistic graph and detect the hidden social graph with a good accuracy.
The problem of detecting both unknown links and missing node attributes in a network is addressed by Bilgic et al. . They propose an iterative method that, at each step, refines the prediction of one of the two features, leveraging the information gained on the other feature at the previous step.
The role of temporal aspects in prediction is explored by Tylenda et al. , who exploit the information of recent interaction between individuals to improve the prediction accuracy. Dunlavy et al.  use a matrix-tensor method to predict links that will be created in the future in networks with an underlying periodic structure.
Although most papers focus on link prediction in simple graphs, a few techniques have been developed for other kinds of networks. Link detection has been studied on weighted networks [45–47], bipartite networks [44, 48, 49] and signed social graphs . Very recently, an approach that combines supervised learning and random walks has been shown to achieve promising accuracy for both the prediction and the recommendation of new links .
Finally, some approaches based on probabilistic models such as relational Markov networks  and probabilistic relational models  deserve mention. However, these approaches have not been shown to scale, and they have not been extensively tested on real-world datasets.
Despite the large amount of work on prediction, few efforts have been devoted to link recommendation, which is inherently different (and more relevant for real social media services) since it aims at satisfying individual users rather than just maximizing the ability to predict the global evolution of the social graph. Moreover, to the best of our knowledge, link recommendation has been studied only in general-purpose online social graphs like Facebook and not in networks of interest. In our recommendation method, we collect the most relevant state-of-the-art features used for link prediction, define an additional feature, and rank them according to their effectiveness in the recommendation task.
3 aNobii dataset
We analyze a temporal dataset taken from http://aNobii.com, a website for book lovers. The main feature of aNobii is the personal digital library that every user can build by picking titles from a vast database of more than 30 million publications along with their metadata (such as author, publication year, etc.). Every book in the library can be marked with a reading status (e.g., ‘finished reading’) and can be annotated with keywords (tags), a rating (from 1 to 5 stars) and a review. There is also a wishlist containing titles that users plan to read. Users can enrich their profile with other personal information such as gender, age, marital status, and a geo-location composed of a country and, optionally, a town. The country is specified in 97% of the profiles and the city in roughly 40%.
Channels of social interaction form another crucial component of aNobii. The social network is composed of two different kinds of mutually exclusive ties, namely friendship and neighborhood relations. Although it is up to the users to choose one or the other, the aNobii website suggests establishing a friendship tie with people you already know in real life, while neighborhood should be used for people you do not know but whose library you consider interesting. Apart from this usage recommendation, the two types of link have the same characteristics: they are directed, they can be established without the approval of the linked user, and they enable notifications of the linked user's library updates. Social aggregation can also be achieved through affiliation to groups. Thematic groups can be created by any user and membership is open to anyone. The last channel of interaction is the message wall (also called ‘shoutbox’). Users can write messages on the walls of any other individual, independently of the existence of a relationship in the social network. Message exchange defines a different network, which we call the communication graph and whose properties are discussed in detail in Section 4.3. Self-posting is also allowed, yielding self-loops in the communication graph.
We explored the aNobii social networks through web crawling and collected all the public user data through page scraping. We took several snapshots, 15 days apart, using a BFS strategy initialized with a random seed and expanding the user list by following the links in the contact lists and in the shoutboxes. Since social and communication connections are directed, we were only able to collect the information of the largest strongly connected component and of the out-component.a However, we collected the full information of both components, thus avoiding the possible biases related to incomplete sampling of a connected component.
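The crawl described above can be sketched as a plain breadth-first search over directed links. Here `fetch_links` is a hypothetical callback standing in for scraping a profile page and returning the ids linked from it (contacts and shoutbox entries); politeness delays, retries, and snapshot bookkeeping are left out. Starting from a seed inside the largest strongly connected component, the visited set is that component plus its out-component, which is why users that only point into the core cannot be reached.

```python
from collections import deque


def bfs_crawl(seed, fetch_links):
    """Breadth-first crawl following directed links from `seed`.

    fetch_links(user) -> iterable of user ids linked from that user's page
    (hypothetical: e.g. contact-list entries plus shoutbox entries).
    Returns users in the order they were visited.
    """
    visited = {seed}
    order = []
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        order.append(user)
        for other in fetch_links(user):
            if other not in visited:
                visited.add(other)
                queue.append(other)
    return order


if __name__ == "__main__":
    # Toy directed graph: 1 <-> 2 form a strongly connected core,
    # 3 is reachable from it (out-component), 4 only points into it.
    links = {1: [2], 2: [1, 3], 3: [], 4: [1]}
    print(bfs_crawl(1, lambda u: links.get(u, [])))
```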
4 Structure and dynamics of social network
4.1 Overview on network structure
Statistics concerning the friendship and neighborhood networks, their union (i.e., the full social network), and the communication network in April 2011
The separation between the geographical regions in the graph can be quantified by measuring the conductance $\phi(R)$ of the graph cut separating the users who reside in a given region $R$ from the rest of the network, and comparing it with the conductance $\phi(\tilde{R})$ of a random cut between a region $\tilde{R}$ and the rest of the graph, where $\tilde{R}$ has the same size and degree distribution as $R$. The conductance is defined as the ratio between the number of edges crossing the cut and the minimum number of edges inside one of the two regions separated by the cut: small values denote well-separated regions, while values close to 1 denote strong connectivity between regions . The Italy and Far East regions have a much smaller conductance than their random counterparts, while ‘bridge’ regions have a conductance comparable to the random case.
Narrowing the view down to town-level graphs inside clusters, the intra-cluster connections appear denser and structured around a single core of nodes (Figure 1). Since aNobii is focused on books, language is, unsurprisingly, the main factor behind this sharp separation.
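A minimal sketch of the comparison, assuming an undirected social graph stored as a dict of neighbor sets (aNobii ties are directed, so this is a simplification). It uses the common volume-based normalization, cut size divided by the smaller sum of degrees on the two sides, which is an assumption: the definition above normalizes by edge counts inside the regions. The random baseline draws node sets with the same size and the same degree multiset as $R$; all function names are hypothetical.

```python
import random
from collections import defaultdict


def undirected_adjacency(edges):
    """Build a symmetric adjacency dict {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def conductance(adj, region):
    """Volume-based conductance: cut / min(vol(region), vol(rest))."""
    region = set(region)
    cut = vol_in = vol_out = 0
    for u, nbrs in adj.items():
        if u in region:
            vol_in += len(nbrs)
            cut += sum(1 for v in nbrs if v not in region)
        else:
            vol_out += len(nbrs)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def degree_matched_region(adj, region, rng):
    """Random node set with the same size and degree multiset as `region`."""
    by_degree = defaultdict(list)
    for u, nbrs in adj.items():
        by_degree[len(nbrs)].append(u)
    needed = defaultdict(int)
    for u in region:
        needed[len(adj[u])] += 1
    sample = set()
    for k, count in needed.items():
        sample.update(rng.sample(by_degree[k], count))
    return sample


def random_baseline(adj, region, trials=100, seed=0):
    """Mean conductance over degree-matched random regions."""
    rng = random.Random(seed)
    values = [conductance(adj, degree_matched_region(adj, region, rng))
              for _ in range(trials)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    adj = undirected_adjacency(
        [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
    print(conductance(adj, {0, 1, 2}), random_baseline(adj, {0, 1, 2}))
```

A region whose conductance is well below its degree-matched baseline is better separated than its size and degrees alone would explain, which is the reading applied above to the Italy and Far East regions.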
4.2 Evolution of the network
Evolution of some quantities from one snapshot to the next
Besides triangle closure, another phenomenon that underlies link creation in growing graphs is preferential attachment, i.e., users with a large number of connections are preferentially chosen as targets of new social links . We test this hypothesis using the following method . Let us denote by $\Pi(k)$ the a priori probability for a newcomer to create a link toward a node of degree $k$ between time $t-1$ and $t$. Given that at time $t-1$ the degree distribution of the nodes is $P_{t-1}(k)$ (i.e., there are $N_{t-1} P_{t-1}(k)$ nodes of degree $k$, where $N_{t-1}$ is the number of nodes at $t-1$), the probability to observe a new link from a new node to a node of degree $k$ between $t-1$ and $t$ is proportional to $\Pi(k) P_{t-1}(k)$. Therefore, we can measure $\Pi(k)$, up to a normalization constant, by counting for each $k$ the fraction of links created by new nodes that reach nodes of degree $k$, and dividing by $P_{t-1}(k)$. As shown in Figure 5 (right), we obtain a linear behavior $\Pi(k) \propto k$, both when $k$ is taken as the in-degree and as the out-degree (which are strongly correlated). This is a clear signal of linear preferential attachment. The $\Pi(k)$ values for the largest $k$ that fall far from the diagonal are just statistical noise due to the low number of high-degree nodes.
Clearly, users have no knowledge of the overall network topology at any time, so they cannot deliberately target the most connected users. It is more likely that this preferential attachment arises because a new user creates links not only toward another user but also toward some of that user's neighbors. It has been shown that this locally driven connection pattern results in effective preferential attachment [60, 61]. Indeed, we verified in our dataset that many newcomers join the network by creating links to pairs of already connected users.
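The estimation procedure can be sketched as follows. The inputs are the degrees of all nodes at $t-1$ and, for every link created by a newcomer between $t-1$ and $t$, the degree of its target at $t-1$. The function names and the log-log least-squares fit used as a linearity check are illustrative choices, not the code used for Figure 5. In the synthetic usage below, each degree class receives new links in proportion to $k$ times the number of nodes of degree $k$, so the estimated $\Pi(k)/k$ is constant and the fitted slope is 1.

```python
import math
from collections import Counter


def attachment_kernel(degrees_before, target_degrees):
    """Estimate the attachment kernel Pi(k), up to a constant factor.

    degrees_before: degree of every node present at snapshot t-1.
    target_degrees: degree (at t-1) of the target of each link created
                    by a newcomer between t-1 and t.
    Pi(k) ~ (fraction of newcomer links reaching degree k) / P_{t-1}(k).
    """
    n_k = Counter(degrees_before)
    n = sum(n_k.values())
    hits = Counter(target_degrees)
    m = sum(hits.values())
    return {
        k: (count / m) / (n_k[k] / n)
        for k, count in sorted(hits.items())
        if n_k[k] > 0  # skip degrees absent at t-1 (inconsistent input)
    }


def loglog_slope(kernel):
    """Least-squares slope of log Pi(k) vs log k; ~1 means linear attachment."""
    pts = [(math.log(k), math.log(v)) for k, v in kernel.items() if k > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den


if __name__ == "__main__":
    kernel = attachment_kernel([1, 1, 1, 1, 2, 2, 4], [1, 1, 2, 2, 4, 4])
    print(kernel, loglog_slope(kernel))
```

With real snapshots, degree classes with few nodes give noisy estimates, so binning $k$ logarithmically before fitting is a common precaution.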
4.3 Communication and interaction networks
Ties in social media are most often not categorized by the intensity or the type of the connection. In a social context, however, ties might have different strength and meaning, depending on the information that flows on them and on the features of the individuals they connect. To reach a deeper understanding of social dynamics, the information on social connections must be complemented with other relational data. In this respect, the communication network carries useful information that augments the description of the social substrate given by the user-declared ‘friendship’ or ‘neighborhood’ ties: some user-declared ties might not support any communication, and communication may occur between users who are neither ‘friends’ nor ‘neighbors’.
The most complete way to define the communication history between individuals is through a temporal graph, where each edge corresponds to a single message and carries a timestamp. In this temporal graph, the frequency of messages exchanged by two users might change, with periods of inactivity followed by bursts of messages. A detailed study of these dynamics is beyond the scope of the present work, so we aggregate over the whole time window of the data set and define the communication graph as a directed graph in which each edge between two nodes is weighted by the number of messages sent between these nodes.
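A sketch of the aggregation, assuming messages are available as (sender, recipient, timestamp) tuples and that each edge points from the author of a message to the owner of the wall it was posted on (an assumption about edge orientation). The optional time window is an illustrative extension; with no bounds, the function aggregates over the whole observation period as described above.

```python
from collections import Counter


def communication_graph(messages, start=None, end=None):
    """Aggregate timestamped wall messages into a weighted directed graph.

    messages: iterable of (sender, recipient, timestamp) tuples.
    start/end: optional inclusive time window; None means unbounded.
    Returns {(sender, recipient): number_of_messages}. Self-posts are kept
    as self-loops (sender == recipient).
    """
    weights = Counter()
    for sender, recipient, ts in messages:
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        weights[(sender, recipient)] += 1
    return dict(weights)


if __name__ == "__main__":
    msgs = [("a", "b", 1), ("a", "b", 5), ("b", "a", 7), ("c", "c", 9)]
    print(communication_graph(msgs))
```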
Overlap between social networks and communication network
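The overlap between the two networks can be summarized, for instance, by the fraction of social ties that carry at least one message, the fraction of communication ties that coincide with a social tie, and their Jaccard index. The sketch below assumes directed ties compared as ordered pairs and drops self-posting loops, which have no social counterpart; these metric choices and the function name are illustrative and not necessarily those behind the reported statistics.

```python
def social_communication_overlap(social_edges, comm_edges):
    """Overlap statistics between directed social and communication ties.

    social_edges: iterable of (u, v) social ties.
    comm_edges: iterable of (u, v) pairs with at least one message u -> v.
    """
    social = set(social_edges)
    comm = {(u, v) for u, v in comm_edges if u != v}  # drop self-posts
    both = social & comm
    union = social | comm
    return {
        "social_with_messages": len(both) / len(social) if social else 0.0,
        "messages_on_social_ties": len(both) / len(comm) if comm else 0.0,
        "jaccard": len(both) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    social = [("a", "b"), ("b", "c"), ("c", "a")]
    comm = [("a", "b"), ("c", "a"), ("d", "a"), ("d", "d")]
    print(social_communication_overlap(social, comm))
```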
4.4 Topical alignment
The topical similarity between two users $u$ and $v$ is measured by the cosine similarity of their libraries,
$$\sigma(u,v) = \frac{\sum_b \delta_{u,b}\,\delta_{v,b}}{\sqrt{\sum_b \delta_{u,b}\,\sum_b \delta_{v,b}}},$$
where the indicator function $\delta_{x,b}$ is equal to 1 if user $x$ has the book $b$ in his/her library and 0 otherwise. The cosine similarity is thus the scalar product of the ‘book vectors’ of users $u$ and $v$, normalized by the geometric mean of their library sizes $\sum_b \delta_{u,b}$ and $\sum_b \delta_{v,b}$.
The same analysis can be performed on all the features of the users’ profiles. For instance, the relationship between geographic attributes and distance on the social graph is explored in the right plots of Figure 7, which show the probability that two users at distance d on the social graph are from the same country or town. Again, to disentangle this signal from statistical effects (e.g., the imbalance in the number of users in each nation), we use as a null model a random network with the same degree sequence as the original network but reshuffled geographic attributes. Alignment on nationality is strong up to a distance of 4 hops, and a strong effect is also observed for towns, especially for directly connected users.
This result suggests that people preferentially establish social ties with others who speak the same language, but also that the social selection process is driven by geographic proximity (e.g., people who reside in the same town). In particular, 90% of the social edges connect users from the same country, and there is a 10% probability that two connected users are from the same city. This indicates that the probability of connection decreases with geographic distance, as also found in other online social networks that are not based on a particular interest (here, books) but have broader scopes [62, 63].
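For binary book vectors the cosine similarity reduces to set arithmetic: the number of shared books divided by the geometric mean of the two library sizes. A minimal sketch, assuming libraries are given as collections of book ids (the function name is hypothetical):

```python
import math


def cosine_similarity(library_u, library_v):
    """Cosine similarity of two libraries seen as binary book vectors."""
    a, b = set(library_u), set(library_v)
    if not a or not b:
        return 0.0  # an empty library shares nothing with anyone
    return len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    print(cosine_similarity({"b1", "b2", "b3"}, {"b2", "b3", "b4", "b5"}))
```

Two shared books out of libraries of sizes 3 and 4 give $2/\sqrt{12} \approx 0.577$.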
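The distance-resolved statistic can be sketched as below: a breadth-first search from each user collects the users within `max_d` hops, and pairs are binned by distance. As a simplification of the null model described above, `shuffled_null` keeps the observed graph and only permutes the attribute values among users; the null model used in the analysis additionally randomizes the network while preserving its degree sequence. The graph is assumed undirected and all names are hypothetical.

```python
import random
from collections import defaultdict, deque


def bfs_distances(adj, source, max_d):
    """Hop distances from `source`, exploring at most `max_d` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_d:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def same_attribute_by_distance(adj, attr, max_d):
    """P(same attribute value | graph distance d) for d = 1..max_d.

    attr: {node: value}; nodes without a value are skipped.
    Ordered pairs are counted, which leaves the ratios unchanged.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for s in attr:
        for t, d in bfs_distances(adj, s, max_d).items():
            if d == 0 or t not in attr:
                continue
            total[d] += 1
            same[d] += attr[s] == attr[t]
    return {d: same[d] / total[d] for d in sorted(total)}


def shuffled_null(adj, attr, max_d, trials=20, seed=0):
    """Average the same statistic after randomly permuting attribute values."""
    rng = random.Random(seed)
    nodes = list(attr)
    values = list(attr.values())
    acc = defaultdict(float)
    for _ in range(trials):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        for d, p in same_attribute_by_distance(adj, shuffled, max_d).items():
            acc[d] += p / trials
    return dict(sorted(acc.items()))
```

Comparing the two curves distance by distance separates genuine geographic assortativity from the effect of some countries simply having many more users.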
5 Homophily, selection and influence
5.1 Causal connection between similarity and link creation
In Section 4.4 we observed topical alignment as a static property of the network. Here we investigate the evolution of this phenomenon. Since we verified that the topical alignment, which denotes a homophily phenomenon between users , is not purely due to assortative patterns, we can ascribe it to selection or to social influence. Selection corresponds to a process in which the choice of a social partner (here, as ‘friend’ or ‘neighbor’) is driven by the similarity between the connecting individuals, while social influence  denotes the tendency of individuals to be influenced in their behavior by others, and in particular by their neighborhood in the social network. As we now show, both phenomena can be observed in aNobii.
Average similarity, at snapshot t, of pairs forming new links between t and t + 1 (either non-reciprocal or reciprocal), compared with the average similarity of all pairs at distance 2 at t
The picture emerging from this analysis and from the results presented in Sections 4.2 and 4.4 is the following: users connect to others residing close in the social graph, very often neighbors of neighbors; moreover, these pairs have on average more similar profiles than other pairs of users at distance 2. In this respect, one can infer that a selection process is at work and is one of the reasons for the observed local topical alignment: among the users who are already close in the graph (distance 2 and 3), those who become even closer are the ones who were more similar to each other.
Note that the similarity metric used is symmetric; therefore, it does not account for the direction of the newly created link. We chose not to consider link direction in the computation of the similarity because close to 50% of the newly created links are bidirectional (see Table 2), and because a user receiving a new incoming connection is notified about it on his/her personal homepage: influence at the time of connection can thus flow in both directions.
To summarize, our analysis of the dynamics of social aggregation shows the presence of a bidirectional causal relationship between social connections and similarity. Higher similarity leads to a higher connection probability and, conversely, users who get connected become more similar due to the influence that new acquaintances exert on one another. These results hold not only for collaboration networks , but also for interest-based networks such as aNobii, where the similarity between users is evaluated on the basis of profile items, shared metadata, and topics of interest.
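The comparison behind this inference can be sketched as follows, assuming undirected snapshots stored as dicts of neighbor sets and libraries given as sets of book ids. For simplicity, the sketch only keeps new links that close a distance-2 pair and uses library cosine similarity as the profile similarity; both restrictions are assumptions of the illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two libraries given as sets of book ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def distance_two_pairs(adj):
    """Unlinked pairs (u, v), u < v, sharing at least one neighbor."""
    pairs = set()
    for nbrs in adj.values():
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj.get(u, ()):
                pairs.add((u, v))
    return pairs


def selection_signal(adj_t, new_links, libraries):
    """Mean similarity at t of distance-2 pairs that link by t+1,
    and mean similarity at t of all distance-2 pairs."""
    candidates = distance_two_pairs(adj_t)
    closing = {tuple(sorted(p)) for p in new_links} & candidates

    def similarity(pair):
        u, v = pair
        return cosine(libraries.get(u, set()), libraries.get(v, set()))

    def mean(pairs):
        return sum(map(similarity, pairs)) / len(pairs) if pairs else float("nan")

    return mean(closing), mean(candidates)
```

A first value consistently above the second across snapshots is the selection signature described above.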
5.2 Structure of book graphs
Influence can also be investigated from a different angle, focusing on items rather than on users. The influence observed at the time of link creation might indeed remain effective for the whole life span of the social link and, at any time, may lead a user to adopt a new item (in particular a book) from his/her neighbors and, in turn, to influence others to adopt the same item. This phenomenon gives rise to adoption cascades that can be studied within the more general scope of information spreading . A better understanding of how items spread on the network can shed light on the overall role of influence in the online social network.
In this perspective, we study the static and dynamic properties of book graphs: the book graph $G_b$ of a book $b$ is defined as the social subgraph composed of the users who have $b$ in their library or wishlist and of the links between them. We differentiate the analysis by classes of book popularity, measured by the number of users who adopted the book $b$ (i.e., the number of nodes in $G_b$). In particular, we introduce three popularity classes, namely rare, middle, and popular books. The boundaries of the popularity classes are chosen based on the empirical popularity distribution of books. Even neglecting very rare books with fewer than 10 readers, we have more than 200K book graphs.
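A sketch of how a book graph and the structural quantities discussed next (relative number of connected components, relative size of the largest component, average clustering coefficient) could be computed, assuming an undirected social graph given as a dict of neighbor sets. Nodes with fewer than two neighbors are counted with clustering 0, which is one common convention; the function names are hypothetical.

```python
from itertools import combinations


def book_graph(adj, adopters):
    """Induced subgraph of the social graph on the adopters of one book."""
    adopters = set(adopters)
    return {u: adj.get(u, set()) & adopters for u in adopters}


def component_sizes(sub):
    """Sizes of connected components, largest first (iterative DFS)."""
    seen, sizes = set(), []
    for start in sub:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in sub[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def average_clustering(sub):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for u, nbrs in sub.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for v, w in combinations(nbrs, 2) if w in sub[v])
        total += 2.0 * links / (k * (k - 1))
    return total / len(sub) if sub else 0.0


def book_graph_summary(sub):
    """Size-normalized structural statistics of one book graph."""
    n = len(sub)
    if n == 0:
        return {}
    sizes = component_sizes(sub)
    return {
        "nodes": n,
        "relative_components": len(sizes) / n,
        "largest_component": sizes[0] / n,
        "clustering": average_clustering(sub),
    }
```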
The first is an Erdős-Rényi graph with the same number of nodes and edges. The latter is a random subgraph of the social network with the same number of nodes and the same degree sequence. The purpose of the random subgraph is to model a process in which the book is adopted by the different users at random and independently. If such a process is considered by simply selecting nodes at random in the network, the resulting subgraphs will be almost always composed of isolated nodes or small disconnected components, therefore we impose that the resulting subgraph has the same degree sequence as the original subgraph.
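The ER null model can be generated with the $G(n, m)$ construction: draw $m$ distinct node pairs uniformly at random. The sketch below, with hypothetical function names, enumerates all pairs, which is only practical for book graphs of moderate size; it covers the ER model only, not the degree-constrained random-node model.

```python
import random
from itertools import combinations


def gnm_null(nodes, m, seed=None):
    """Erdős-Rényi G(n, m) graph on `nodes` with exactly m edges."""
    rng = random.Random(seed)
    nodes = list(nodes)
    pairs = list(combinations(nodes, 2))  # O(n^2): fine for small graphs
    if m > len(pairs):
        raise ValueError("too many edges for a simple graph")
    adj = {u: set() for u in nodes}
    for u, v in rng.sample(pairs, m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def er_null_for(sub, seed=None):
    """ER counterpart of a book graph: same node count and edge count."""
    m = sum(len(nbrs) for nbrs in sub.values()) // 2
    return gnm_null(sub.keys(), m, seed)
```

Averaging the book-graph statistics over many seeds of `er_null_for` gives the ER reference curves at fixed size.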
Book graphs exhibit a weaker connectivity but a much more clustered shape than the corresponding ER graphs, at fixed size. The relative number of connected components slowly decreases with size but remains considerably higher than the ER corresponding values; as a consequence, the relative size of the greatest component asymptotically stabilizes around a value smaller than in the ER graphs; conversely, real book graphs are much more clustered. Structural properties of the random-node-graphs are closer to those of the real book graphs, meaning that the measured levels of clustering and connectivity of the book graphs can partly be ascribed to the degree distribution of their nodes. Nevertheless, the random-node model still exhibits lower clustering and higher sparsity than the empirical book graphs.
The case of ER graphs is the simplest to interpret and serves as a reference for the other two cases. For ER graphs we observe a relatively rapid transition of the relative size of the largest component from 0 to 1 as the average degree $\langle k \rangle$ crosses 1, which is expected given the known transition from a set of small disconnected components to a giant connected component as the connection probability $p$ crosses $1/N$, $N$ being the number of nodes. The clustering values, instead, remain very small (as also expected in ER graphs). In the case of the real book graphs, the size of the largest component grows smoothly with the average degree, with no sign of an abrupt transition, suggesting that connectivity in book graphs is not governed by a threshold mechanism on the average node connectivity. Furthermore, for any average connectivity, a non-negligible portion of nodes remains in small isolated components. This may be because several users adopt a book independently, without being directly influenced by their online social contacts. However, the clustering coefficient is very large, suggesting that the groups of adopters are tightly knit communities. The random-node model follows the same trend as the real data, but both the size of the largest component and the clustering are lower, showing that the connectivity patterns are not entirely due to the degree distribution. For books with large popularity, the empirical data and the random-node case become closer.
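The ER threshold invoked above can be checked numerically. The sketch below uses the G(N, p) variant of the ER model with $p = \langle k \rangle/(N-1)$ and tracks the largest connected component with a union-find structure. It is an illustration of the textbook transition, not a reproduction of the measurements on book graphs.

```python
import random


def gnp_largest_component_fraction(n, mean_degree, seed=None):
    """Sample G(n, p) with p = mean_degree / (n - 1) and return the fraction of
    nodes in its largest connected component (computed with union-find)."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is linked independently with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


if __name__ == "__main__":
    # Below <k> = 1 the largest component is a vanishing fraction of the graph;
    # above it, a giant component quickly absorbs most nodes.
    for c in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(c, gnp_largest_component_fraction(500, c, seed=42))
```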
The overall picture suggests that book graphs may originate from a process of expansion and densification of clustered cores of readers, and that a process of ‘contagion’ between users might have played a role in shaping the subgraphs of adopters. Nevertheless, this effect fades as book popularity grows, presumably because the adoption of a very popular book is not mainly driven by inputs received within the social network, but largely by stimuli and mechanisms external to it. Since the correlations shown here correspond to static snapshots, however, they cannot be used to infer causal relations between connectivity and book adoption. It is therefore also possible that the structure of the book graphs is due to the fact that people sharing the same rare book are more likely to establish social contacts than people sharing a very common book.
5.3 Spreading of books
To better understand whether a user might be led to adopt a book through the influence of his/her social neighborhood, it is necessary to analyze the temporal evolution of the book graphs. We call $G_b(t)$ the social subgraph of users having book b at time t. $G_b(t)$ can evolve because of new users arriving in the social network with b in their library, users leaving, or users adding/removing b to/from their library. For the purpose of detecting influence patterns, we disregard newcomers (who may or may not fill their library with the books they have read) and users leaving the network, and focus on the graph restricted to the users who are present in all the considered snapshots. Moreover, for simplicity, we neglect the (very rare) events of book deletion: once a book is adopted by a user, we assume that it remains in his/her library at any future time. In this context, we formally define the set of adopters of a book b between times $t_0$ and $t$ as $A_b(t_0, t) = \{u \mid u \in G_b(t) \wedge u \notin G_b(t_0)\}$.
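The adopter set can be computed directly from a sequence of library snapshots. This sketch assumes a hypothetical representation in which each snapshot is a dict from user to the set of book ids they own. It applies the two simplifications stated above: only users present in every snapshot are kept, and deletions are ignored.

```python
def adopters(snapshots, book, t0, t):
    """Return the users who adopt `book` between snapshot indices t0 and t (t0 < t).

    snapshots: list of dicts, one per snapshot, mapping user -> set of book ids.
    Only users present in every snapshot are kept, and deletions are ignored:
    a user "has" the book at snapshot s if the book appears in any of that
    user's libraries at snapshots 0..s.
    """
    persistent = set.intersection(*(set(snap) for snap in snapshots))

    def has_book(user, s):
        return any(book in snapshots[i][user] for i in range(s + 1))

    return {u for u in persistent if has_book(u, t) and not has_book(u, t0)}


if __name__ == "__main__":
    snapshots = [
        {"ann": set(), "bob": {"b1"}, "cid": set()},
        {"ann": {"b1"}, "bob": set(), "cid": set(), "dan": {"b1"}},  # dan is a newcomer
        {"ann": {"b1"}, "bob": {"b1"}, "cid": {"b1"}, "dan": {"b1"}},
    ]
    print(sorted(adopters(snapshots, "b1", 0, 2)))
```

Here bob is not an adopter because he already had the book at the first snapshot (his later removal is ignored), and dan is excluded as a newcomer.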
For every adopter, we measure the fraction of users that could potentially have influenced the book adoption. If a book is adopted in the time span $[t_0, t]$, the users that may have influenced the adopter are her out-neighbors who already have that book in their library at time t.[b] We focus only on out-neighbors because users are explicitly notified of the new book adoptions of their out-neighbors, while a user may not be aware of the activity of his/her in-neighbors. Accordingly, we denote the number of user u’s out-neighbors at time t having book b as $k_{u,b}(t)$, and the fraction of such users over all u’s out-neighbors as $f_{u,b}(t) = k_{u,b}(t)/k^{\mathrm{out}}_u(t)$, where $k^{\mathrm{out}}_u(t)$ is the out-degree of u at time t.
Interestingly, the vast majority (74%) of adopters with at least one out-neighbor already owning the book exhibit fractions smaller than 0.2, and the average fraction for these adopters is rather small (0.189); on the other hand, the number of neighbors of an adopter who already have book b is broadly distributed. This could support two distinct hypotheses: the first is that only a rather small number of neighbors are really influential within a user's neighborhood; the second is that the important criterion in the adoption of a book (an ‘influence threshold’) is not the bare number of neighbors who have adopted it, but the corresponding fraction among all out-neighbors, and that the influence threshold in this context is rather low.
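The count and fraction of potential influencers, and the summary statistics quoted above, can be sketched as below. The data layout is a hypothetical one (out-neighbors as sets, book owners as a set), and restricting the summary to adopters with at least one owning out-neighbor is an assumption about which adopters were counted.

```python
def exposure(out_neighbors, owners, user):
    """Return (k, f) for `user`: the number k of out-neighbors who own the book,
    and the fraction f of the user's out-neighbors they represent.

    out_neighbors: dict user -> set of out-neighbors
    owners: set of users owning the book at the reference time
    """
    nbrs = set(out_neighbors.get(user, ()))
    k = len(nbrs & owners)
    f = k / len(nbrs) if nbrs else 0.0
    return k, f


def low_fraction_summary(out_neighbors, owners, adopter_set, threshold=0.2):
    """Among adopters with k > 0 (assumed filter), return
    (share of adopters with f < threshold, mean f)."""
    fractions = []
    for u in adopter_set:
        k, f = exposure(out_neighbors, owners, u)
        if k > 0:
            fractions.append(f)
    if not fractions:
        return 0.0, 0.0
    share = sum(1 for f in fractions if f < threshold) / len(fractions)
    return share, sum(fractions) / len(fractions)
```

A user with many out-neighbors can have a large k but a small f, which is exactly why the two hypotheses above cannot be separated from this statistic alone.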
5.4 Influence factors
As previously mentioned, users are notified when their out-neighbors adopt a book: information flows in an automated way along the friendship and neighborhood links. It is thus interesting to compare the potential influence effects in the book adoption process along the social links that do not support additional (non-automated) communication between the users with those along the social links that do. To this aim, we compute the probability of adoption at time $t$ of a book b given a fixed number k of neighbors who already have b at time $t_0$, formally:
$$P(k) = \frac{|\{u : u \notin G_b(t_0),\ u \in G_b(t),\ k_{u,b}(t_0) = k\}|}{|\{u : u \notin G_b(t_0),\ k_{u,b}(t_0) = k\}|}$$
with $k_{u,b}(t_0) = |\Gamma_{\mathrm{out}}(u) \cap G_b(t_0)|$, where $\Gamma_{\mathrm{out}}(u)$ is the set of out-neighbors of u and $G_b(t_0)$ is the set of users having book b at time $t_0$.
The adoption probability for the pure social network must be computed on out-neighbors because the information (i.e., automatic notifications) flows against the direction of the edges. In the interaction network, instead, both directions should be taken into account, because a message sent from u to v may imply a particular interest of u in v’s library or, conversely, that u is proactively suggesting a book to v. For this reason, in the interaction network we consider two separate cases, in which the number of neighbors already having the book is computed over the set of in-neighbors or over the set of out-neighbors of u.
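The estimator of P(k) can be sketched by pooling exposures over books. The neighbor map is a parameter, so the same function covers out-neighbors in the social graph and either direction in the interaction graph, with `invert` producing in-neighbors. Pooling over all books and the dict-of-sets layout are assumptions of this sketch; it also scans every (book, user) pair, which is only practical for small examples.

```python
from collections import Counter


def invert(graph):
    """Turn an out-neighbor map into an in-neighbor map."""
    inv = {}
    for u, targets in graph.items():
        for v in targets:
            inv.setdefault(v, set()).add(u)
    return inv


def adoption_probability(neighbors, owners_t0, owners_t, users):
    """Estimate P(k): probability that a user without a book at t0 owns it at t,
    given that k of the user's neighbors owned it at t0 (pooled over books).

    neighbors: dict user -> set of neighbors (out- or in-neighbors)
    owners_t0, owners_t: dict book -> set of owners at t0 and at t
    users: users present in both snapshots
    """
    exposed, adopted = Counter(), Counter()
    for book, before in owners_t0.items():
        after = owners_t.get(book, set())
        for u in users:
            if u in before:
                continue  # only users who do not own the book yet can adopt it
            k = len(neighbors.get(u, set()) & before)
            exposed[k] += 1
            if u in after:
                adopted[k] += 1
    return {k: adopted[k] / exposed[k] for k in sorted(exposed)}


if __name__ == "__main__":
    out = {"u1": {"v"}, "u2": {"v"}, "u3": set(), "v": set()}
    users = {"u1", "u2", "u3", "v"}
    t0, t = {"x": {"v"}}, {"x": {"v", "u1"}}
    print(adoption_probability(out, t0, t, users))          # out-neighbors
    print(adoption_probability(invert(out), t0, t, users))  # in-neighbors
```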
Several interesting features emerge: (i) the probability of adoption is very small when few or no out-neighbors have the book, and increases very rapidly as the number of out-neighbors having the book at time $t_0$ increases; (ii) this probability tends to saturate as that number grows above 20, showing that further increases in the number of out-neighbors reading the book do not increase the user’s adoption probability; (iii) at a fixed number of out-neighbors reading the book, the probability of adoption is much larger for out-neighbors with whom an explicit communication is established; (iv) when focusing on interaction ties, receiving messages from a given number of early adopters of a book b implies a higher probability of adopting b than sending messages to the same number of owners of b.
The first result is a strong indication in favor of the hypothesis of effective influence between neighbors on the social graph. The second indicates that the number of influential neighbors is limited, in support of the first hypothesis outlined above. The third result supports a scenario in which direct suggestions from neighbors with whom an explicit communication exists have a stronger influencing power than the automated notification system and, in particular, the fourth result suggests that adoption is at least partially triggered by direct recommendations received by earlier adopters.
6 Recommending social contacts
The analysis reported in the previous sections sheds light on the dynamics of link creation in social media. Understanding the processes behind the creation of social connections makes it possible to infer a model of network growth that can be exploited to predict the evolution of the system. In this section we use the acquired knowledge of network dynamics to predict the creation of new links. More specifically, we propose a methodology for personalized contact recommendation that could be directly implemented on any social media platform such as aNobii.
6.1 Prediction features
The task of predicting user pairs that will be connected in the future by a social tie can rely on two main sources of information: the structural features of the graph and the features from the user profiles. We use both types of features, considering the three main evolutionary patterns of the social graph that we previously detected.
1. Proximity-driven link creation. In the vast majority of cases, new neighbors are chosen among the nodes at distance 2 (i.e., closing triangles) or 3 in the social graph. Restricting the analysis to pairs that are close in the graph may miss some potential new connections but dramatically reduces the time needed by practical algorithms for partner recommendation.
2. Strong interaction links. Users are influenced and inspired more by the social contacts with whom they carry out a regular communication. Taking into account the strength of the interaction links rather than (or in addition to) pure social ties could improve the prediction.
3. Homophily-driven attachment. Users create new connections preferentially with their most similar acquaintances. Similarity is a notion that involves all the different facets of the user profile (from geographic location to favorite books). Pairs of more similar users should therefore be considered as more likely candidates for a link creation.
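The proximity restriction amounts to generating, for each user, the candidates two hops away. This minimal sketch assumes that distance is measured along the direction of the links (u to w to v) and that the graph is stored as a dict of out-neighbor sets; both are assumptions of the illustration.

```python
def distance2_candidates(out_neighbors, u):
    """Nodes v at directed distance exactly 2 from u: reachable via u -> w -> v,
    with v != u and no existing link u -> v."""
    direct = out_neighbors.get(u, set())
    candidates = set()
    for w in direct:
        candidates |= out_neighbors.get(w, set())
    return candidates - direct - {u}


if __name__ == "__main__":
    graph = {"u": {"a"}, "a": {"b", "u"}, "b": {"c"}}
    # c is three hops away from u, so only b qualifies.
    print(distance2_candidates(graph, "u"))
```

Only the neighbors of neighbors are visited, so the cost per user grows with the local neighborhood size rather than with the size of the whole network.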
List of features used in the prediction of a directed link between generic users u and v, along with their description
• Binary attribute: whether u and v are from the same city
• Binary attribute: whether u and v have the same gender
• Absolute difference of ages
• Cosine similarity between library vectors
• Cosine similarity between group membership vectors
• Size of the smallest group the two users have in common
• Cosine similarity between the sets of tags used
• Cosine similarity of the vectors of social contacts
• Sum of the out-degrees of u and v
• Product of the out-degrees of u and v
• Number of common neighbors (directed case)
• Binary attribute: whether the inverse link (v,u) is already present
• Linear combination of the numbers of common neighbors and common distance-2 neighbors
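Several of these features are simple set computations. The sketch below computes a subset of them for a directed pair (u, v), assuming a hypothetical profile schema (`city`, `library`, `tags`) and treating library, tag, and contact vectors as binary, so that cosine similarity reduces to an overlap of sets. Using common out-neighbors as the directed common-neighbor count is one possible variant chosen for illustration, not necessarily the one used in the study.

```python
import math


def cosine_sets(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def pair_features(u, v, out_neighbors, profiles):
    """A subset of the pair features for the directed pair (u, v).

    profiles: dict user -> {"city": str, "library": set, "tags": set}
    (hypothetical schema for illustration).
    """
    out_u = out_neighbors.get(u, set())
    out_v = out_neighbors.get(v, set())
    pu, pv = profiles[u], profiles[v]
    return {
        "same_city": int(pu["city"] == pv["city"]),
        "library_cosine": cosine_sets(pu["library"], pv["library"]),
        "tag_cosine": cosine_sets(pu["tags"], pv["tags"]),
        "contacts_cosine": cosine_sets(out_u, out_v),
        "outdeg_sum": len(out_u) + len(out_v),
        "outdeg_product": len(out_u) * len(out_v),
        # Assumed directed variant: neighbors that both u and v link to.
        "common_out_neighbors": len(out_u & out_v),
        "reciprocal": int(u in out_v),  # whether the inverse link (v, u) exists
    }
```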
6.2 Classifier training and feature selection
Features can be combined through a supervised machine learning approach. A classifier properly trained on the mentioned features can determine, given any pair of nodes, whether they are likely to create a social link in the future. Knowing in advance which user pairs have a higher connection probability, the system can send contact recommendations to the endpoints, either to point out a potentially interesting social connection they may not have noticed otherwise, or at least to speed up the linking process between them. We follow this approach and discuss its effectiveness in a link recommendation scenario.
We choose a Rotation Forest classifier, which turned out, a posteriori, to be the best performing among all WEKA’s classifiers, and we train it with all the available features. The positive sample of the training set consists of about 10k pairs of users who reside at distance 2 in the social graph at the time of snapshot 1 and get connected before snapshot 6. The negative sample consists of as many pairs residing 2 hops away at snapshot 1 that do not become connected. We consider only distance-2 pairs because in the link recommendation task we restrict our prediction to the closest non-connected pairs for computational efficiency. Note that considering only distance-2 pairs makes the prediction task harder than selecting non-connected pairs at random: the distributions of similarity values for pairs of users at distance 2 are more similar between the positive and negative samples than for pairs of users taken at random (which, with high probability, lie farther apart in the network).
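The construction of the balanced training set can be sketched from two snapshots of the directed graph, each a hypothetical dict of out-neighbor sets. Distance-2 pairs at the first snapshot are labeled positive if the link appears by the later snapshot, and both classes are then subsampled to the same size. The classifier itself (Rotation Forest in WEKA) is not reproduced here.

```python
import random


def distance2_pairs(out_neighbors):
    """All directed pairs (u, v) with v two hops from u and no direct link u -> v."""
    pairs = []
    for u, direct in out_neighbors.items():
        reach = set()
        for w in direct:
            reach |= out_neighbors.get(w, set())
        pairs.extend((u, v) for v in sorted(reach - direct - {u}))
    return pairs


def balanced_training_pairs(out_t1, out_t6, seed=None):
    """Split distance-2 pairs at snapshot 1 into positives (linked by snapshot 6)
    and negatives (not linked), then subsample both to the same size."""
    rng = random.Random(seed)
    positives, negatives = [], []
    for u, v in distance2_pairs(out_t1):
        if v in out_t6.get(u, set()):
            positives.append((u, v))
        else:
            negatives.append((u, v))
    size = min(len(positives), len(negatives))
    return rng.sample(positives, size), rng.sample(negatives, size)
```

Balancing the classes keeps the classifier from trivially predicting "no link", which would otherwise dominate given how few distance-2 pairs actually connect.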
Prediction performance on the training set using the Rotation Forest classifier, 10-fold cross validation, with balanced positive and negative samples (10,000 examples)
Using only the top 9 features, the prediction accuracy remains very stable, and the false positive rate is even slightly lower than with the full feature set (Table 6). We therefore retrain the classifier using the restricted feature set and use it as the fundamental building block of our social contact recommender, described in the next subsection.
6.3 Contact recommendation
A contact recommendation service should be able to provide suggestions in real time and on demand. Screening all the users who are not connected with the target user requires too much computation to meet this requirement. Therefore, we adopt a local search limited to the distance-2 neighborhood of the target user; among those potential contacts, the system outputs a fixed number N of suggestions.
To evaluate the effectiveness of this approach, we build a test set of active users who established at least 20 new social ties between snapshots 1 and 6 with people residing at distance 2 from them at snapshot 1. For each user u in this set, we apply our classifier to every pair (u, v) such that v is at distance 2 from u and, from the set of pairs labeled positively by the classifier, we select N contacts to compose the recommendation list, sorted by the confidence score the classifier assigns to each prediction. The number of actual ties created by the sampled users between snapshots 1 and 6 is around 3k, while the number of potential ties these users could have established towards distance-2 neighbors is higher than 650k. The goal of the classifier is to identify the 3k correct pairs among the 650k possible ones with the lowest number of misclassifications. This huge imbalance between positive and negative instances makes the problem very sparse (density is less than 0.005), which makes the recommendation task particularly hard to solve with high accuracy.
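The recommendation step and its evaluation can be sketched as follows. The sketch assumes the classifier exposes a confidence score per candidate and that "labeled positive" means a confidence of at least 0.5; both the threshold and the tie-breaking rule are assumptions. The number of correct recommendations at N is the overlap between the list and the links the user actually created, which would then be averaged over the test users.

```python
def recommend(confidence, n, threshold=0.5):
    """Top-n contacts among the candidates labeled positive.

    confidence: dict candidate -> classifier confidence that a link will be created.
    A candidate counts as labeled positive when its confidence is >= threshold
    (assumed value). Ties are broken by candidate id for determinism.
    """
    positive = [v for v, c in confidence.items() if c >= threshold]
    positive.sort(key=lambda v: (-confidence[v], v))
    return positive[:n]


def correct_at_n(recommended, new_links):
    """Number of recommended contacts the user actually linked to."""
    return len(set(recommended) & set(new_links))


# Sparsity of the test set from the text: about 3k true ties among >650k candidate pairs.
DENSITY = 3_000 / 650_000  # about 0.0046, i.e. below 0.005
```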
To investigate the causes of such modest performance, we compare our results with another attempt at tackling the link prediction problem from a recommendation perspective, carried out on the Facebook social network. That evaluation is very similar to ours with respect to the size of the network sample, the time span of the prediction, and the activity of the target users. Among all the experiments the authors report, recommendation through logistic regression combining several structural graph features is directly comparable to our approach. Nevertheless, the number of correct recommendations is higher than in the aNobii case (the number of correct recommendations at 20 is around 7.5, against 1.5 in our case). The main reason is the different sparsity of the problem. Specifically:
• In the same time span, the average number of new contacts per user in Facebook is more than six times larger than in aNobii (26 new links in Facebook vs. 4 in aNobii);
• The portion of new contacts residing at distance larger than 2 in aNobii is around 0.4, while in the Facebook dataset it is negligible;
• Contrary to Facebook, the aNobii network is directed and the predictions must take into account the directionality of the edge.
In Facebook, users are much more active and faster in establishing new contacts, and they focus much more on their distance-2 neighbors, which increases the number of potential true positives relative to the total number of potential new contacts. Nevertheless, we stress that even in aNobii’s more challenging setting, the relative improvement obtained by combining different profile and structural features through machine learning, over the performance of common neighbors, is comparable to the improvement obtained for Facebook in previous work.
In short, the difference between the two cases can be summarized as follows. In Facebook, the decision to create a link between two people depends largely on whether the two endpoints have a social connection in the offline world, so that deciding whether to add a new contact can be fast and simple. Conversely, in social networks with a stronger emphasis on topical interests, the shared items are more important than personal user features (especially for neighborhood links, which relate individuals who do not know each other a priori), and they are the main driver for the establishment of new social connections. The creation of links in such an interest network is therefore determined by the complex cognitive processes needed to relate multifaceted objects like books, which also implies a slower decision process. Although reaching definitive conclusions on this matter would require an extensive comparison between social media with different scopes (e.g., music, news, photos), we believe our study can represent a contribution in this direction.
7 Discussion and conclusions
Link creation and influence are the processes on which most of the dynamics of online social media are based. In this work, we have characterized such phenomena in the case of aNobii, a network of interest for book lovers.
We have found that link formation shows a strong propensity for topical and structural selection effects, reciprocity, and proximity-driven attachment. Based on these observations, we have collected a large number of both novel and state-of-the-art metrics that have potential for predicting the formation of new links. Among such features, ranging from topical ones (e.g., similarity between items owned by two users) to structural ones (e.g., estimation of the amount of information potentially flowing from one person to the other via social links), we have detected the most predictive ones, thus shedding some light on the relative effectiveness of the main features that have been used in past work on link prediction. We have combined the best features into a classifier able to output a prediction about the future creation of a connection between any pair of nodes in the social network. We have used this classifier to produce recommendations of new social contacts for users. Unlike link prediction, which aims at predicting the global evolution of the network, link recommendation provides a contact list for every single individual and succeeds when many of the recommended contacts are actually linked by the target user. This task is still widely unexplored and has been attempted only on general-purpose social networks with a strong accent on the user profile (e.g., Facebook) rather than on interest networks like aNobii. The classifier considerably improves accuracy over simple yet very strong baselines, but the obtained performance is lower than that reported for general-purpose online social media in previous work. The reasons for this gap likely reside in the different nature of the two cases.
While in profile-focused services social aggregation is often based on the existence of a relation in the real world, which can easily be detected with simple metrics (e.g., number of common friends), in interest-based networks the creation of new links is driven by the cognitive processes needed to judge the topical interest of one profile over another, which are more difficult to capture and anticipate. This finding opens the way to exploring the potential of prediction and recommendation in social platforms with different topical focuses.
Investigation of influence complements the study of link creation. Unlike previous work, we investigate influence from both the user and the item perspectives. From the user side, we provide strong evidence for the thesis that the similarity patterns detected in the static network are also determined by the influence that connected users exert on each other. In particular, we observe that link creation triggers a noticeable sudden increase in the similarity between the endpoints, particularly in terms of books adopted. We inspect patterns of book adoption by modeling graphs of book spreading in time and comparing them with null models to point out their clustered and expanding nature. Based on this model, we find that the fraction of neighboring users that may have influenced an adopter is on average rather small, that the probability of adopting a book saturates as the number of neighbors already having that book increases, and that the probability of adopting a book as a function of the number of earlier adopters in the social neighborhood is higher if explicit communication channels exist with these neighbors. By adopting a prediction perspective, we also shed some light on whether the fraction or the absolute number of earlier-adopter neighbors is the best indicator of a higher probability of adoption, and we find that the absolute number is by far more predictive of a future adoption (even if accurate spreading prediction remains a difficult task due to the extreme sparsity of the problem and to unobservable external factors determining adoption). All these results support the idea that ‘information contagion’ is a slow but relevant phenomenon in the social network and that it is usually triggered by a small number of influential users.
Another finding involves the analysis of the interaction network. For both link creation and information spreading, the interaction network plays an important role in determining new connections and preferential channels of item diffusion. Much previous work has shown that the communication graph conveys a much stronger social signal than the pure social graph, but the implications of such stronger connections for sociological phenomena like homophily and influence had not been investigated directly before.
This work opens several natural research directions. Among them, we mention the development of a spreading model that relies on user metadata in addition to the network topology and that could fit the book spreading phenomenon we observed. A more thorough exploration of the possibility of predicting item spreading in contexts with slow content consumption like aNobii is another interesting extension and may open the way to new item recommendation techniques.
Strictly speaking, it is impossible to prove that our crawls effectively reached the largest component. Given its characteristics and size, which agree with known properties of the aNobii social system, it is however a reasonable assumption.
We disregard here the possibility of interactions between users taking place outside the social network. It is clear that what can be inferred from the analysis of the online social network are only tendencies and indications, and that no absolute proof of influence effects can be obtained, as one cannot rule out effects external to the network.
This work has been partially supported by the Italian Ministry for University and Research (MIUR), within the framework of the project ‘Information Dynamics in Complex Data Structures’ (PRIN). We acknowledge support from the Lagrange Project of the ISI Foundation supported by the CRT Foundation.
- Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B: Measurement and analysis of online social networks. In IMC ’07: proceedings of the 7th ACM SIGCOMM conference on Internet measurement. ACM, New York; 2007:29–42.
- Kumar R, Novak J, Tomkins A: Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’06. ACM, New York; 2006:611–617. http://doi.acm.org/10.1145/1150402.1150476
- Leskovec J, Backstrom L, Kumar R, Tomkins A: Microscopic evolution of social networks. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08. ACM, New York; 2008:462–470. http://doi.acm.org/10.1145/1401890.1401948
- Ahn YY, Han S, Kwak H, Moon S, Jeong H: Analysis of topological characteristics of huge online social networking services. In WWW ’07: proceedings of the 16th international conference on World Wide Web. ACM, New York; 2007:835–844.
- Aiello LM, Barrat A, Cattuto C, Ruffo G, Schifanella R: Link creation and profile alignment in the aNobii social network. In SocialCom ’10: proceedings of the second IEEE international conference on social computing. IEEE Press, Minneapolis; 2010:249–256.
- Mislove A, Koppula HS, Gummadi KP, Druschel P, Bhattacharjee B: Growth of the Flickr social network. In WOSN ’08: proceedings of the first workshop on online social networks. ACM, New York; 2008:25–30.
- Lauterbach D, Truong H, Shah T, Adamic L: Surfing a web of trust: reputation and reciprocity on CouchSurfing.com. In Computational science and engineering, IEEE international conference on, vol 4; 2009:346–353.
- Weng J, Lim EP, Jiang J, He Q: TwitterRank: finding topic-sensitive influential twitterers. In Proceedings of the third ACM international conference on web search and data mining, WSDM ’10. ACM, New York; 2010:261–270. http://doi.acm.org/10.1145/1718487.1718520
- Aiello LM, Barrat A, Schifanella R, Cattuto C, Markines B, Menczer F: Friendship prediction and homophily in social media. ACM Trans Web 2012, 6: Article ID 9.
- Chun H, Kwak H, Eom YH, Ahn YY, Moon S, Jeong H: Comparison of online social relations in volume vs. interaction: a case study of cyworld. In Proceedings of the 8th ACM SIGCOMM conference on Internet measurement, IMC ’08. ACM, New York; 2008:57–70. http://doi.acm.org/10.1145/1452520.1452528
- Wilson C, Boe B, Sala A, Puttaswamy KP, Zhao BY: User interactions in social networks and their implications. In Proceedings of the 4th ACM European conference on computer systems, EuroSys ’09. ACM, New York; 2009:205–218. http://doi.acm.org/10.1145/1519065.1519089
- Leskovec J, Horvitz E: Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th international conference on World Wide Web, WWW ’08. ACM, New York; 2008:915–924. http://doi.acm.org/10.1145/1367497.1367620
- Viswanath B, Mislove A, Cha M, Gummadi KP: On the evolution of user interaction in Facebook. In WOSN ’09: proceedings of the 2nd ACM workshop on online social networks. ACM, New York; 2009:37–42.
- Benevenuto F, Rodrigues T, Cha M, Almeida V: Characterizing user behavior in online social networks. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC ’09. ACM, New York; 2009:49–62. http://doi.acm.org/10.1145/1644893.1644900
- Gonçalves B, Perra N, Vespignani A: Modeling users’ activity on Twitter networks: validation of Dunbar’s number. PLoS ONE 2011, 6(8): Article ID e22656. http://dx.doi.org/10.1371%2Fjournal.pone.0022656
- Kempe D, Kleinberg J, Tardos E: Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’03. ACM, New York; 2003:137–146. http://doi.acm.org/10.1145/956750.956769
- Cha M, Mislove A, Gummadi KP: A measurement-driven analysis of information propagation in the Flickr social network. In Proceedings of the 18th international conference on World Wide Web, WWW ’09. ACM, New York; 2009:721–730. http://doi.acm.org/10.1145/1526709.1526806
- Ye S, Wu SF: Measuring message propagation and social influence on Twitter.com. In Proceedings of the second international conference on social informatics, SocInfo ’10. Springer, Berlin; 2010:216–231. http://portal.acm.org/citation.cfm?id=1929326.1929342
- Cha M, Haddadi H, Benevenuto F, Gummadi KP: Measuring user influence in Twitter: the million follower fallacy. In ICWSM ’10: proceedings of the 4th international AAAI conference on weblogs and social media; 2010.
- Anagnostopoulos A, Kumar R, Mahdian M: Influence and correlation in social networks. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08. ACM, New York; 2008:7–15. http://doi.acm.org/10.1145/1401890.1401897
- Crandall D, Cosley D, Huttenlocher D, Kleinberg J, Suri S: Feedback effects between similarity and social influence in online communities. In KDD ’08: proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York; 2008:160–168.
- Bakshy E, Rosenn I, Marlow C, Adamic L: The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web, WWW ’12. ACM, New York; 2012:519–528. http://doi.acm.org/10.1145/2187836.2187907
- Yang J, Leskovec J: Modeling information diffusion in implicit networks. In Proceedings of the 2010 IEEE international conference on data mining, ICDM ’10. IEEE Computer Society, Washington; 2010:599–608. http://dx.doi.org/10.1109/ICDM.2010.22
- Gomez Rodriguez M, Leskovec J, Krause A: Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’10. ACM, New York; 2010:1019–1028. http://doi.acm.org/10.1145/1835804.1835933
- Au Yeung CM, Iwata T: Capturing implicit user influence in online social sharing. In Proceedings of the 21st ACM conference on hypertext and hypermedia, HT ’10. ACM, New York; 2010:245–254. http://doi.acm.org/10.1145/1810617.1810662
- Getoor L, Diehl CP: Link mining: a survey. ACM SIGKDD Explor Newsl 2005, 7(2):3–12. 10.1145/1117454.1117456
- Cooke RJE: Link prediction and link detection in sequences of large social networks using temporal and local metrics. Master thesis, Department of Computer Science, University of Cape Town; 2006.
- Backstrom L, Leskovec J: Supervised random walks: predicting and recommending links in social networks. In Proceedings of the fourth ACM international conference on web search and data mining, WSDM ’11. ACM, New York; 2011:635–644. http://doi.acm.org/10.1145/1935826.1935914
- Liben-Nowell D, Kleinberg J: The link prediction problem for social networks. In CIKM ’03: proceedings of the twelfth international conference on information and knowledge management. ACM, New York; 2003:556–559.
- Liben-Nowell D, Kleinberg J: The link-prediction problem for social networks. J Am Soc Inf Sci Technol 2007, 58(7):1019–1031. 10.1002/asi.20591
- Pavlov M, Ichise R: Finding experts by link prediction in co-authorship networks. FEWS2007: proceedings of the workshop on finding experts on the web with semantics at ISWC/ASWC2007 2007.
- Popescul A, Popescul R, Ungar LH: Structural logistic regression for link analysis. Proceedings of the second international workshop on multirelational data mining 2003.
- Hasan MA, Chaoji V, Salem S, Zaki M: Link prediction using supervised learning. Proceedings of SDM ’06 workshop on link analysis, counterterrorism and security 2006.
- O’Madadhain J, Hutchins J, Smyth P: Prediction and ranking algorithms for event-based network data. ACM SIGKDD Explor Newsl 2005, 7(2):23–30. 10.1145/1117454.1117458
- Zheleva E, Getoor L, Golbeck J, Kuter U: Using friendship ties and family circles for link prediction. In 2nd SNA-KDD workshop on social network mining and analysis. ACM, Las Vegas; 2008.
- Sachan M, Ichise R: Using semantic information to improve link prediction results in networked datasets. Int J Eng Technol 2010, 2(4):334–339.
- Zhou T, Lü L, Zhang YC: Predicting missing links via local information. Eur Phys J B 2009, 71(4):623–630. Special issue: The physics approach to risk: agent-based models and networks.
- Lü L, Jin CH, Zhou T: Effective and efficient similarity index for link prediction of complex networks. arXiv:0905.3558; 2009.
- Song HH, Cho TW, Dave V, Zhang Y, Qiu L: Scalable proximity estimation and link prediction in online social networks. In IMC ’09: proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference. ACM, New York; 2009:322–335.
- Schifanella R, Barrat A, Cattuto C, Markines B, Menczer F: Folks in folksonomies: social link prediction from shared metadata. In WSDM ’10: proceedings of the third ACM international conference on web search and data mining. ACM, New York; 2010:271–280.
- Leroy V, Cambazoglu BB, Bonchi F: Cold start link prediction. In SIGKDD ’10: proceedings of the 16th ACM conference on knowledge discovery and data mining. ACM, Washington; 2010.
- Bilgic M, Namata GM, Getoor L: Combining collective classification and link prediction. In ICDMW ’07: proceedings of the seventh IEEE international conference on data mining workshops. IEEE Computer Society, Washington; 2007:381–386.
- Tylenda T, Angelova R, Bedathur S: Towards time-aware link prediction in evolving social networks. In SNA-KDD ’09: proceedings of the 3rd workshop on social network mining and analysis. ACM, New York; 2009:1–10.
- Dunlavy DM, Kolda TG, Acar E: Temporal link prediction using matrix and tensor factorizations. arXiv preprint 2010, arXiv:1005.4006
- Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J, Barabási AL: Structure and tie strengths in mobile communication networks. Proc Natl Acad Sci USA 2007,104(18):7332–7336. 10.1073/pnas.0610245104
- Lü L, Zhou T: Role of weak ties in link prediction of complex networks. In CNIKM ’09: proceedings of the 1st ACM international workshop on complex networks meet information and knowledge management. ACM, New York; 2009:55–58.
- Gilbert E, Karahalios K: Predicting tie strength with social media. In Proceedings of the 27th international conference on human factors in computing systems, CHI ’09. ACM, New York; 2009:211–220. 10.1145/1518701.1518736
- Benchettara N, Kanawati R, Rouveirol C: Supervised machine learning applied to link prediction in bipartite social networks. In International conference on advances in social networks analysis and mining (ASONAM). IEEE Computer Society, Los Alamitos; 2010:326–330.
- Kunegis J, De Luca E, Albayrak S: The link prediction problem in bipartite networks. In Computational intelligence for knowledge-based systems design. Lecture notes in computer science 6178. Edited by: Hüllermeier E, Kruse R, Hoffmann F. Springer, Berlin; 2010:380–389.
- Leskovec J, Huttenlocher D, Kleinberg J: Predicting positive and negative links in online social networks. In WWW ’10: proceedings of the 19th international conference on World Wide Web. ACM, New York; 2010:641–650.
- Taskar B, Wong MF, Abbeel P, Koller D: Link prediction in relational data. NIPS ’03: neural information processing systems conference 2003.
- Getoor L, Friedman N, Koller D, Taskar B: Learning probabilistic models of link structure. J Mach Learn Res 2003, 3: 679–707.
- Freeman LC: Centrality in social networks: conceptual clarification. Soc Netw 1979,1(3):215–239. 10.1016/0378-8733(78)90021-7
- Wasserman S, Faust K: Social network analysis: methods and applications. Cambridge University Press, Cambridge; 1994.
- Bollobás B: Modern graph theory. Springer, Berlin; 1998.
- Marlow C, Naaman M, Boyd D, Davis M: HT06, tagging paper, taxonomy, Flickr, academic article, to read. In HYPERTEXT ’06: proceedings of the seventeenth conference on hypertext and hypermedia. ACM, New York; 2006:31–40.
- Newman MEJ: Assortative mixing in networks. Phys Rev Lett 2002, 89:208701.
- Albert R, Barabási AL: Statistical mechanics of complex networks. Rev Mod Phys 2002, 74: 47–97. 10.1103/RevModPhys.74.47
- Newman MEJ: Clustering and preferential attachment in growing networks. Phys Rev E 2001, 64(2):025102.
- Kleinberg JM, Kumar R, Raghavan P, Rajagopalan S, Tomkins AS: The web as a graph: measurements, models and methods. In Computing and combinatorics. Lecture notes in computer science 1627. 1999:1–18.
- Kumar R, Raghavan P, Rajagopalan S, Sivakumar D, Tomkins A, Upfal E: Stochastic models for the web graph. Proceedings of the 41st IEEE symposium on foundations of computer science (FOCS) 2000, 57–65.
- Liben-Nowell D, Novak J, Kumar R, Raghavan P, Tomkins A: Geographic routing in social networks. Proc Natl Acad Sci USA 2005,102(33):11623–11628. 10.1073/pnas.0503018102
- Lee C, Scherngell T, Barber MJ: Real-world separation effects in an online social network. Technical report 2009, arXiv:0911.1229
- McPherson M, Smith-Lovin L, Cook JM: Birds of a feather: homophily in social networks. Annu Rev Sociol 2001, 27: 415–444. 10.1146/annurev.soc.27.1.415
- Barrat A, Barthélemy M, Vespignani A: Dynamical processes on complex networks. 1st edition. Cambridge University Press, New York; 2008.
- Opsahl T, Agneessens F, Skvoretz J: Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 2010,32(3):245–251. 10.1016/j.socnet.2010.03.006
- Rodriguez JJ, Kuncheva LI, Alonso CJ: Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 2006,28(10):1619–1630.
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 2009, 11: 10–18. 10.1145/1656274.1656278
- Liu H, Setiono R: Chi2: feature selection and discretization of numeric attributes. In Proceedings of the seventh international conference on tools with artificial intelligence, TAI ’95. IEEE Computer Society, Washington; 1995:388–391.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.