Characterization of online groups along space, time, and social dimensions
© Martin-Borregon et al.; licensee Springer 2014
Received: 5 March 2014
Accepted: 18 July 2014
Published: 24 September 2014
Social groups play a crucial role in online social media because they form the basis for user participation and engagement. Although groups have been widely studied in their static and evolutionary aspects, little attention has been devoted to exploring the nature of groups. In fact, groups can originate from different aggregation processes that may be determined by several orthogonal factors. A key question in this scenario is whether it is possible to identify the different types of groups that emerge spontaneously in online social media and how they differ. We propose a general framework for the characterization of groups along the geographical, temporal, and socio-topical dimensions and we apply it to a very large dataset from Flickr. In particular, we define a new metric to account for geographic dispersion, we use a clustering approach on activity traces to extract classes of different temporal footprints, and we transpose the “common identity and common bond” theory into metrics that identify the skew of a group towards sociality or topicality. We directly validate the predictions of the sociological theory, showing that the metrics predict the group type with high accuracy when compared to a human-generated ground truth. Last, we frame our contribution in a wider context by relating different types of groups to communities detected algorithmically on the social graph and by showing the effect that the group type might have on processes of information diffusion. Results support the intuition that a more nuanced description of groups could improve not only the understanding of the activity of the user base but also the interpretation of other phenomena occurring on social graphs.
The explosive success of social media is partly explained by their capability of transposing everyday life dynamics onto online platforms in a very intuitive way. Accordingly, even though dyadic social links are the primary way for people to connect online, social media have supported the creation of social groups since their very early stages. This need emerges directly from the collective behaviour of the crowd, which tends to flock into communities for a number of reasons, including affiliation by similarity, local proximity, common interest, conflict with other groups, or even just the need to define an identity by separating from the rest of the population. As a result, groups in social media have flourished and they nowadays form a strong basis for user participation and engagement in online services.
For these reasons, online groups have been studied extensively in the past, with respect to both their social structure and the evolution of their activity. Despite this attention, previous work on large online datasets has mainly considered groups as homogeneous entities, overlooking the fact that groups, similarly to social ties, are not all created equal, as they emerge from different collective processes and from the different motivations of their founders or members.
Although several other disciplines, including physics, psychology, organizational sciences, and social sciences, have explored specific aspects of the formation, evolution, and internal dynamics of groups at different levels (see Section 2), most studies have focused on (i) small or offline social ecosystems, (ii) groups generated ad hoc to conduct specific experiments, or (iii) groups inferred from the network structure. Also, specific aspects of group dynamics (e.g., consensus reaching, language norms, geographic placement) have very often been investigated in isolation, with very few efforts towards a more holistic, multidimensional characterization of social aggregations. As a consequence, we believe that a thorough, large-scale exploration of the nature of online, user-generated groups across some of the fundamental dimensions that characterize them is in order.
We propose a categorization of online groups along three axes: spatial, temporal, and socio-topical. For each dimension we propose a set of general metrics that quantitatively capture the different facets of groups. Specifically, we describe groups with respect to the geographic scattering of their members, the temporal footprint of the members’ activity in terms of central tendency, dispersion, skewness, and burstiness, and the tendency of the group to aggregate members on a topical or social basis. For the last dimension, we rely on a longstanding theory about the creation of social communities, which states that people join groups driven either by pre-existing social ties with other members or by interest in the topical focus of the group as a whole; we build metrics to quantify this tendency. We show that our metrics reflect the cardinal points of the theory well, being good predictors of the group type. Our metrics are tested on a large-scale corpus of public, online, user-generated groups.
Last, to frame our contribution in a wider context, we provide examples of possible applications of our framework to other analytical problems on social networks. In particular, we relate the social and topical groups identified by our metrics to the communities detected from the graph structure, and we speculate about the impact that different group types may have on the process of information diffusion, following the intuition that information cascades and group boundaries are closely related concepts.
1.1 The Flickr case-study
We test our group characterization framework on a large-scale set of online groups from Flickr (www.flickr.com). Flickr is a popular photo-sharing platform on which users can upload a large amount of photos (up to 1 TB) and organize them in albums or with free-form textual tags. Flickr supports rich social interactions between users. First, photos are shown on the user’s profile page, where other users can view them, comment on them, or mark them as favorites. Also, users can establish explicit social ties by following people they are interested in, to receive their status updates. Last, groups are a pivotal part of community engagement in Flickr.
Flickr groups represent an ideal ecosystem for the study of group characterization for a number of reasons. Groups in Flickr are large scale (hundreds of thousands of public groups, with a broad range in membership size), spontaneously generated (in contrast with groups inferred from the structure of the social network or created ad hoc for specific experiments), and exhibit public online information that is rich both in terms of content (photos, tags) and social information (multiple types of interactions between members). This combination of features is ideal to investigate the factors that drive the collective interaction between people in social aggregations. It is very difficult to find other large-scale, publicly accessible datasets with a similarly wide and diverse set of features. For these reasons, we focus our study on Flickr only, diving deep into several aspects of the groups’ structure and organization rather than proposing a wider multi-dataset exploration.
1.2 Contributions and roadmap
This work is a direct extension of our previous paper, which focused on the interplay between social and topical aspects of online communities. Here we extend and improve that work in a number of ways, and present the following contributions.
We introduce a framework for the characterization of groups along geographical and temporal dimensions.
We run a study of a large-scale corpus of Flickr groups using the three target dimensions, drawing a more nuanced characterization of them than previous work.
We use our framework to run a faceted analysis of the phenomenon of information diffusion on networks, spotting insightful correlations between type of spreading and type of group.
Overall, our work contributes primarily to the field of computational social science, specifically towards a nuanced characterization of groups according to notions of topicality and sociality developed in sociology over the past decades but never tested on large online datasets. Our experimental evaluation shows that the formulation of the theory captures well the separation between the two macro-classes of groups. The transposition of the theory into quantitative metrics also allowed us to provide additional evidence supporting another well-established theory about the maximum number of stable relationships for individuals in social environments (Dunbar’s number). Furthermore, we consider spatial, temporal, and socio-topical metrics jointly for the first time, discovering some macro-classes of groups that reveal the interplay between the different dimensions; to mention two clear examples, topical groups tend to be long-lived with steady activity, whereas social groups are more often bursty and short-lived.
The main goal of this work is to take a further step towards a computational understanding of social structures, in this specific case user-generated groups. To show that the value of a nuanced characterization of social aggregations is not limited to more fine-grained network analysis, we also connect our study to the field of information diffusion, showing that different types of groups can affect the spreading of information along the network. This is the first study that shows such empirical evidence, and it connects directly with very recent work in information diffusion that has tried to leverage the same intuition.
The remainder of the paper is structured as follows. First, we present an overview of related work (Section 2). Then we introduce the three dimensions that we use to characterize social groups (geographical, temporal, and socio-topical) and we define how to measure them quantitatively (Section 3). After a short illustration of the Flickr dataset we use and of the ground truth we extracted to validate our socio-topical metrics (Section 4), we present the results of the application of the metrics, identifying different classes of Flickr groups with respect to the three dimensions, considered both separately and jointly (Section 5). Finally, we set our contribution in a wider context by analyzing the process of information diffusion in the light of the different group types in which the process takes place (Section 6).
2 Related work
2.1 Online groups characterization
Since the very early stages of the social web, the research community has been interested in defining the notion of group and its possible types, not only for analytical purposes but also for direct application to several tasks, including profiling and recommendation. The global structure, evolution, and dynamics of social groups have been investigated over large-scale and heterogeneous datasets. In the computer science literature, the shape and evolution of groups have been described as very broad phenomena that are determined by the intrinsic group fitness and by the density of social links connecting their members.
Although the broad variety of group types and their emerging features (starting from their size) has motivated some research to characterize the nature of groups along their main dimensions, most contributions so far have not established any quantitative framework for their classification.
Due to its open nature and its multiple features, Flickr has been one of the most studied platforms in this respect. Early work relied on interviews and user studies to identify the different uses of Flickr groups, finding five main motivations for users to join groups (memory, identity and narrative, relationship maintenance, self-representation, and self-expression). Alternative classifications based on user studies have been proposed as well.
Negoescu et al. have contributed substantially to this research area with several studies on Flickr groups. First, they introduced a manual categorization of Flickr groups, partitioning them into geographical, topical, visual, and “catch-all” groups. With this categorization in mind, they proposed to detect hypergroups (i.e., groups of groups) based on the similarity of their topical focus, extracted with LDA; in contrast, Negi et al. try to find subgroups in large Flickr communities using MoM-LDA on photo tags. Groups have also been studied in relation to their membership, with special attention to topicality and to recommendations exchanged between peers. In more recent work, Negoescu et al. discussed how to represent Flickr groups according to the topics and tags used by their members. Also, in line with previous studies, they identified “real” groups as those motivated by self-expression and relationship maintenance. However, although every Flickr group can be mapped to a topic (a set of words), not all groups have a topical focus, as we show in this work.
Following an earlier conceptual framework, Cox et al. attempted to measure the “groupness” of a group using several metrics such as membership size, volume of photos, and length of description. They propose a classification of groups into topical (focused on a theme), highlighting (to promote photos to a wider public), and geographical (rooted in a specific geolocation); however, their classification is ultimately arbitrary and not supported by quantitative results. In partial contrast with previous work, their results also point out that small groups matter more than big ones to the social activity of the network, as they operate at “human scale”. The work was subsequently extended and the categorization was manually refined into four categories, namely location-based, award, learning, and topical groups.
Prieur et al. use Principal Component Analysis (a statistical procedure that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables) on a set of features extracted from Flickr groups to detect the main dimensions that characterize them. They find three main dimensions underlying as many types of groups: social media-use, MySpace-like, and photo stockpiling. The mixture of sociality and topicality of groups is also discussed, though only tangentially.
Groups have also been studied on other online platforms. The structure of user interaction patterns in groups extracted from LiveJournal, DBLP, YouTube, Orkut, and Yahoo Groups has been investigated in the past. Laine et al. present an analysis of YouTube groups, highlighting their tendency towards both topicality and sociality and the small-world nature of the interactions inside them. Interestingly, they envision as future work an analysis of the interplay between groups and influence.
Last, some important contributions in the field of complex systems have investigated properties of groups that can be inferred from the structure of social links via clustering or community detection algorithms. Barabasi et al. use a network of phone calls to find that large detected groups (using clique percolation) persist longer when they are capable of dynamically changing their membership, suggesting that the ability to alter their internal composition results in better adaptability. Using a similar dataset, Onnela et al. explore the geography of groups and find that small communities are geographically tight, but become geographically wider when the group size exceeds 30 members. Temporal patterns similar to the ones we explore in this work have also been investigated: the burstiness of human behaviour, as a consequence of how people process tasks according to their perceived priority, has been studied, but not specifically in the context of groups.
2.2 Groups in (computational) social sciences
Small groups have been studied by researchers in psychology and sociology and by scholars in other social and behavioral sciences over the past century, and especially in recent decades. The notions of community and social group have been widely debated in the behavioral sciences. The multifaceted complexity of groups has been discussed for a long time, and previous work has remarked that the internal dynamics of social groups emerge from the combination of complex cognitive processes such as sense of membership, influence between people, fulfillment of individual and collective needs inside the group, and shared emotional connections. Building on these widely accepted theoretical foundations, sociological theories have been formulated to disentangle all of these complex aspects.
From the perspective of organizational sciences, groups have been investigated with special focus on computer-mediated communication, namely how belonging to a group can affect communication between members over time. Siegel et al., for instance, ran a small-scale comparison between online and offline groups, showing that consensus is reached with a significantly bigger shift from the members’ initial opinions in the online case than in the offline one. Other studies highlighted that the physical co-presence of members is an important factor for the success of a task-oriented group, as the geographical distribution of members could negatively impact the effectiveness of communication. Similar patterns are found when considering the time dimension, as “ongoing” teams that have to collaborate for a long time must tackle more process and structural issues than groups with “temporary” tasks. In environments where many groups co-exist, membership overlap with other groups, together with the age and size of a group, can affect its growth as a result of competitive pressure. Also, the individual benefit that members derive from collaborative tasks can depend heavily on the group leader’s efforts to foster community building. The use of language within social communities has been studied by Postmes et al., who look at how interactional norms regulate the conversation style (e.g., use of abbreviations, superlatives) in small email discussion groups, shedding light on the processes of social construction and formation of social identity.
More recent work in computational social science has attempted to characterize groups in relation to well-established theories from the social sciences. The dependency of activity and connectivity on group size has been studied on several platforms, showing relations to Dunbar’s theory on the upper bound of around 150 stable social relationships for an average human. Similarity between members has also been identified as a factor driving the creation of social communities, particularly given that, to a large extent, users in social networks tend to aggregate following the homophily principle. However, similarity is not necessarily the strongest indicator of group activity and longevity, as the diversity of content shared between group members is a major factor in keeping members’ interest alive.
Social and thematic components of communities have been widely studied in social science, most notably within the common identity and common bond theory on which part of the present work is based. Nevertheless, the principles behind the theory have never been translated into practical methods to categorize groups, nor tested on large datasets. On the other hand, data-driven studies have investigated social and thematic components separately when characterizing groups. Preliminary insights on the interplay between such dimensions have been given in exploratory work on Flickr, where signals of correlation between social density and tag dispersion in groups are shown and where two different clusters emerge naturally when plotting group size against the number of internal links. In this work, we define metrics that can be used to predict whether a group is social or topical, and we test their effectiveness against a reliable ground truth.
2.3 Automatic group extraction
Besides the analysis of user-created groups, the study of groups detected automatically through community detection algorithms has attracted much interest lately. Detected communities are supposed to represent meaningful aggregations of people where dense or intense social exchanges take place among members. Nevertheless, even if synthetic methods to verify the quality of clusters have been proposed, the question of whether such artificial groups capture some notion of community perceived by the users remains open. While computing cluster-goodness metrics over user-created groups can give useful hints about their structural cohesion, a direct comparison between user-created groups and detected communities is still missing, particularly in terms of the amount of sociality or topical coherence they embed.
2.4 Information propagation
Modeling the dynamics of information diffusion and influence along network links has received much attention in the last decade, especially in relation to the optimization of viral marketing strategies. A large corpus of studies on influence and information propagation has relied on Twitter-based experiments. In Flickr, instead, an analysis of information propagation based on favorites showed that diffusion is limited to individuals who reside in the close neighborhood of the seed user and that the spreading process is very slow. In this paper we use the same propagation model to measure the effect that the boundaries of different groups may have on the diffusion of information. Instead of representing influence as an infection phenomenon between connected individuals, alternative models that are agnostic of the network structure and rely only on the time of contagion have been proposed, assuming the presence of a hidden contagion web that might differ from the observed social network.
Even though our contribution does not focus on the definition of information propagation models, we measure information diffusion through social links within different group types, motivated by recent findings that hypothesize a connection between group type and potential of propagation of information cascades , .
3 Metrics for group characterization
Geography, time, and the duality between the social and topical bias of groups are mentioned multiple times in previous work, as they are important aspects for the characterization of communities. Next, we consider those three dimensions and define new general metrics for each of them. All our metrics assume a user base U and a set of groups G, where $g \subseteq U$ for every $g \in G$. Users can belong to multiple groups, and we associate with each group a bag of user-generated terms (e.g., tags, group posts). We also assume a set of actions that members of a group g perform within the group (e.g., subscribing to the group or uploading a photo to the group pool); in the following, we also refer to these actions as events. We consider the location and the time associated with each action; when focusing on a specific type of action, we consider its temporal sequence, whose timestamps are denoted simply as $t_1, \dots, t_n$. Last, we take into account the social interactions of group members, both within and across group boundaries. We adopt a very general multidigraph model that fits most current social media platforms: members are represented as nodes, and each distinct interaction between any two members is represented by a directed arc.
3.1 Geographical dispersion
This solution is linear in the number of points and accounts for the Earth’s curvature by modeling the world as a sphere, thus addressing the limitations of previous approaches.
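A minimal sketch of a dispersion measure with these properties, assuming (our assumption, not necessarily the exact definition used here) that dispersion is the mean great-circle distance of member locations from their centroid on a spherical Earth. The centroid is computed by averaging the 3D unit vectors of the locations, which avoids the wrap-around problems of averaging raw latitudes and longitudes. The function names and the 6371 km radius are illustrative choices.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; illustrative constant


def to_unit_vector(lat_deg: float, lon_deg: float) -> Tuple[float, float, float]:
    """Map a (latitude, longitude) pair in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def geo_dispersion_km(locations: Iterable[Tuple[float, float]]) -> float:
    """Hypothetical metric: mean great-circle distance (km) from the spherical centroid.

    Runs in time linear in the number of locations.
    """
    vectors = [to_unit_vector(lat, lon) for lat, lon in locations]
    if not vectors:
        raise ValueError("at least one location is required")
    # Average the 3D vectors and project the mean back onto the sphere.
    sx = sum(v[0] for v in vectors)
    sy = sum(v[1] for v in vectors)
    sz = sum(v[2] for v in vectors)
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    if norm == 0.0:
        raise ValueError("centroid is undefined (the points cancel out)")
    cx, cy, cz = sx / norm, sy / norm, sz / norm
    # Central angle from the dot product, clamped against rounding error.
    total = 0.0
    for x, y, z in vectors:
        dot = max(-1.0, min(1.0, x * cx + y * cy + z * cz))
        total += EARTH_RADIUS_KM * math.acos(dot)
    return total / len(vectors)
```

For example, two members on the equator at longitudes 0° and 90° each lie a quarter of the way around a half-circle from their centroid (longitude 45°), so the sketch returns 6371 × π/4 ≈ 5004 km.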
3.2 Temporal patterns
Similarly to geography, groups can also exhibit very diverse temporal patterns. The time series of events associated with a group (e.g., photo uploads) is the temporal footprint we aim to characterize. Since each temporal distribution is likely to be unique, we need to capture the distinctive features of each temporal pattern. We rely on the statistical properties of the distribution of the volume of actions over time to describe the time sequences, and we identify four such properties: the central tendency, the dispersion, the skewness, and the burstiness. In the following, we assume that all events take place in a fixed, large time window (which corresponds to the temporal sample of our experimental data). Next, we define the meaning of each property and propose metrics to capture it. How to combine the metrics to characterize groups along the temporal axis is discussed in Section 5.2.
3.2.1 Central tendency
The central tendency metric takes values in the range [0, 1] and reflects the central tendency of the distribution of events in time: the closer the value is to 0, the more events happened at the beginning of the group’s life; the closer it is to 1, the more events are concentrated near the present. Groups with strong central tendency will have values close to 0.5.
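To make the range concrete, here is a sketch that assumes (hypothetically) that timestamps are first rescaled so that 0 marks the group’s creation and 1 the end of the observation window, and that the central tendency is the mean of the rescaled times. The helper names are our own.

```python
from statistics import fmean
from typing import List, Sequence


def normalize_times(times: Sequence[float], start: float, end: float) -> List[float]:
    """Rescale timestamps to [0, 1]: 0 = group creation, 1 = end of the window."""
    if end <= start:
        raise ValueError("end must be later than start")
    return [(t - start) / (end - start) for t in times]


def central_tendency(times: Sequence[float], start: float, end: float) -> float:
    """Hypothetical metric: mean of the rescaled event times."""
    if not times:
        raise ValueError("at least one event is required")
    return fmean(normalize_times(times, start, end))
```

Events clustered right after the group’s creation give values near 0, late activity gives values near 1, and events spread evenly around the middle of the window give values near 0.5.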
3.2.2 Dispersion
The dispersion values range from 0 to 1. Note that groups with high central tendency would have low dispersion, but groups with low dispersion could also have low central tendency. However, a non-corrected standard deviation would still depend on the central tendency: for instance, a series of time events with a central tendency value of 0.1 cannot have a dispersion higher than 0.5. A correction factor is therefore required to make the two metrics independent. For the sake of brevity, we do not report the mathematical details here; a justification of the correction is given in the Appendix.
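A hedged sketch of one possible correction, which may differ from the one justified in the Appendix: it assumes dispersion is the standard deviation of the rescaled times divided by sqrt(mu * (1 - mu)), where mu is their mean. For values in [0, 1] with mean mu, the standard deviation can never exceed that bound (the Bhatia-Davis inequality), so the ratio stays in [0, 1] whatever the central tendency.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def dispersion(norm_times: Sequence[float]) -> float:
    """Hypothetical corrected dispersion of event times already rescaled to [0, 1]."""
    if not norm_times:
        raise ValueError("at least one event is required")
    mu = fmean(norm_times)
    # Largest standard deviation attainable by values in [0, 1] with mean mu.
    bound = math.sqrt(mu * (1.0 - mu))
    if bound == 0.0:
        return 0.0  # every event sits at the same edge of the window
    # Population standard deviation around mu; clamp tiny rounding overshoots.
    return min(1.0, pstdev(norm_times, mu) / bound)
```

With this normalization, a group whose mean lies at 0.1 can still reach a dispersion of 1 (e.g., nine events at 0 and one at 1), so the value no longer depends on the central tendency.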
3.2.3 Skewness
Again, the values lie in a bounded interval. A divergence between the mean and the median implies a skewed distribution, as a long tail on one side pulls the mean away from the median. The correction factor in the denominator ensures the independence between the skewness and the central tendency, as shown in the Appendix.
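A minimal sketch of a mean-median skewness, assuming (hypothetically) the same sqrt(mu * (1 - mu)) correction as in the dispersion sketch. Because |mean - median| never exceeds the standard deviation, and the standard deviation of data in [0, 1] never exceeds sqrt(mu * (1 - mu)), this particular formulation is guaranteed to lie in [-1, 1].

```python
import math
from statistics import fmean, median
from typing import Sequence


def skewness(norm_times: Sequence[float]) -> float:
    """Hypothetical corrected mean-median skewness of times rescaled to [0, 1].

    Positive when the mean exceeds the median (a tail of late events);
    negative when the mean falls below it (a tail of early events).
    """
    if not norm_times:
        raise ValueError("at least one event is required")
    mu = fmean(norm_times)
    bound = math.sqrt(mu * (1.0 - mu))
    if bound == 0.0:
        return 0.0  # all events at one edge: no spread, hence no skew
    return (mu - median(norm_times)) / bound
```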
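3.2.4 Burstiness
Note that the mean of all the inter-event times equals the total time between the first and the last event divided by the number of inter-event intervals (the number of events minus one). The median of the inter-event times, instead, depends on how the events are spaced within that span. For a series of uniformly separated events, the mean and the median of the inter-event times are equal, whereas groups with bursty behavior have a median inter-event time close to 0 compared with the mean.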
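A sketch of a burstiness indicator built from these two quantities, assuming (hypothetically) that it is the ratio between the median and the mean inter-event time. The ratio is exactly 1 for evenly spaced events and approaches 0 when most gaps are short compared with a few long silences.

```python
from statistics import fmean, median
from typing import Sequence


def burstiness_ratio(times: Sequence[float]) -> float:
    """Hypothetical ratio of the median to the mean inter-event time."""
    ts = sorted(times)
    if len(ts) < 2:
        raise ValueError("at least two events are required")
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    # The gaps telescope, so their mean equals (ts[-1] - ts[0]) / (len(ts) - 1).
    mean_gap = fmean(gaps)
    if mean_gap == 0.0:
        raise ValueError("all events share the same timestamp")
    return median(gaps) / mean_gap
```

For instance, four uploads within a few minutes followed by one upload ten days later yield a ratio far below 1, while weekly uploads yield exactly 1.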
3.3 Topical and social groups
The “common identity and common bond” theory states that, depending on the prevalent motivation of people to join a group, groups can be categorized as either social or topical, and it assumes that the two types of groups have distinct and well-recognizable traits. In recent years, the theory has been widely discussed and elaborated by social scientists, both theoretically and through small-scale experiments, but no rigorous methodology to distinguish the two types has been developed or tested on large-scale datasets.
We design a technique to detect the group type based on the common identity and common bond theory, both to contribute to a strong validation of the theory itself and to provide a general framework for the automatic classification of user groups in online social media. In the following, we describe the theory in more detail and then propose a translation of its main principles into general metrics that can be applied to social graphs.
3.3.1 Identity and bond theory
The common identity and common bond theory describes social groups along the dimensions of topicality and sociality , . According to the theory, the attachment to a group, as well as the permanence and involvement in it, can be explained in terms of common identity or common bond. Identity-based attachment holds when people join a group based on their interest in the community as a whole or in a well-defined common theme shared by all of the members. People whose participation is due to identity-based attachment may not directly engage with anyone and might even participate anonymously. Conversely, bond-based attachment is driven by personal social relations with other specific members, and thus the main theme of the group may be disregarded. The two processes result in two different group types, that for simplicity we name “topical” for identity-based attachment and “social” for bond-based attachment.
In practice, groups can be formed from a mix of identity- and bond-based attachment, but very often they lean more towards either sociality or topicality. According to the theory, the group type is related to the reciprocity of interactions and to the topics of discussion. Members of social groups tend to have reciprocal interactions with other members, whereas interactions in topical groups are generally not directly reciprocated. In addition, topics of discussion tend to vary drastically and cover multiple subjects in social groups, while in topical groups discussions tend to be related to the group theme and cover specific areas. According to the theory, social groups are founded on individual relationships between their members, so it is harder for newcomers to join and integrate with members who already have strong relationships with each other. One implication is that social groups are vulnerable to turnover, since the departure of a person’s friends may lead to their own departure. Topical groups, on the other hand, are more open to newcomers and more robust to departures.
3.3.2 From theory to metrics
Metrics that differentiate between the two types of groups can be built by quantifying the reciprocity of interactions and the topicality of the information exchanged between group members. Next, we describe: (i) reciprocity metrics, used to quantify group sociality; (ii) the entropy of terms, to determine how much the topics of discussion vary within a group; and (iii) activity metrics, to measure the liveliness of the group. As for the temporal dimension, the approach to combine all these metrics into a decision on the group type is discussed in Section 5.3.3, with specific examples from our Flickr case study.
The intra-reciprocity of group g is computed from the numbers of reciprocated and non-reciprocated links internal to the group. Correspondingly, the inter-reciprocity at the border of the group is computed from the links between members and non-members, accounting for the reciprocity between them.
The relative reciprocity combines the two quantities; we add 1 to both its numerator and its denominator to reduce its fluctuations when the denominator is small. This relative reciprocity compares the reciprocity among members with their reciprocity toward people outside the group, and thus reflects how much the sociality of group members stands out from that of their environment.
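One plausible formulation consistent with this description, given as a sketch under stated assumptions: reciprocity is taken as reciprocated / (reciprocated + non-reciprocated) over distinct directed links, multi-arcs of the multidigraph are collapsed into single links, and the relative reciprocity is the smoothed ratio (r_int + 1) / (r_border + 1). All names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Arc = Tuple[str, str]


def _reciprocity(links: Set[Arc], all_links: Set[Arc]) -> float:
    """Fraction of links whose reverse link also exists (0 for an empty set)."""
    if not links:
        return 0.0
    reciprocated = sum(1 for u, v in links if (v, u) in all_links)
    non_reciprocated = len(links) - reciprocated
    return reciprocated / (reciprocated + non_reciprocated)


def reciprocity_metrics(arcs: Iterable[Arc], group: Set[str]) -> Tuple[float, float, float]:
    """Hypothetical intra-, border and relative reciprocity of one group."""
    # Collapse repeated arcs into distinct directed links and drop self-loops.
    links = {(u, v) for u, v in arcs if u != v}
    internal = {(u, v) for u, v in links if u in group and v in group}
    border = {(u, v) for u, v in links if (u in group) != (v in group)}
    r_int = _reciprocity(internal, links)
    r_border = _reciprocity(border, links)
    # Smoothed ratio: adding 1 damps fluctuations when r_border is close to 0.
    r_rel = (r_int + 1.0) / (r_border + 1.0)
    return r_int, r_border, r_rel
```

Under this reading, a social group would show a relative reciprocity above 1 (members reciprocate each other more than they reciprocate outsiders), while a topical group would stay close to or below 1.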
A first activity metric compares the number of interactions internal to group g with the number expected at random, given the total number of interactions originated by members of g, the total number of interactions targeted to members of g, and the total number E of interactions in the network. If this metric has a value higher than 1, the number of interactions internal to the group is higher than the number expected in a random scenario with the same group activity volume.
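A sketch of such an observed-versus-expected ratio, assuming (hypothetically) the usual random-pairing null model: if sources and targets of all E interactions were matched at random, the expected number of internal interactions would be out_g * in_g / E, where out_g and in_g count interactions originated by and targeted to members. Each input pair is one arc of the multidigraph, so repeated interactions count separately.

```python
from typing import Iterable, Set, Tuple


def activity_ratio(interactions: Iterable[Tuple[str, str]], group: Set[str]) -> float:
    """Hypothetical ratio of observed to expected internal interactions."""
    total = out_g = in_g = internal = 0
    for source, target in interactions:
        total += 1
        src_in, dst_in = source in group, target in group
        out_g += int(src_in)                 # originated by a member
        in_g += int(dst_in)                  # targeted to a member
        internal += int(src_in and dst_in)   # both endpoints are members
    if out_g == 0 or in_g == 0:
        raise ValueError("the group has no outgoing or incoming interactions")
    expected = out_g * in_g / total
    return internal / expected
```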
Another metric compares the density of interactions among members with the density of interactions between the group and the rest of the network, using the cardinality $|g|$ of group g and the total number N of nodes in the network. Values greater than 1 indicate that interactions among members are denser than interactions between the group and the rest of the network. This metric thus compares the intensity of interactions among group members with the intensity of their interactions with the entire network.
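A hedged sketch of this density comparison, assuming (hypothetically) that internal density is measured over the |g|(|g| - 1) ordered pairs of members and border density over the 2|g|(N - |g|) ordered member/non-member pairs in both directions.

```python
import math
from typing import Iterable, Set, Tuple


def density_ratio(interactions: Iterable[Tuple[str, str]], group: Set[str],
                  num_nodes: int) -> float:
    """Hypothetical ratio of internal to border interaction density."""
    size = len(group)
    if size < 2 or num_nodes <= size:
        raise ValueError("need at least two members and at least one non-member")
    internal = border = 0
    for source, target in interactions:
        src_in, dst_in = source in group, target in group
        if src_in and dst_in:
            internal += 1
        elif src_in != dst_in:
            border += 1
    internal_density = internal / (size * (size - 1))          # ordered member pairs
    border_density = border / (2 * size * (num_nodes - size))  # both directions
    if border_density == 0.0:
        return math.inf  # no border interactions at all
    return internal_density / border_density
```

With two members exchanging two interactions and one interaction toward an outsider in a four-node network, the internal density is 1.0, the border density is 0.125, and the ratio is 8.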
4 Dataset and preprocessing
To test our metrics we use a dataset from Flickr. The wide variety of user groups, the richness of interaction types, and the openness of the data (retrievable through the public API) make Flickr an ideal platform for our study.
4.1 Flickr groups and interactions
Total number of interactions and declared/detected groups.
First, for all members of the groups, we collect the public information in their profiles, extracting their interactions with other users or online objects, namely:
Comments. User u comments on a photo of user v. This interaction is mediated through the photo. We filter out the comments of users on their own photos, obtaining a total of 238M comments.
Favorites. User u marks one of user v’s photos as a favorite. The interaction is mediated through the favorited photo. We extract 112M favorite interactions.
Contacts. User u adds user v to his contacts. Social contacts in Flickr are directed and may be reciprocated. A user can add another user as a contact only once, and the relation remains unchanged until the contact is removed. There are 71M contacts in our dataset.
Additionally, we also rely on the information related to specific actions that users make to interact with the group itself:
Uploads. User u uploads a photo p to the group photo pool. Flickr groups provide pools to store pictures related to the group, and a picture can belong to multiple pools. Only members of the group can upload a photo to its pool.
Subscriptions. User u joins the group at a certain time.
In addition to user-created groups (which we refer to as declared), in Section 5.1 we analyze the sociality and topicality properties of groups that are not defined by users but are instead found by community detection algorithms (we call these detected groups). We applied the OSLOM community detection algorithm over the entire network of social contacts in our dataset. We chose OSLOM because it detects overlapping communities, which is a natural feature of real groups. Moreover, OSLOM has performed well in recent community detection benchmarks and outperformed the other algorithms we tested. OSLOM detected 646K groups.
We also use the tags of the photos as terms for our model. The primary set of photos from which we extract tags is the photo pool, which is available for declared groups only. In addition, in both declared and detected groups, the interactions between members of the group that are mediated through photos (i.e., comments, favorites) result in two additional photo sets from which tags are extracted. We process the three tag sets (pool, comments, favorites) separately and compute a normalized entropy for each of them.
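The normalized entropy of a tag set can be sketched as the Shannon entropy of the tag frequency distribution divided by its maximum, the logarithm of the number of distinct tags. This normalization is an assumption, and the tag lists below are made-up examples, one per source (pool, comments, favorites).

```python
import math
from collections import Counter


def normalized_entropy(tags):
    """Shannon entropy of the tag distribution divided by log(#distinct tags) (assumed)."""
    counts = Counter(tags)
    if len(counts) < 2:
        return 0.0  # a single distinct tag carries no topical diversity
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(len(counts))


# One value per tag source; these tag lists are hypothetical illustrations.
tag_sets = {
    "pool": ["aircraft", "airport", "aircraft", "spotting", "aircraft"],
    "comments": ["wedding", "beach", "grandpa", "party"],
    "favorites": ["aircraft", "aircraft"],
}
for source, tags in tag_sets.items():
    print(source, round(normalized_entropy(tags), 3))
```

A set spread evenly over many tags scores close to 1, while a set dominated by one tag scores close to 0.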
4.2 Socio-topical group labeling
The socio-topical dimension we consider is a rather abstract concept, and we wish to check whether our metrics capture it correctly. For this reason, we need a reliable ground truth against which to check the detected sociality and topicality scores. We asked human coders to label groups based on well-defined guidelines extracted directly from the common identity and common bond theory. For the labeling, we randomly selected groups meeting the following minimum requirements: (i) more than 5 members, (ii) more than 100 internal comments, (iii) relative activity values higher than 10^2. The third requirement ensured that the selected groups were active well above the values expected in a random case. After this selection we obtained over 34K declared groups and over 33K detected groups. We describe the labeling process for these groups in detail next.
4.2.1 Information provided to labelers
The labeling is based on the human capability of processing the semantics, aesthetics, and sentiment behind text and photos. With the editorial process we generate a ground truth of “social” and “topical” groups. The coders were asked to make judgments in this respect and were presented with the following information for each group:
Group profile. The Flickr group profile consists of the group name, description by the creator of the group, discussion board, photo pool, and map of places where photos uploaded to the group pool were taken. This information is available only for declared groups.
Comments. We provide the text of all comments exchanged between the members. Comments are shown in chronological order and grouped by thread when they appear under the same photo. We also include a link to the photo.
Tags. Human coders are shown the list of the 5 most frequent tags attached to the photos that mediate the internal comments to the group. The list is sorted alphabetically.
4.2.2 Labeling guidelines
Coders were shown the information described above and asked to categorize groups as either social, topical or unknown. The last case is reserved for groups for which text is written in a language unknown to the labeler, making the task impossible to accomplish. Intentionally, no unsure category was allowed to keep the categorization strictly binary, as the theory does. Some groups can be both topical and social, and therefore difficult to categorize, but for the sake of clarity and conformity with the theory we kept the categorization as a binary task. Coders were provided with specific instructions on how to recognize social and topical groups, and on how to perform the categorization. The guidelines are summarized as follows:
I. Comments and photos. By examining comments and photos, find traces of people who know each other or who have a personal relationship. Knowing each other’s real names, spending time together, co-appearing in photos, sharing common past experiences, referencing mutually known places, and disclosing personal information are all signals of the presence of a social relationship. The predominance of friendly and colloquial comments (e.g., jokes, laughter) is another element distinguishing social groups from topical groups. In topical groups, the atmosphere is more formal and comments tend to be more impersonal. Examples of impersonal comments include expressing appreciation for photos, praising the photographers, thanking them for their work, or commenting on a particular topic in a neutral way. As a rule of thumb, if many personal comments are detected, the sociality of the group should be considered high. If such comments are few (e.g., exchanged only within small subsets of members) but the overall atmosphere of the interaction is personal and friendly, the sociality of the group should be considered fairly present. If, on the other hand, comments are mainly impersonal and neutral, sociality should be considered low, in favor of higher topicality.
II. Tags and description. Read the tags and the profile description of the group. If the tags are semantically consistent, the topicality of the group should be considered high, and even higher if the name and description of the group correspond to the content of the tags. In some cases, tags or group descriptions contain words indicating personal relations or events (e.g., “wedding”, “grandpa”, names, etc.), indicating a higher sociality of the group. Tags can also contain names of specific locations. Geo-characterized tags can be verified against the map of places where the photos were taken. Such tags are a good indication that sociality is present in the group, but this has to be confirmed by inspecting the comments.
The coders labeled the groups after judging the two aspects above. If both tags and comments are clearly social or clearly topical, the choice of label is straightforward. If the tags are highly topical and the comments are not social, the group is labeled as topical, and vice versa. If the tags are mildly topical and the comments highly social, the group is labeled as social. The labelers were asked to read as many comments as needed to arrive at a fairly clear decision.
4.2.3 Group examples
To provide a sense of how the guidelines were applied in practice, we describe two examples. The first is a group titled “Airlines Austrian”, tagged with labels “aircraft”, “airport” and “spotting.” Photos are from different countries in Europe and the vast majority of them depict airplanes. Members are very active in commenting and write comments related to the aircraft theme (e.g., “I just love this airplane, the TU-154M is just a plane Boeing or Airbus could never design”). In this case, all of the features are aligned with the concept of a topical group defined in the guidelines. The second group is named “Camp Baby 2008” and is described on its main page as a collection of photos of a two-day event for young mothers taking place at a specific location. Photos depict people attending the event and interacting with each other in a friendly way. Tags and comments often contain names of individuals and references to past common experiences (e.g., “I love Mindy and cannot wait to see her again!!”). Although the group has a specific topic, its social component is very strong. In practice, more ambiguous cases can occur and, ultimately, the labeler’s decision has an arbitrary component, as in every complex annotation process. Nevertheless, the guidelines gave the labelers precise instructions and, as described next, we relied on multiple independent coders to assess the quality of the extracted ground truth.
4.2.4 Labeling outcome
A total of 101 declared groups and 69 detected groups were labeled by 3 people: two of the authors and an independent labeler who was not aware of the type of study nor of the purpose of the labeling. The inter-labeler agreement, measured as Fleiss’ Kappa, is 0.60 for the declared groups, meaning that there exists good agreement between labelers.
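Fleiss' kappa, used above to measure agreement among the three labelers, can be computed with the textbook formula sketched below. This is an illustrative sketch; the labels in the example are made up and are not the study's annotations.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss' kappa; ratings[i] holds the labels that all raters gave to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])  # every item must be rated by the same number of raters
    counts = [[item_labels.count(c) for c in categories] for item_labels in ratings]
    # Overall share of assignments to each category.
    p = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(len(categories))]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    agreement = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
                 for row in counts]
    p_bar = sum(agreement) / n_items
    p_e = sum(q * q for q in p)
    if p_e == 1:
        return float("nan")  # undefined when every label falls in one category
    return (p_bar - p_e) / (1 - p_e)


labels = [
    ["social", "social", "topical"],
    ["topical", "topical", "social"],
    ["social", "social", "social"],
    ["topical", "topical", "topical"],
]
print(fleiss_kappa(labels, ["social", "topical", "unknown"]))
```

In the example two items get unanimous labels and two get a 2-to-1 split; observed agreement is 2/3, chance agreement is 1/2, and kappa is (2/3 − 1/2) / (1 − 1/2) = 1/3.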
In order to assess the quality of the labels, we also counted the number of messages exchanged between group members. The counting was done anonymously in aggregate and the content of the messages was not accessed. Groups labeled as social contain around twice as many messages between their members compared to topical groups of similar size. Even if this does not constitute a proof of higher sociality, intuitively people who get in touch via one-to-one communication are more likely to have a more intimate social relationship.
The Kappa value for detected groups is around 0.44, revealing lower agreement. One factor that partly explains this result is the lack of information about the group profile, which is not available for detected groups. Another cause of disagreement is a higher variability in the comments. This may be because we use the network of contacts to find clusters and define detected groups, and contacts may not be the best proxy for personal relations.
In total we label 565 distinct declared groups and 126 distinct detected groups. We characterize them in the following section.
5 Characterization of groups
We now describe the Flickr groups in our dataset according to the three dimensions identified above. After a short analysis of the overlap between declared and detected groups, we inspect each dimension separately, discussing how the metrics we identified earlier are applied to groups. Last, we discuss the characterization of groups along all three dimensions.
5.1 Overlap of groups with detected communities
Since community detection techniques have been widely employed in recent years to describe the structure of complex social systems, the need for a clearer assessment of the meaning of the detected clusters has often been expressed from different angles, but has never been completely satisfied. In this study we help shed light on this matter by comparing user-generated groups with groups detected algorithmically (as described in Section 4).
The groups from the two sets share typical properties of groups found in on-line social networks. The distribution of sizes of groups in both cases is heavy-tailed and close to power-laws (not shown). Declared groups tend to be much bigger, having on average 61 members versus 7 members in detected groups.
This holds for groups of all sizes, as shown in Figure 4(e), in which we plot the 91st and 99th percentiles of the best-match similarity for detected groups of various sizes (e.g., 1% of detected groups of size 20 have similarity with declared groups higher than 0.75, while in the randomized case 1% of the groups have similarity higher than just 0.05). Therefore, in some cases the community detection algorithm finds groups that are also defined by users (i.e., declared groups), and the comparison with the randomized case provides evidence that this does not occur by chance. Nevertheless, a substantial overlap is found for only a small percentage of groups: most group pairs have similarity close to 0. Consequently, the best-match similarity of detected groups to declared groups is only 0.082, while for the randomized detected groups it is not much lower, at 0.058.
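The best-match comparison can be sketched as follows. This section does not state the similarity measure or the randomization procedure, so the sketch assumes Jaccard similarity and a null model that redraws each detected group as a random node set of the same size; all names and data are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two member sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_similarity(detected, declared):
    """For each detected group, the highest similarity to any declared group."""
    return [max((jaccard(d, g) for g in declared), default=0.0) for d in detected]


def randomized(groups, nodes, seed=0):
    """Assumed null model: random node sets with the same sizes as the input groups."""
    rng = random.Random(seed)
    pool = sorted(nodes)
    return [set(rng.sample(pool, len(g))) for g in groups]


declared = [{"a", "b", "c", "d"}, {"e", "f", "g"}]
detected = [{"a", "b", "c"}, {"x", "y"}]
nodes = set().union(*declared, *detected)
print(best_match_similarity(detected, declared))
print(best_match_similarity(randomized(detected, nodes), declared))
```

Percentiles of these lists, computed per group size for the real and randomized detected groups, would give curves comparable in spirit to the ones described above.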
5.2 Spatio-temporal classes
Spatial characterization of groups is defined by a single dispersion metric. In Flickr groups we have two potential sources of geolocated data: user location and photo geotags.
Here we do not use the geolocations of users, for two reasons. First, some users do not provide their position, and IP-based geolocation can be quite unreliable. Second, we aim to characterize groups with information that is directly related to the group rather than to an individual. For this reason, we consider the geotags attached to the photos uploaded to the group instead.
To turn the continuous dispersion value into a partition of groups into classes without manual thresholding, we apply the X-Means algorithm over the one-dimensional space of dispersion values. X-Means extends K-Means so that the number of clusters K does not have to be given: it estimates both the number of clusters and the clusters themselves much faster than a brute-force optimization over K.
Not surprisingly, two clusters are found. The geo-narrow cluster contains 56% of the groups, and the remaining 44% belong to the geo-wide cluster.
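The following numpy sketch illustrates the X-Means idea on one-dimensional dispersion values. It is a simplified variant, not the implementation used in the study: it starts from a single cluster, tries to split each cluster in two with a local 2-means, and keeps a split only when a Bayesian Information Criterion (computed here for a hard-assignment spherical Gaussian model, in the spirit of Pelleg and Moore) improves. The names, the `k_max` cap, the initialization, and the synthetic data are assumptions. The same function also accepts a multi-dimensional feature matrix.

```python
import numpy as np


def kmeans(x, centers, iters=100):
    """Lloyd iterations from given initial centers; x has shape (n, d)."""
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dist.argmin(axis=1)


def bic(x, labels, centers):
    """BIC of a hard-assignment spherical Gaussian mixture (higher is better)."""
    n, d = x.shape
    k = len(centers)
    if n <= k:
        return -np.inf
    sse = sum(((x[labels == j] - c) ** 2).sum() for j, c in enumerate(centers))
    var = max(sse / (d * (n - k)), 1e-12)  # shared per-dimension variance
    counts = np.bincount(labels, minlength=k)
    counts = counts[counts > 0]
    loglik = ((counts * np.log(counts / n)).sum()
              - n * d / 2 * np.log(2 * np.pi * var)
              - sse / (2 * var))
    n_params = (k - 1) + k * d + 1  # mixing weights, centers, variance
    return loglik - n_params / 2 * np.log(n)


def xmeans(x, k_max=10):
    """Start from one cluster; split a cluster in two whenever that raises its BIC."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # 1-D values, such as dispersion scores
    centers, labels = kmeans(x, x.mean(axis=0, keepdims=True))
    while len(centers) < k_max:
        proposed = []
        for j, c in enumerate(centers):
            pts = x[labels == j]
            if len(pts) < 2 or np.all(pts == pts[0]):
                proposed.append(c)
                continue
            # Deterministic two-point initialization for the local 2-means.
            far = pts[((pts - c) ** 2).sum(axis=1).argmax()]
            other = pts[((pts - far) ** 2).sum(axis=1).argmax()]
            child_c, child_l = kmeans(pts, np.vstack([far, other]))
            parent = bic(pts, np.zeros(len(pts), dtype=int), c[None, :])
            if bic(pts, child_l, child_c) > parent:
                proposed.extend(child_c)
            else:
                proposed.append(c)
        if len(proposed) == len(centers) or len(proposed) > k_max:
            break
        centers, labels = kmeans(x, np.array(proposed))  # global refinement
    return centers, labels


rng = np.random.default_rng(0)
dispersion = np.concatenate([rng.normal(0.1, 0.02, 200), rng.normal(0.8, 0.02, 200)])
centers, labels = xmeans(dispersion)
print(len(centers), np.sort(centers[:, 0]))
```

On this synthetic bimodal data, splitting either tight mode further lowers the BIC, so the procedure should stop at two clusters, mirroring the two-class outcome reported for dispersion.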
The temporal aspect includes four different metrics that would be difficult to combine with ad-hoc approaches. Moreover, we have two different sets of timestamped actions, namely users joining the group and photos uploaded to the group pool. Therefore, as with the spatial clustering, we apply X-Means to the resulting 8-dimensional feature space (four metrics for each of the two action sets), obtaining three different clusters.
Average and standard deviation of every feature in each of the clusters.
0.48 ± 0.16 | 0.56 ± 0.14 | −0.01 ± 0.32 | 2.26 ± 1.85 | 0.45 ± 0.14 | 0.58 ± 0.13 | −0.03 ± 0.29 | 0.72 ± 1.32
0.03 ± 0.07 | 0.12 ± 0.17 | 0.19 ± 0.47 | 1.82 ± 1.88 | 0.05 ± 0.09 | 0.16 ± 0.18 | 0.16 ± 0.54 | 0.62 ± 1.13
0.23 ± 0.15 | 0.56 ± 0.19 | 0.43 ± 0.43 | 2.61 ± 1.92 | 0.15 ± 0.11 | 0.60 ± 0.19 | 0.73 ± 0.32 | 2.30 ± 1.98
Short-lived. The short-lived groups represent 13% of our sample and are characterized by low centrality and small dispersion. This category includes groups that had some activity after they were created and became inactive shortly afterwards. Examples include limited-scope photo-sharing groups whose activity ceases shortly after the photos are uploaded and consumed by small social circles.
Evergreen. The evergreen cluster is the biggest one, containing 52% of the groups. Groups in this cluster are characterized by high centrality and dispersion values around 0.5. They were created at some point in the past and have been growing uniformly in number of users and photos until the end of the time period we consider. Examples include groups dedicated to general topics, such as groups hosting artistic portraits by amateur and professional photographers.
Bursty. The remaining 34% of the groups are in the bursty cluster, which contains groups with the lowest skewness and high burstiness, especially in the number of users joining. These groups usually have their highest activity at the beginning of their life, but then from time to time experience photo uploads or user subscriptions in large batches. Some of these groups are related to recurring (e.g., yearly) events that regularly attract the attention of users.
5.3 Socio-topical classes
To tackle the socio-topical dimension, we first characterize the two sets of groups in terms of the metrics introduced in Section 3.3.2. Then we study the relation between the labels assigned by the human coders and the values of the metrics. Additionally, we report the ratios of groups labeled as social and topical among both declared and detected groups.
5.3.1 Statistical properties of metrics
The values of relative activity in both declared and detected groups are very high, as presented in Figures 8(e), (f). As expected, the activity of randomized groups exhibits values around 1 for all group sizes. For real groups, instead, relative activity decreases with group size and gets close to 1 for very large groups. This may be because larger groups cannot be as integrated as smaller ones, and the social commitment of their members towards other members of the group drops due to limited human capabilities. Additionally, we observe that the activity decay for declared groups occurs sharply between groups of size 100 and 200, in agreement with Dunbar’s theory on the upper bound of the number of stable relationships a human can manage. The activity drop for detected groups is continuous and much more moderate (Figure 8(f)), since community detection algorithms by design tend to output node clusters with many connections among their members.
5.3.2 Relation between metrics and group label
Here we analyze the properties and values of the metrics for groups labeled through the editorial process. First, the ratio of groups labeled as social differs between declared and detected groups. Among declared groups we find around 48% social groups, whereas among detected groups almost 69% are labeled as social. Additionally, we picked 50 detected groups among those most similar to declared groups; specifically, we selected them randomly from the 99th percentile shown in Figure 4. These groups overlap significantly with declared groups and should share similar properties. Indeed, the ratio of groups labeled as social among them is closer to that of declared groups, at 53%. We conclude that detected groups are more likely to be social than declared ones. This is a somewhat expected result: clustering algorithms detect dense parts of a network, so they are inclined to detect areas with more reciprocal connections, and the theory envisions more reciprocal relations in social groups. Thus, community detection algorithms are more likely to find social groups; however, determining to what extent this happens is not trivial.
One expectation is that bond-based groups should not be very large, as the human capacity for stable relationships is limited. As pointed out in Section 5.3.1, the Dunbar number can be considered a possible cap on the size of such groups, while topical groups are not subject to such a restriction. In line with this expectation, we find that declared groups labeled as social have on average 35 members, whereas groups labeled as topical have on average around 172 members.
First, there are almost no differences in the number of photos (not shown), favorites, and contacts (as in Figures 10(b), (c)) inside social and topical groups. The number of comments is, however, around 2 times higher in social groups than in topical groups of similar size (Figure 10(a)). More differences can be found when looking at relative activity (Figures 10(d)-(i)), which compares the interaction internal to the group with the overall activity level of users belonging to groups. In all three types of interaction the relative activity metrics for social groups yield values from 2 to over 10 times higher than for topical groups. These metrics compare activity internal to the group with activity external to it. Therefore this result may reflect a stronger focus or even a possible isolation of members belonging to social groups from the rest of people they interact with.
More importantly, we observe large differences in the values of reciprocity and relative reciprocity of comments and favorites. Social groups exhibit significantly higher reciprocity than topical groups (Figures 10(j)-(o)), in line with the common identity and common bond theory. There is no difference in the reciprocity of contacts, and a plausible interpretation is that contacts do not reflect personal relations between connected users: possibly, since contacts do not need to be reciprocal, users often add as contacts people they do not know and do not interact with. Finally, we observe much higher values of entropy and normalized entropy in social groups than in topical ones (Figures 10(p), (q), (s), (t)). This holds for the tags extracted from the photos that members commented on and favorited within the group. Assuming that the tags of photos represent topics of interaction, the result is consistent with bond attachment: members of bond-based groups are expected to cover many different topics and areas in their interactions, whereas members of identity-based groups focus their interactions on specific topics. However, this does not hold for the tags extracted from the photo pool of the group (Figures 10(r), (u)). Apparently, the content of the photo pool does not always reflect well the interactions and relations between members of the group.
5.3.3 Group type detection
The properties of labeled social and topical groups tend to confirm the validity of the principles identified by the common identity and common bond theory. A stronger confirmation would directly come from the ability of the defined metrics to predict the tendency of a group towards sociality or topicality. To this end, we propose and compare two methods to predict the group type and we test their accuracy over the corpus of the labeled groups.
The first approach we use is a linear combination of the metrics. To this end, we select the three features that are most closely related to the sociological theory and for which we built specific metrics. Each of them is applied to the 3 different interaction types and bags of tags, which produces a total of 9 values. We transform the values of the metrics into their t-statistics by subtracting the average value and dividing by the standard deviation of the distribution. Then we weight the normalized scores evenly by dividing them by the total number of metrics considered, and we sum them up to obtain a single score. All the components are expected to score high for social groups; therefore, the higher the score, the higher the chance that the group is social rather than topical. To convert the score into a binary label, a fixed threshold must be selected, above which groups are predicted to be social. With this approach, we test whether metrics derived from the theory can successfully predict the type of group (social or topical).
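The score-based method can be sketched with numpy: each metric column is standardized, the standardized values are averaged with equal weights, and the score is thresholded at 0. The toy rows and function names are hypothetical, and the sketch uses three columns instead of nine to stay short.

```python
import numpy as np


def social_score(features):
    """Average of per-metric standard scores; rows are groups, columns are metrics."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # a constant metric contributes nothing instead of dividing by zero
    z = (X - X.mean(axis=0)) / std
    return z.sum(axis=1) / X.shape[1]  # equal weights: divide by the number of metrics


def predict_type(features, threshold=0.0):
    """Groups scoring above the threshold are predicted social, the rest topical."""
    return np.where(social_score(features) > threshold, "social", "topical")


# Hypothetical metric values for three groups (higher should mean more social).
groups = [[0.9, 0.8, 0.7],
          [0.1, 0.2, 0.3],
          [0.6, 0.7, 0.4]]
print(predict_type(groups))
```

Because every column is centered, the scores of a set of groups sum to zero, so a threshold of 0 separates groups that are above and below average on the combined metrics.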
The second approach relies on machine-learning methods that use the metrics’ values as features. Features are combined in a classifier that is first trained on a sample of labeled data to learn a prediction model. The trained classifier then outputs a binary prediction for any new group instance defined in the same feature space. Due to the limited size of our corpus of labeled groups, we estimate the classifier performance using 10-fold cross validation. We report results for a Rotation Forest classifier, which performed best in comparison to several algorithms implemented in WEKA. For the classifier we used a wider set of features than for the linear combination approach, namely group size plus seven metrics, each applied to the 3 different interaction types and bags of tags, for a total of 1 + 7 × 3 = 22 features. We selected such a wide set of features to test whether the metrics proposed to distinguish between social and topical groups are indeed the best ones for the task. The relative predictive power of the features is measured through a feature selection algorithm.
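The evaluation protocol can be sketched as a generic k-fold cross-validation loop. Rotation Forest is not available in numpy, so the sketch plugs in a nearest-centroid classifier purely as a stand-in; the data are synthetic, with 22 features to mirror the feature count above, and all names are hypothetical.

```python
import numpy as np


def kfold_accuracy(X, y, fit, predict, k=10, seed=0):
    """Accuracy estimated with k-fold cross-validation over shuffled instances."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    correct = 0
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        correct += int((predict(model, X[test]) == y[test]).sum())
    return correct / len(y)


# Stand-in classifier (NOT Rotation Forest): assign each group to the nearest class centroid.
def fit_centroids(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}


def predict_centroids(model, X):
    classes = list(model)
    dist = np.stack([((X - model[c]) ** 2).sum(axis=1) for c in classes], axis=1)
    return np.array(classes)[dist.argmin(axis=1)]


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (30, 22)), rng.normal(-2.0, 0.5, (30, 22))])
y = np.array(["social"] * 30 + ["topical"] * 30)
print(kfold_accuracy(X, y, fit_centroids, predict_centroids))
```

Keeping `fit` and `predict` as parameters lets any classifier be swapped in while the fold construction stays the same.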
Both methods, however, fail more frequently for groups with mixed social and topical features. The prediction accuracies of both the classifier and the score-based method drop markedly for scores around 0 (Figure 12(b)). The accuracy at the extreme values of the score is close to 0.95, while it falls below 0.6 for groups with a score close to 0. This drop, however, also appears in the agreement between two of the human labelers, measured as the ratio of groups that have been given the same label. Apparently, this is a shortcoming of binary classification itself, as opposed to multi-label classification.
Group type prediction performance using (i) the score with threshold at 0, (ii) 10-fold cross validation on a Rotation Forest classifier trained on all the features, or (iii) the same classifier trained on the set of top-5 predictive features, according to the Chi Squared feature selection.
In addition, to determine the most predictive features, we rank them using chi-square feature selection and retain the top 5. This selected set is optimal for prediction performance: retraining the classifier on the restricted set of features results in stable performance, as shown in Table 3. The top 4 most predictive features correspond directly to the expectations of the theory and to the results of the analysis in Section 5; in other words, the normalized entropy of comments on photos within the group and the reciprocity of comments exchanged between members are the best predictors of the socio-topical divide of groups. More surprisingly, since it is not explicitly mentioned in the original theory, the amount of activity, namely the normalized activity in commenting in our case, is also a good predictor. This is understandable, however, as we have already remarked on its importance and commented on its interpretation in Section 5.
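Feature ranking by chi-square can be illustrated as follows. Chi-square statistics apply to categorical data, so numeric features must be discretized first; the sketch assumes a simple median split, which may differ from the discretization actually used, and ranks features by the resulting statistic. Names and data are hypothetical.

```python
import numpy as np


def chi2_binarized(feature, labels):
    """Chi-square statistic of (feature above its median) against the class label."""
    f = np.asarray(feature, dtype=float) > np.median(feature)  # assumed median split
    labels = np.asarray(labels)
    classes = np.unique(labels)
    observed = np.array([[np.sum((f == b) & (labels == c)) for c in classes]
                         for b in (False, True)], dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    mask = expected > 0  # skip empty rows, e.g. for a constant feature
    return float((((observed - expected) ** 2)[mask] / expected[mask]).sum())


def rank_features(X, labels):
    """Feature indices sorted by decreasing chi-square score, plus the scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array([chi2_binarized(X[:, j], labels) for j in range(X.shape[1])])
    return np.argsort(-scores), scores


y = np.array(["social"] * 10 + ["topical"] * 10)
informative = np.r_[np.ones(10), np.zeros(10)]  # separates the two classes perfectly
uninformative = np.tile([0.0, 1.0], 10)         # independent of the label
order, scores = rank_features(np.column_stack([uninformative, informative]), y)
print(order, scores)
```

The feature that separates the two classes scores 20, the maximum for 20 instances in a perfectly separated 2×2 table, while the label-independent one scores 0; keeping the first k indices of `order` yields a top-k feature set.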
5.4 Three-dimensional characterization
Once groups are characterized along each dimension separately, a natural question is whether there are cross-dimensional relationships between group types, in other words, whether clusters of groups in one dimension correspond predominantly to some type of group in another dimension. Blending all the metrics into a single model would be one way to answer this question. However, such a unifying approach would be impractical because of the different nature of the group characterization problem in different dimensions (clustering for geo-temporal, classification for socio-topical) and because a model that blends together such diverse types of measures would be difficult to interpret.
Percentage of groups in each intersection between clusters. The sum of all the cells is 100%.
Some clear patterns emerge. First, social groups have a much higher ratio of bursty to evergreen groups than topical ones. This is likely caused by the type of social behavior: individuals who know each other would more likely all join the group at its very beginning, and the group would probably show bursty activity driven by the events of that social circle. Symmetrically, topical groups tend to belong more to the “evergreen” category, as some topics are not tied to the churn of social groups or to temporal trends. Furthermore, we can see a relation between short-lived and geo-narrow groups: groups that live for a short time are much less likely to spread on a large geographical scale; in other words, geographical width is a good indicator that a group is likely to survive longer.
6 Information diffusion in groups
Work in graph mining and social network analysis is too often conducted in separate sub-branches focused on the solution of smaller tasks, with little cross-fertilization between closely related pieces of research. One example is the relationship between communities and information diffusion. As noted recently in a book by Easley and Kleinberg, the phenomenon of information diffusion, namely the flow of information along social links generating information cascades on a social network, is likely to be strongly coupled with the concept of community. In fact, community boundaries should include people who are, to some extent, more similar to each other than to the rest of the network, so information would likely tend to spread inside the community and penetrate less outside it. In short: “cascades and clusters truly are natural opposites: clusters block the spread of cascades, and whenever a cascade comes to a stop, there’s a cluster that can be used to explain why”.
Very recently, this idea has been leveraged by Barbieri et al., who used data on information cascades to detect hidden communities. However, we argue that the spreading process could also be determined by the type of communities involved. Intuitively, when a piece of information about a certain topic reaches a community interested in the same topic, the information will probably spread more easily; but what happens if the information cascade reaches a social (instead of topical) community?
We help shed light on this matter by running an experiment that examines information cascades in relation to the types of groups we identify in this work. To do so, we rely on a well-established study by Cha et al., which modeled information propagation on Flickr. We replicate that model and study the resulting information cascades, considering groups as an additional component.
- (1) user v starts following user u;
- (2) u favorites a photo p;
- (3) v favorites the same photo p.
This experimental framework is motivated by the fact that, in Flickr, users are notified about the photos that their followees favorite. The information diffusion links can be used to reconstruct potentially several information diffusion cascades (also called “diffusion trees”), where the root is a user who favorited a photo without having any followees who favorited it before.
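The diffusion-link rule can be sketched in Python. Following the temporal sequence above, a link from u to v on photo p is created when v started following u before u favorited p, and v favorited p afterwards. The sketch treats a favorite as a cascade root when it has no incoming link, which is one reading of the root definition; data structures and names are hypothetical.

```python
from collections import defaultdict


def diffusion_links(follows, favorites):
    """
    follows:   {(follower, followee): time the follow started}
    favorites: {(user, photo): time the user favorited the photo}
    Returns {(u, v, p)}: photo p is assumed to have spread from followee u to follower v.
    """
    followees = defaultdict(dict)
    for (v, u), t in follows.items():
        followees[v][u] = t
    links = set()
    for (v, p), t_v in favorites.items():
        for u, t_follow in followees[v].items():
            t_u = favorites.get((u, p))
            # Temporal order: v follows u, then u favorites p, then v favorites p.
            if t_u is not None and t_follow < t_u < t_v:
                links.add((u, v, p))
    return links


def cascade_roots(favorites, links):
    """Favorites without an incoming diffusion link start their own cascade."""
    reached = {(v, p) for _, v, p in links}
    return {(user, p) for (user, p) in favorites if (user, p) not in reached}


follows = {("b", "a"): 1, ("c", "b"): 1, ("d", "a"): 10}
favorites = {("a", "p"): 2, ("b", "p"): 3, ("c", "p"): 4, ("d", "p"): 5}
links = diffusion_links(follows, favorites)
roots = cascade_roots(favorites, links)
print(sorted(links), sorted(roots))
```

In the example, d followed a only after a had favorited p, so d's favorite does not descend from a and starts its own cascade.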
user u joins group g;
photo p is uploaded to g;
u favorites p.
Of course, for each (g, p) pair there could be multiple root users, namely multiple members of the group who do not follow each other and who all favorite the same photo according to the temporal sequence specified above. We connect all these root users to a common super-root identified by the (g, p) pair. Once the root nodes are identified, we apply the framework by Cha et al., thus obtaining information cascades, each labeled by a unique (g, p) pair. Note that a photo could be uploaded to multiple group pools, thus originating more than one cascade. We consider each of these possible cascades separately.
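Attaching group members to a (g, p) super-root and collecting the resulting cascade can be sketched as follows. It assumes the timestamps are available as dictionaries and that a member qualifies as a root when it satisfies the join, upload, favorite sequence and is not already reached by a diffusion link on p; all names and data are hypothetical.

```python
from collections import defaultdict, deque


def group_super_roots(joins, uploads, favorites, links):
    """
    joins:     {(user, group): join time}
    uploads:   {(group, photo): time the photo entered the group pool}
    favorites: {(user, photo): favorite time}
    links:     {(u, v, photo)} diffusion links between users
    Returns {(group, photo): set of root members attached to that super-root}.
    """
    reached = {(v, p) for _, v, p in links}
    pool = defaultdict(list)
    for (g, p), t_up in uploads.items():
        pool[g].append((p, t_up))
    roots = defaultdict(set)
    for (u, g), t_join in joins.items():
        for p, t_up in pool[g]:
            t_fav = favorites.get((u, p))
            # u joins g, then p is uploaded to g, then u favorites p; no followee parent.
            if t_fav is not None and t_join < t_up < t_fav and (u, p) not in reached:
                roots[(g, p)].add(u)
    return dict(roots)


def cascade_members(root_users, photo, links):
    """Users reached from the root members by following diffusion links on one photo."""
    children = defaultdict(list)
    for u, v, p in links:
        if p == photo:
            children[u].append(v)
    seen, queue = set(root_users), deque(root_users)
    while queue:
        for v in children[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


joins = {("a", "g"): 0, ("d", "g"): 0}
uploads = {("g", "p"): 1}
favorites = {("a", "p"): 2, ("b", "p"): 3, ("c", "p"): 4, ("d", "p"): 5}
links = {("a", "b", "p"), ("b", "c", "p")}
roots = group_super_roots(joins, uploads, favorites, links)
members = cascade_members(roots[("g", "p")], "p", links)
print(roots, sorted(members))
```

Because keys are (group, photo) pairs, a photo uploaded to several pools yields one super-root, and hence one cascade, per pool.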
The method we propose is limited by the fact that the root user might favorite a photo not because it has been published in a group but for any other reason (e.g., it was discovered by random browsing). However, we argue that if the photo has been uploaded to the pool we can assume it to be relevant to the group and the nature of the actual action that triggered the first favorite is not crucial to the study.
On the social-topical axis, the difference between types of groups is slight but noticeable, with topical groups having slightly more coverage and social groups more external spreading (except for a small range of group sizes). This supports the intuition of previous work that the boundaries of topical groups are harder for information cascades to cross. This is somewhat expected if members of topical groups share interests narrow enough to be limited predominantly to the group members, while members of social groups do not necessarily share a specific common interest, so their favoriting behaviour is more varied and has a higher chance of echoing outside the group. On the geographical dimension, instead, the difference is almost negligible, with slightly higher values for geo-wide groups on both metrics. This might be related to a better general capacity of geo-wide groups to spread information.
Clearer trends emerge along the time dimension. On average, evergreen groups have more coverage than short-lived or bursty ones, whereas bursty groups have the most external spreading. Evergreen groups are always active and therefore receive a lot of attention from their members, which partially explains why photos published in them get more coverage. Bursty groups, on the other hand, are often related to major events with a broad scope, whose photos can interest a large audience in the Flickr community beyond the group members.
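To make the two quantities compared above concrete, the sketch below computes them per cascade and averages them by group class. The definitions are assumptions for illustration, since the exact formulas appear earlier in the paper and not in this excerpt: coverage is taken as the share of group members who join the cascade, and external spreading as the share of cascade participants who are not group members. The function names and the (class label, participants, members) input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Set, Tuple


def coverage(participants: Set[str], members: Set[str]) -> float:
    """Assumed definition: share of group members reached by the cascade."""
    return len(participants & members) / len(members) if members else 0.0


def external_spreading(participants: Set[str], members: Set[str]) -> float:
    """Assumed definition: share of cascade participants outside the group."""
    return len(participants - members) / len(participants) if participants else 0.0


def average_by_class(
    cascades: Iterable[Tuple[str, Set[str], Set[str]]],
) -> Dict[str, Tuple[float, float]]:
    """Mean (coverage, external spreading) per group class label."""
    cov: Dict[str, List[float]] = defaultdict(list)
    ext: Dict[str, List[float]] = defaultdict(list)
    for label, participants, members in cascades:
        cov[label].append(coverage(participants, members))
        ext[label].append(external_spreading(participants, members))
    return {label: (mean(cov[label]), mean(ext[label])) for label in cov}
```

Under these assumed definitions, a group whose members share a narrow interest would tend to show high coverage and low external spreading, which is the pattern the text reports for topical and evergreen groups.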
Providing nuanced descriptions of interaction atoms in networked social systems is crucial to an accurate understanding of online collective human behaviour. After social links, groups are the most important social structures around which the activity of social media revolves.
We contribute to this area by proposing a set of general metrics that capture the spatial, temporal, and socio-topical dimensions of groups, three aspects that previous literature has identified informally but never formalized and studied in conjunction. Using a large Flickr group dataset, we propose a new metric to account for geographical dispersion that identifies two main classes of spatially characterized groups (geo-narrow and geo-wide); we cluster groups according to their temporal activity, discovering three major temporal patterns (evergreen, bursty, and short-lived groups); last, we translate the “common identity and common bond” theory into metrics of reciprocity, activity, and topical diversity to distinguish social from topical groups. In particular, we annotate a number of Flickr groups as either topical or social and match this ground truth against the machine-generated labels, showing that the socio-topical metrics, combined with a machine-learning approach, predict the group type with high accuracy. Analysing the three dimensions in combination reveals interesting correlations between classes. In particular, we find that groups that manage to spread on a geographically large scale are usually more long-lived than “local” groups; that topical groups tend to show constant activity and tolerate the churn of their users; and that social groups have bursty activity traces, with all the members joining at first and then interacting with each other from time to time, after relatively long periods of inactivity.
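Two of the socio-topical ingredients mentioned above can be sketched in a few lines. The formulas are assumptions standing in for the paper's definitions, which are given earlier and not reproduced in this excerpt: reciprocity is taken as the share of follow links between members that are reciprocated, and topical diversity as the Shannon entropy of the tags attached to the group's photos. The input formats and function names are hypothetical, and neither the activity metric nor the classifier is sketched.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set


def reciprocity(members: Set[str], follows: Dict[str, Set[str]]) -> float:
    """Share of directed member-to-member follow links that are reciprocated."""
    edges = {
        (u, v)
        for u in members
        for v in follows.get(u, set())
        if v in members and v != u
    }
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if (v, u) in edges)
    return mutual / len(edges)


def topical_diversity(tags: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the tag distribution of a group's photos."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Read through the common identity and common bond lens, a common-bond (social) group would be expected to score higher on reciprocity, and, since its members need not share a specific interest, plausibly higher on tag diversity as well; a common-identity (topical) group would lean the other way.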
Besides these main results, our study is enriched by several complementary analyses. First, the dependency of the socio-topical metrics on group size confirms previous observations about the effective size of social communities (known as Dunbar’s number), peaking at rather small sizes and capped at 100–200 members. Second, comparing the structure and the sociality and topicality traits of declared groups with those of groups found by community detection algorithms reveals that detected groups do not overlap much with declared groups on average, although they match them noticeably more than in the random case for groups of comparable size. Furthermore, detected groups are more often social than declared ones. Last, inspired by previous work relating communities to information cascades and relying on a well-established model of information diffusion on Flickr, we study the dependency between group type and the volume of information spreading inside or outside a group. We find that social and bursty groups let information spread across group boundaries more than topical and evergreen groups, which instead tend to retain information within them.
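The comparison between detected and declared groups against a random case can be illustrated with a best-match Jaccard overlap. This is only a sketch of one common way to run such a comparison, not necessarily the procedure used in the paper: the overlap measure, the null model (uniformly random user sets of the same size as the detected community), and the names `best_match` and `random_baseline` are assumptions.

```python
import random
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(community: Set[str], groups: Sequence[Set[str]]) -> float:
    """Highest Jaccard overlap between one detected community and any declared group."""
    return max((jaccard(community, g) for g in groups), default=0.0)


def random_baseline(
    groups: Sequence[Set[str]],
    users: Sequence[str],
    size: int,
    trials: int = 100,
    seed: int = 0,
) -> float:
    """Mean best match of random user sets of the given size (a simple null model)."""
    rng = random.Random(seed)
    scores = [
        best_match(set(rng.sample(list(users), size)), groups) for _ in range(trials)
    ]
    return sum(scores) / len(scores)
```

A detected community whose best match clearly exceeds the baseline for its size would correspond to the situation described above: low absolute overlap, yet well above chance.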
We hope that our study conveys a constructive message about (i) the need for more nuanced descriptions of the structures in social networks and (ii) the benefits of relating collective phenomena that are rarely studied together.
8.1 A.1 Correction parameter for standard deviation
Therefore, being the maximum value for , we use it as the normalization factor in (2).
8.2 A.2 Correction parameter for skewness
which we use as the normalization factor in (3).
This work is supported by the SocialSensor FP7 project, partially funded by the EC under contract number 287975.
- Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B: Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement. IMC’07. ACM, San Diego; 2007:29–42. 10.1145/1298306.1298311
- Negoescu RA, Gatica-Perez D: Analyzing Flickr groups. In Proceedings of the 2008 international conference on content-based image and video retrieval. CIVR’08. ACM, New York; 2008:417–426. 10.1145/1386352.1386406
- Kairam SR, Wang DJ, Leskovec J: The life and death of online groups: predicting group growth and longevity. In Proceedings of the fifth ACM international conference on Web search and data mining. WSDM’12. ACM, New York; 2012:673–682. 10.1145/2124295.2124374
- Aiello LM, Barrat A, Schifanella R, Cattuto C, Markines B, Menczer F: Friendship prediction and homophily in social media. ACM Trans Web 2012, 6(2). 10.1145/2180861.2180866
- Monge P, Contractor NS: Theories of communication networks. Oxford University Press, London; 2003.
- Aiello LM, Schifanella R, State B: Reading the source code of social ties. Conference on Web science. WebSci’14. ACM, New York; 2014:139–148. 10.1145/2615569.2615672
- Barbieri N, Bonchi F, Manco G: Cascade-based community detection. In Proceedings of the sixth ACM international conference on Web search and data mining. WSDM’13. ACM, New York; 2013:33–42. 10.1145/2433396.2433403
- Grabowicz PA, Aiello LM, Eguiluz VM, Jaimes A: Distinguishing topical and social groups based on common identity and bond theory. In Proceedings of the sixth ACM international conference on Web search and data mining. WSDM’13. ACM, New York; 2013:627–636. 10.1145/2433396.2433475
- Dunbar RIM: The social brain hypothesis. Evol Anthropol 1998, 6:178–190. 10.1002/(SICI)1520-6505(1998)6:5<178::AID-EVAN5>3.0.CO;2-8
- Porter CE: A typology of virtual communities: a multi-disciplinary foundation for future research. J Comput-Mediat Commun 2004.
- De Choudhury M: Modeling and predicting group activity over time in online social media. In Proceedings of the 20th ACM conference on hypertext and hypermedia. HT’09. ACM, New York; 2009:349–350. 10.1145/1557914.1557983
- Wang J, Zhao Z, Zhou J, Wang H, Cui B, Qi G: Recommending Flickr groups with social topic model. Inf Retr 2012, 15(3–4):278–295. 10.1007/s10791-012-9193-0
- Cox A, Clough P, Siersdorfer S: Developing metrics to characterize Flickr groups. J Am Soc Inf Sci Technol 2011, 62:493–506.
- Grabowicz PA, Eguíluz VM: Heterogeneity shapes groups growth in social online communities. Europhys Lett 2012, 97(2). 10.1209/0295-5075/97/28002
- Backstrom L, Huttenlocher D, Kleinberg J, Lan X: Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. KDD’06. ACM, New York; 2006:44–54. 10.1145/1150402.1150412
- Baldassarri A, Barrat A, Capocci A, Halpin H, Lehner U, Ramasco J, Robu V, Taraborelli D: The Berners-Lee hypothesis: power laws and group structure in Flickr. In Social Web communities. Dagstuhl seminar proceedings. Edited by: Alani H, Staab S, Stumme G. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Germany, Dagstuhl; 2008.
- Van House NA: Flickr and public image-sharing: distant closeness and photo exhibition. In Extended abstracts on human factors in computing systems. CHI’07. ACM, New York; 2007:2717–2722. 10.1145/1240866.1241068
- Miller AD, Edwards WK: Give and take: a study of consumer photo-sharing culture and practice. In Proceedings of the SIGCHI conference on human factors in computing systems. CHI’07. ACM, New York; 2007:347–356. 10.1145/1240624.1240682
- Nov O, Naaman M, Ye C: Analysis of participation in an online photo-sharing community: a multidimensional perspective. J Am Soc Inf Sci Technol 2010, 61(3):555–566.
- Negoescu R-A, Adams B, Phung D, Venkatesh S, Gatica-Perez D: Flickr hypergroups. In Proceedings of the 17th ACM international conference on multimedia. MM’09. ACM, New York; 2009:813–816. 10.1145/1631272.1631421
- Negi S, Chaudhury S: Finding subgroups in a Flickr group. In Proceedings of the 2012 IEEE international conference on multimedia and expo. ICME’12. IEEE Computer Society, Washington; 2012:675–680. 10.1109/ICME.2012.114
- Negoescu RA, Gatica-Perez D: Topickr: Flickr groups and users reloaded. In Proceedings of the 16th ACM international conference on multimedia. MM’08. ACM, New York; 2008:857–860. 10.1145/1459359.1459505
- Negoescu R-A, Gatica-Perez D: Modeling Flickr communities through probabilistic topic-based analysis. IEEE Trans Multimed 2010, 12(5):399–416. 10.1109/TMM.2010.2050649
- Butler B: When a group is not a group: an empirical examination of metaphors for online social structure. PhD thesis, Carnegie Mellon University; 1999.
- Holmes P, Cox AM: Every group carries the flavour of the admins. Leadership on Flickr. Int J Web Based Communities 2011, 7(3):376–391. 10.1504/IJWBC.2011.041205
- Prieur C, Pissard N, Beuscart J, Cardon D: Thematic and social indicators for Flickr groups. Proceedings of ICWSM 2008.
- Prieur C, Cardon D, Beuscart J-S, Pissard N, Pons P: The strength of weak cooperation: a case study on Flickr. arXiv:0802.2317; 2008.
- Pissard N, Prieur C: Thematic vs. social networks in Web 2.0 communities: a case study on Flickr groups. Algotel conference 2007.
- Backstrom L, Kumar R, Marlow C, Novak J, Tomkins A: Preferential behavior in online groups. In Proceedings of the 2008 international conference on Web search and data mining. WSDM’08. ACM, Palo Alto; 2008:117–128.
- Welser HT, Gleave E, Fisher D, Smith M: Visualizing the signatures of social roles in online discussion groups. J Soc Struct 2007, 8.
- Gloor PA, Zhao Y: Analyzing actors and their discussion topics by semantic social network analysis. In Proceedings of the conference on information visualization. IV’06. IEEE Computer Society, Washington; 2006:130–135.
- Spertus E, Sahami M, Buyukkokten O: Evaluating similarity measures: a large-scale study in the Orkut social network. In Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining. KDD’05. ACM, New York; 2005:678–684.
- Backstrom L, Huttenlocher D, Kleinberg J, Lan X: Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. KDD’06. ACM, New York; 2006:44–54. 10.1145/1150402.1150412
- Laine MSS, Ercal G, Luo B: User groups in social networks: an experimental study on Youtube. 2011 44th Hawaii international conference on system sciences (HICSS) 2011, 1–10. 10.1109/HICSS.2011.472
- Palla G, Barabási A-L, Vicsek T: Quantifying social group evolution. Nature 2007, 446:664–667. 10.1038/nature05670
- Onnela J-P, Arbesman S, González MC, Barabási A-L, Christakis NA: Geographic constraints on social network groups. PLoS ONE 2011, 6(4). 10.1371/journal.pone.0016939
- Barabási A-L: The origin of bursts and heavy tails in human dynamics. Nature 2005, 435:207–211. 10.1038/nature03459
- Riger S, Lavrakas PJ: Community ties: patterns of attachment and social interaction in urban neighborhoods. Am J Community Psychol 1981, 9:55–66. 10.1007/BF00896360
- Tajfel H: Social identity and intergroup relations. Cambridge University Press, Cambridge; 1982.
- McGrath JE, Arrow H, Berdahl JL: The study of groups: past, present, and future. Personal Soc Psychol Rev 2000, 4(1):95–105. 10.1207/S15327957PSPR0401_8
- McMillan DW, Chavis DM: Sense of community: a definition and theory. J Community Psychol 1986, 14(1):6–23. 10.1002/1520-6629(198601)14:1<6::AID-JCOP2290140103>3.0.CO;2-I
- Siegel J, Dubrovsky V, Kiesler S, McGuire TW: Group processes in computer-mediated communication. Organ Behav Hum Decis Process 1986, 37(2):157–187. 10.1016/0749-5978(86)90050-6
- Walther JB: Group and interpersonal effects in international computer-mediated collaboration. Hum Commun Res 1997, 23(3):342–369. 10.1111/j.1468-2958.1997.tb00400.x
- Saunders CS, Ahuja MK: Are all distributed teams the same? Differentiating between temporary and ongoing distributed teams. Small Group Res 2006, 37(6):662–700. 10.1177/1046496406294323
- Wang X, Butler BS, Ren Y: The impact of membership overlap on growth: an ecological competition view of online groups. Organ Sci 2013, 24(2):414–431. 10.1287/orsc.1120.0756
- Butler B, Sproull L, Kiesler S, Kraut R: Community effort in online groups: who does the work and why? Leadership at a distance 2008.
- Postmes T, Spears R, Lea M: The formation of group norms in computer-mediated communication. Hum Commun Res 2000, 26(3):341–371. 10.1111/j.1468-2958.2000.tb00761.x
- Grabowicz PA, Ramasco JJ, Moro E, Pujol JM, Eguiluz VM: Social features of online networks: the strength of intermediary ties in online social media. PLoS ONE 2012, 7(1). 10.1371/journal.pone.0029358
- Goncalves B, Perra N, Vespignani A: Modeling users’ activity on Twitter networks: validation of Dunbar’s number. PLoS ONE 2011, 6(8). 10.1371/journal.pone.0022656
- Tang L, Wang X, Liu H: Group profiling for understanding social structures. ACM Trans Intell Syst Technol 2011, 3(1). 10.1145/2036264.2036279
- Ludford PJ, Cosley D, Frankowski D, Terveen L: Think different: increasing online community participation using uniqueness and group dissimilarity. In Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York; 2004:631–638.
- Prentice DA, Miller DT, Lightdale JR: Asymmetries in attachments to groups and to their members: distinguishing between common-identity and common-bond groups. Pers Soc Psychol Bull 1994, 20(5):484–493. 10.1177/0146167294205005
- Sassenberg K: Common bond and common identity groups on the Internet: attachment and normative behavior in on-topic and off-topic chats. Group Dyn 2002, 6(1):27–37. 10.1037/1089-2699.6.1.27
- Ren Y, Kraut R, Kiesler S: Applying common identity and bond theory to design of online communities. Organ Stud 2007, 28(3):377–408. 10.1177/0170840607076007
- Fortunato S: Community detection in graphs. Phys Rep 2010, 486(3–5):75–174. 10.1016/j.physrep.2009.11.002
- Lancichinetti A, Fortunato S, Radicchi F: Benchmark graphs for testing community detection algorithms. Phys Rev E 2008, 78. 10.1103/PhysRevE.78.046110
- Yang J, Leskovec J: Defining and evaluating network communities based on ground-truth. arXiv:1205.6233; 2012.
- Kempe D, Kleinberg J, Tardos E: Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. KDD’03. ACM, New York; 2003:137–146.
- Ye S, Wu SF: Measuring message propagation and social influence on twitter.com. In Proceedings of the second international conference on social informatics. SocInfo’10. Springer, Berlin; 2010:216–231.
- Cha M, Haddadi H, Benevenuto F, Gummadi KP: Measuring user influence in Twitter: the million follower fallacy. 4th international AAAI conference on Weblogs and social media (ICWSM) 2010.
- Cha M, Mislove A, Gummadi KP: A measurement-driven analysis of information propagation in the Flickr social network. In Proceedings of the 18th international conference on World Wide Web. WWW’09. ACM, Madrid; 2009:721–730. 10.1145/1526709.1526806
- Yang J, Leskovec J: Modeling information diffusion in implicit networks. In Proceedings of the 2010 IEEE international conference on data mining. ICDM’10. IEEE Computer Society, Washington; 2010:599–608. 10.1109/ICDM.2010.22
- Au Yeung C-m, Iwata T: Capturing implicit user influence in online social sharing. In Proceedings of the 21st ACM conference on hypertext and hypermedia. HT’10. ACM, New York; 2010:245–254. 10.1145/1810617.1810662
- Gomez Rodriguez M, Leskovec J, Krause A: Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. KDD’10. ACM, New York; 2010:1019–1028. 10.1145/1835804.1835933
- Barbieri N, Bonchi F, Manco G: Influence-based network-oblivious community detection. 2013 IEEE 13th international conference on data mining (ICDM) 2013, 955–960. 10.1109/ICDM.2013.164
- Zwol RV: Flickr: who is looking? In IEEE/WIC/ACM international conference on Web intelligence. WI’07. IEEE Computer Society, Washington; 2007:184–190. 10.1109/WI.2007.22
- Utz S, Sassenberg K: Distributive justice in common-bond and common-identity groups. Group Process Intergroup Relat 2002, 5(2):151–162. 10.1177/1368430202005002542
- Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S: Finding statistically significant communities in networks. PLoS ONE 2011, 6(4). 10.1371/journal.pone.0018961
- Collins NL, Miller LC: Self-disclosure and liking: a meta-analytic review. Psychol Bull 1994, 116(3):457–475. 10.1037/0033-2909.116.3.457
- Pelleg D, Moore AW: X-means: extending K-means with efficient estimation of the number of clusters. In Proceedings of the seventeenth international conference on machine learning. ICML’00. Morgan Kaufmann, San Francisco; 2000:727–734.
- Easley D, Kleinberg J: Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, New York; 2010.
- Cha M, Mislove A, Adams B, Gummadi KP: Characterizing social cascades in Flickr. In Proceedings of the first workshop on online social networks. WOSP’08. ACM, Seattle; 2008:13–18. 10.1145/1397735.1397739
This article is published under license to BioMed Central Ltd. Open Access: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.