A new methodology to measure faultlines at scale leveraging digital traces

Mehrjoo, Amir; Cuevas, Rubén; Cuevas, Ángel

doi:10.1140/epjds/s13688-022-00350-w

Regular article
Open access
Published: 07 July 2022

EPJ Data Science volume 11, Article number: 40 (2022)

Abstract

The definition of society is closely tied to human group-level behavior. Group faultlines, defined as hypothetical lines splitting groups into homogeneous subgroups based on members’ attributes, have been proposed as a theoretical method to identify conflicts within groups. For instance, crusades and women’s rights protests are the consequences of strong faultlines in societies with diverse cultures.

Measuring the presence and strength of faultlines represents an important challenge. Existing literature resorts to questionnaires as the traditional tool to find group-level behavioral attributes and thus identify faultlines. However, questionnaire data usually come with limitations and biases, especially for large-scale human group-level research. On top of that, questionnaires limit faultline research due to the possibility of dishonest answers, unconscientious responses, and differences in understanding and interpretation.

In this paper, we propose a new methodology for measuring faultlines in large-scale groups, which leverages data readily available from online social networks’ marketing platforms. Our methodology overcomes the limitations of traditional methods to measure group-level attributes and group faultlines at scale.

To prove the applicability of our methodology, we analyzed the faultlines between people living in Spain, grouped by geographical regions. We collected data on 67,270 interest topics from Facebook users living in Spain, France, Germany, Greece, Italy, Portugal, and the United Kingdom. We computed existing metrics of faultline distance and strength on our data to identify potential faultlines among Spanish regions. The results reveal that the strongest faultlines in Spain belong to the Spanish islands (the Canary Islands and the Balearic Islands), Catalonia, and the Basque region. These findings are aligned with historical secessionist movements and cultural diversity reports, supporting the validity of our methodology.

1 Introduction

“Conflict is the beginning of consciousness” – M. Esther Harding. Shortly after the flames of World War II had settled, many social scientists started thinking about how to explain the psychological forces that culminated in the Holocaust, among other horrors. During the post-war period leading into the 1970s, a branch of social scientists focused on groups and group-formation processes to find an interpretation for conflicts related to collective human behavior. In this context, ‘group’ was a label for aggregated interpersonal processes. Measurement techniques for group-level behavior lack consistent findings when considering a single attribute of group members, such as race [1–6]. Therefore, researchers were motivated to investigate the impact of the alignment of multiple group member attributes (e.g., race and gender) on conflicts among team members.

Faultlines are hypothetical dividing lines splitting a team into two or more relatively homogeneous subgroups [7, 8]. Studies on faultline dynamics that explain the theoretical underpinnings and effects of faultlines appear in the sociology literature [9, 10]. As with many other aspects of human behavior, implementing measurement tools has been challenging. Still, reliable measurement techniques associated with group-level attributes have been introduced in the literature [11, 12].

Group faultlines usually have a detrimental effect on team-level outcomes [8, 10]. Lau and Murnighan (1998) introduced the initial faultline theoretical model [7, 13]. They based the theoretical reasoning on social categorization and social identity approaches [14]. Despite a well-developed theoretical framework, few measurement techniques currently exist to create a strong link between these theories and the real world. Managers and politicians have considered faultline measurements an essential tool for managing performance and leadership. Thanks to technological developments during the past few decades, many aspects of human social behavior are now more apparent to scientists. One of the main contributions of technology to human life is the onset and spread of social network platforms. These platforms offer free services to users in exchange for access to users’ data; they enrich their databases with the behavioral attributes of their users and process them for marketing purposes. For example, Facebook provides a marketing platform for advertisers to target their audiences based on demographics, behavioral characteristics, and location. As instruments for measuring social behavior, these platforms offer advantages over traditional ones (surveys and questionnaires).

The traditional approach to measuring faultlines has been to administer questionnaires, asking team members about their behavioral attributes and calculating the metrics from their answers. This approach exposes the results to biases such as dishonest or unconscientious responses. Besides, scaling the research to larger groups using this approach is costly and time-consuming. In this research, we employ data from social networks’ marketing platforms and introduce a new approach to overcome these limitations. This new approach aims to increase the scalability and accuracy of faultline measurement while making it less expensive. We introduce a reliable methodology based on data from billions of social network users to measure the faultlines separating populations in different geographic regions. To prove the applicability of this tool, we analyzed the faultlines between people living in Spain, grouped by geographical regions. Spain has experienced identity-related regionalist independence movements and conflicts. If our methodology performs well, it should be able to capture these conflicts.

2 Theoretical discussion

The salience of regional/national identity in geographic regions produces the previously mentioned conflicts. Political leaders tend to promote differences such as cultures and national identities to win more votes by drawing a clear line (faultline) between them and us (e.g., Catalans vs. non-Catalans in Spain). Social science literature is rich in theories and measurement techniques to analyze faultlines. We extend the available measurement techniques to better understand the faultline status of large-scale groups. This research shows how well our proposed technique measures faultlines between groups living in different Spanish Autonomous Communities^{Footnote 1} (referred to as CCAAs by their abbreviation in Spanish).

We first apply self-categorization and social identity theories to identify the places where we expect to find strong faultlines. Then, we use one of the most popular online social network platforms (Facebook) to measure the faultline’s distance and strength.

2.1 Faultline theories

The term faultline originates from geology and refers to the boundary between two tectonic plates. Faultlines therefore mark locations that are more prone to split. Lau and Murnighan (1998) adopted this definition for research in group conflicts by defining faultlines as “hypothetical dividing lines that may split a group into subgroups based on one or more attributes” [7]. The purpose behind measuring faultlines is to quantify how prone a team is to split into subgroups [15]. According to the faultline theories, groups divided into two homogeneous subgroups with distant intra-group attributes are more likely to experience conflict between members. Three main categories of faultlines have been the focus of the articles in this literature: (1) separation-based faultlines (e.g., followers of different football teams); (2) information-based faultlines (e.g., engineers vs. psychologists); and (3) resource-based faultlines (group members’ access to “finite resources, e.g., power, materials, authority, and status”) [16]. Social identity and self-categorization are two of the most prevalent theories in this field. They are building blocks for faultline research as they explain: (1) sub-group formation, (2) the relationship between group identity and trust, and (3) the nature of ingroup-outgroup biases [17].

2.2 Self-categorization and social identity

Social categorization theory justifies faultlines in human groups, and comparative fit is one of the several factors affecting social categorization processes. Comparative fit explains how observed similarities and differences, such as languages or accents, are perceived as social categories [14]. A strong faultline makes the differences within groups more salient. The human brain’s ability to process information is limited. For example, if we see an object flying and singing, we unconsciously assume it is a bird. Then we assign all the attributes of the bird category, such as breeding by laying eggs and having wings, to that object. Therefore, abstraction is the key attribute of the human brain to understand the surrounding world. The legitimate model of the human brain is the highest level of abstraction for demonstrating cognitive mechanisms [18].

The human capacity to recognize different levels of abstraction is limited. Cognitive procedures such as abstraction, thinking, and learning structure the information we retrieve from the world outside. When individuals confront disorganized and unlabelled data, they abstract the complex data into basic concepts with specific goals [18]. If the flying object in the bird example has one wing instead of two, the human mind still puts it in the bird category. The same happens when we hear someone speaking a language (Italian, Chinese, Catalan, etc.): we unconsciously assign attributes to that person (e.g., the place of origin is Italy, China, Catalonia, etc.). The social identity approach describes the state of people thinking of themselves and others as a group. This theory states that the three steps of the psychological process of perceiving a social group are: (1) social categorization: organizing social information by categorizing people into groups such as Catalan, Castilian, South American, and Japanese; (2) social comparison: giving meaning to those categories to understand the group’s task in the specific situation (e.g., Catalans speak Catalan, Japanese are hardworking); (3) social identification: the process in which people relate themselves to one of those categories (e.g., I am Catalan!, I am Spanish!).

The lowest level of abstraction in this process is the personal self, where the perceiver categorizes themself as “I”. A higher level of abstraction corresponds to a social self, where the perceiver categorizes themself as part of a “we” compared to a salient out-group (them) [19]. Social identity theory explains some behavioral attributes of group members. According to this theory, people maintain their self-esteem through a cognitive bias that assigns positive attributes to their group, nationality, category, etc. Individuals are assumed to be intrinsically motivated to achieve positive distinctiveness and “strive for a positive self-concept” [20]. This cognitive bias may also result in uneven distribution of resources and discrimination within groups. Therefore, members endorse resource distributions that would maximize the positive distinctiveness of an in-group in contrast to an out-group at the expense of personal self-interest [14]. Social identity theory also explains that an in-group seeks to increase self-esteem by direct competition against the out-group. At a high level of social competition, this effect would polarize the group and produce two salient subgroups. According to the similarity attraction paradigm [21], members in one subgroup experience psychological distance from other subgroup members and are less likely to cooperate [22]. Therefore, people living in the same country feel they are in the same group; thus, they have less distant behavioral attributes than the people in other countries (other groups) (Proposition 1). The self-categorization theory argues that a category’s prototype is contingent on the context in which the category is encountered. This theory is consistent with leader categorization theory, whereby stereotypical leaders were more effective than non-stereotypical leaders [23].

2.3 Insular effects

Islands have developed isolated living communities, whether plant, animal, or human, separated from, and differing to varying degrees from, mainland communities of the same kind. Means of physical communication, such as transport, were crucial for the past interaction of island and continental populations. They were also largely dependent on distance from the mainland, the climate, and technology. Contacts are influential in determining the degree and the nature of cultural factors [24]. This is especially true in islands, which have been less affected by cultural and ethnic change, hostile invasion, mass immigration, or political interference, and at the same time have been more exposed, if not open, to cultural stimuli from a wider variety of sources [25]. The distance and insularity of these islands result in more differential cultural attributes in the population. These differential cultural attributes may have fostered a strong regional identity and made it prevail over the country’s national identity. According to the faultline theories, the inhabitants of islands should consider themselves a distinct group, which will lead to a strong faultline. Therefore, the faultlines in the islands are expected to be relatively strong (Proposition 2).

2.4 Conflict

Consensus exists in the faultline theories literature that a strong, activated faultline in a group of people can explain some social conflicts (task conflict, relationship conflict, process conflict) [26]. Faultline activation is defined as the process by which members of a group are perceived as members of one or more subgroups [27]. A vast body of literature is devoted to developing theories and techniques for measuring and managing conflicts (e.g., international joint ventures [28, 29], bi-cultural family kids [30]). The existing literature considered a strong faultline an important predictor of group conflicts. Thatcher and Patel (2011) argued that if a group perceives other sub-groups as threatening, individuals maintain their self-esteem by positive distinctiveness, resulting in a conflict between subgroups [8]. On the other hand, group diversity decreases conflict and group faultline strength [31].

Therefore, based on the literature outcomes, our methodology should observe higher faultline values in regions that have experienced regional conflicts in their history (Proposition 3).

2.5 Measurement

The literature on human group-level measurement mainly relies on questionnaire surveys. The application of questionnaires in the analysis comes with limitations, such as the limited number of questions and non-scalability. Large-scale surveys and collecting empirical data on the population have been costly, time-consuming, and in many instances impossible throughout human history [32]. On the other hand, advertisers can elicit many behavioral dimensions by tracking internet users’ online behaviors [33]. Such platforms continuously track users’ interests, beliefs, preferences, behaviors, locations, and interactions. The majority of faultline research has been conducted through questionnaire survey-based experiments using relatively small groups. This paper is the first attempt at using large-scale field data provided by online social platforms in faultline research. We use data from one of the most prominent social networking platforms (Facebook), with more than 2.9B monthly active users, to measure the faultlines. Facebook places particular importance on classifying the interests of its users for marketing purposes [34] and measures each individual user’s preferences.

2.5.1 Interests in Facebook

Facebook infers user preferences from self-reported interests, clicking behaviors on Facebook posts, software downloads, GPS location, and processing the communications with other users on multiple platforms (e.g., Facebook, Instagram or WhatsApp). Facebook anonymizes this information and makes it accessible to marketers through an application programming interface (API). Facebook finds users’ interests by tracking their activities on Facebook’s platforms (i.e., Facebook and Instagram) and third-party websites, apps, and online services. To be more specific, in addition to the information collected from its owned social networks and applications (Facebook, Instagram, and WhatsApp), Facebook collects data from more than 30% of the most popular websites [35].

Facebook may also track users’ locations through their mobile devices, inferring the amount of time each user spends in locations such as football fields, universities, theaters, restaurants, and churches. Facebook users’ interests are shaped by multiple facets of their activity (e.g., if someone goes to the football stadium for all Real Madrid football team matches after checking the Real Madrid website, Facebook most probably assigns “Real Madrid football team” to the user’s interests). Thus, countless interests shape human preferences on Facebook. Facebook organizes interests in a multi-level, hierarchical structure with 14 root categories: business and industry, education, family and relationships, fitness and wellness, food and drink, hobbies and activities, lifestyle and culture, news and entertainment, people, shopping and fashion, sports and outdoors, technology, travel places, and events. Facebook also assigns a unique, language-independent ID to each interest.

Facebook finds user interests through multiple information channels, including page likes, self-declared interests, downloaded apps, and location. This approach forms the most comprehensive dictionary of preferences for billions of people. Previous studies found the following paths by which preferences are assigned to each user. The user has this preference because: (i) “This is a preference the user added,” (ii) “what the user does on Facebook, such as pages the user has liked or ads the user clicked,” (iii) “the user clicked on an ad related to…”, (iv) “the user installed the app…”, (v) “the user liked a page related to…”, (vi) “the user comments, posts, shares or reactions the user made related to…” [36]. The goal here is to measure faultlines (strength and distance) using features extracted from the popularity of different topics among groups of people living in specific geographic regions. The following section explains how we extracted these features from Facebook data. In previous work, we presented the first large-scale analysis of measuring culture using tens of thousands of interests to define human group culture and examined the validity of this approach using the World Values Survey (WVS), among other sources. Our findings showed that the Facebook measurement encompasses a broader range of cultural explanatory dimensions than the WVS [37].

2.5.2 Faultline distance

According to distance theory, team members in one subgroup feel psychological distance from team members in other subgroups, making them less likely to cooperate [22]. Thus, measuring the distance between the behavioral attributes of the subgroups will shed light on the status of the faultline. Faultline distance reflects the extent to which formed subgroups differ from one another in terms of behavioral characteristics [38]. The distance between the group-level attributes of two subgroups is used to calculate faultline distance. Consider a group G consisting of n members $A_{j}$ ($j= 1$ to n).

$$ G= \{A_{1}, A_{2} , \ldots, A_{n} \}. $$

Each member of the group may be interested in topic i ($a^{i}= 1$) or may not be interested in that topic ($a^{i}=0$). Then we can assign a vector of p dimensions (attributes) to each member (e.g. member j).

$$ \overrightarrow{A}_{j}= \bigl\langle a_{j}^{1}, a_{j}^{2} , \ldots, a_{j}^{p} \bigr\rangle . $$

We compute group-level attributes ($\overrightarrow{V}_{g}$) using mean vectors (average value of group members for each attribute). The pth group-level attribute ($\overline{a^{p}}$) is calculated by averaging the pth attribute ($a^{p}_{j}$) across all n group members.

$$\begin{aligned}& \overline{a^{p}}=\frac{\sum_{j = 1}^{n}a_{j}^{p}}{n}, \\& \overrightarrow{V}_{g} = \bigl\langle \overline{a^{1}}, \overline{a^{2}},\ldots,\overline{a^{p}} \bigr\rangle . \end{aligned}$$
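The mean-vector computation above can be illustrated with a minimal Python sketch. It assumes member interests are stored as a binary member-by-topic matrix; the function name `group_level_attributes` and the toy data are hypothetical and are not taken from the paper's dataset or code.

```python
import numpy as np


def group_level_attributes(members) -> np.ndarray:
    """Return the mean vector V_g for an (n, p) binary member-by-topic matrix.

    Row j is the attribute vector A_j of member j; column i holds a_j^i,
    which is 1 if member j is interested in topic i and 0 otherwise.
    """
    members = np.asarray(members, dtype=float)
    if members.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_members, p_attributes)")
    # Averaging over members (axis 0) gives one value per attribute.
    return members.mean(axis=0)


# Hypothetical toy group: n = 4 members, p = 3 interest topics.
toy_group = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
])
v_g = group_level_attributes(toy_group)
print(v_g)
```

Each entry of the result is the share of group members interested in the corresponding topic, which is how a binary attribute averages out.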

Faultlines by definition are hypothetical lines splitting a group (V) into subgroups $(v_{1},v_{2})$. We assign a vector of p dimensions to each subgroup $(\overrightarrow{v_{1}},\overrightarrow{v_{2}})$:

$$\begin{aligned}& \overrightarrow{v_{1}} = \bigl\langle \overline{a_{1}^{1}}, \overline{a_{1}^{2}},\ldots,\overline{a_{1}^{p}} \bigr\rangle , \\& \overrightarrow{v_{2}} = \bigl\langle \overline{a_{2}^{1}}, \overline{a_{2}^{2}},\ldots,\overline{a_{2}^{p}} \bigr\rangle . \end{aligned}$$

The faultline distance ($D_{g}$) is the Euclidean distance between the two subgroup attribute vectors ($\overrightarrow{v_{1}}$, $\overrightarrow{v_{2}}$):

$$ D_{g} = \vert \overrightarrow{v_{1}}- \overrightarrow{v_{2}} \vert = \sqrt{\sum _{i = 1}^{p} \bigl(\overline{a^{i}_{1}}- \overline{a^{i}_{2}} \bigr)^{2} } . $$
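As a hedged illustration of the distance formula, the following Python sketch computes $D_{g}$ for two subgroups given as binary member-by-topic matrices. The function name `faultline_distance` and the toy subgroups are assumptions made for this example only.

```python
import numpy as np


def faultline_distance(subgroup_1, subgroup_2) -> float:
    """Euclidean distance between the mean attribute vectors of two subgroups."""
    v1 = np.asarray(subgroup_1, dtype=float).mean(axis=0)
    v2 = np.asarray(subgroup_2, dtype=float).mean(axis=0)
    if v1.shape != v2.shape:
        raise ValueError("both subgroups must share the same p attributes")
    return float(np.linalg.norm(v1 - v2))


# Hypothetical toy subgroups, two members each, p = 3 topics.
sub_a = np.array([[1, 1, 0],
                  [1, 0, 0]])   # mean vector: <1.0, 0.5, 0.0>
sub_b = np.array([[0, 1, 1],
                  [0, 0, 1]])   # mean vector: <0.0, 0.5, 1.0>
d = faultline_distance(sub_a, sub_b)
print(d)  # sqrt(1^2 + 0^2 + 1^2) = sqrt(2)
```

Because only subgroup means enter the formula, the distance ignores how much members vary within each subgroup; that within-subgroup variation is what faultline strength captures.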

2.5.3 Faultline strength

Thatcher and Patel (2003) described faultlines as potential splits that yield “relatively homogeneous subgroups based on the attributes of the team members” [39]. Faultlines, as the definition implies, are imaginary lines that separate homogeneous groups, and faultline strength measures how homogeneous these subgroups are. As a result, to calculate faultline strength, referred to as Fau, we compute the variations within each group. This measurement is based on self-categorization theory, which distinguishes between in-group and out-group; this explains why the measure can detect only two subgroups.

In theory, polarization is one outcome of group conflict, making within-group differences more salient [40]. Therefore, faultline strength is a valid measurement for groups with strong faultlines. Thatcher [39] illustrated the difference between the faultline strength and faultline distance measurements using a comparison table. Table 1 shows two groups of four people with different demographics. In the first group, there are two distinct subgroups with demographic characteristics that are homogeneous within subgroups. Members of the second group, on the other hand, have a wide range of demographic characteristics. These two groups have the same faultline distance measurement. However, because the demographic attributes of the subgroup members are aligned, the faultline strength measurement of the first group is higher. Thatcher formulated the faultline strength based on information on p attributes of each group member as follows:

$$ \mathit{Fau}_{g} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{2}n_{j}^{g} ( \overline{a^{i}_{j}}-\overline{a^{i}} )^{2} }{\sum_{i=1}^{p}\sum_{j=1}^{2}\sum_{k=1}^{n^{g}_{j}} (a^{i}_{jk}-\overline{a^{i}} )^{2}} ,\quad g=1,2,\ldots. $$
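The following Python sketch evaluates this ratio for a single candidate split into two subgroups: the numerator is the between-subgroup sum of squares and the denominator is the total sum of squares around the overall group mean. The function name `faultline_strength`, the zero-variance convention, and the toy data are assumptions for illustration; the toy groups are not the entries of Table 1.

```python
import numpy as np


def faultline_strength(subgroup_1, subgroup_2) -> float:
    """Fau for one split: between-subgroup over total sum of squares."""
    subgroups = [np.asarray(s, dtype=float) for s in (subgroup_1, subgroup_2)]
    # Overall group mean of each attribute, computed over all members.
    overall_mean = np.vstack(subgroups).mean(axis=0)
    # Numerator: n_j * (subgroup mean - overall mean)^2, summed over
    # attributes i and subgroups j.
    between = sum(
        len(s) * np.sum((s.mean(axis=0) - overall_mean) ** 2) for s in subgroups
    )
    # Denominator: (member value - overall mean)^2, summed over attributes,
    # subgroups, and members.
    total = sum(np.sum((s - overall_mean) ** 2) for s in subgroups)
    if total == 0:
        # Assumed convention: a group with no variation has no faultline.
        return 0.0
    return float(between / total)


# Hypothetical split whose subgroups are internally homogeneous.
fau_aligned = faultline_strength([[1, 1], [1, 1]], [[0, 0], [0, 0]])
# Hypothetical split where the second attribute cuts across both subgroups.
fau_mixed = faultline_strength([[1, 0], [1, 1]], [[0, 1], [0, 0]])
print(fau_aligned, fau_mixed)
```

In this toy case the perfectly aligned split explains all of the variation, while the cross-cut split explains only half of it, so its value is lower.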

Table 1 Example of Subgroup Distance/Fau Strength Analysis (Adopted from Thatcher 2003 [39])
