The dynamics of collective social behavior in a crowd controlled game


Despite many efforts, the behavior of a crowd is not fully understood. The advent of modern communication means has made it an even more challenging problem, as crowd dynamics could be driven by both human-to-human and human-technology interactions. Here, we study the dynamics of a crowd controlled game (Twitch Plays Pokémon), in which nearly a million players participated during more than two weeks. Unlike other online games, in this event all the players controlled exactly the same character and thus it represents an exceptional example of a collective mind working to achieve a certain goal. We dissect the temporal evolution of the system dynamics along the two distinct phases that characterized the game. First, we find that having a fraction of players who do not follow the crowd’s average behavior is key to succeeding in the game. This finding can be well explained by an nth-order Markov model that reproduces the observed behavior. Second, we analyze a phase of the game in which players were able to decide between two different modes of playing, mimicking a voting system. We show that the introduction of this system clearly polarized the community, splitting it in two. Finally, we discuss one of the peculiarities of these groups in the light of the social identity theory, which appears to describe well some of the observed dynamics.

1 Introduction

Collective phenomena have been the subject of intense research in psychology and sociology since the XIX century. There are several ways in which humans gather to perform collective actions, although observations suggest that most of them require some sort of diminution of self-identity [1]. One of the first attempts to address this subject was Le Bon’s theory on the psychology of crowds in which he argued that when people are part of a crowd they lose their individual consciousness and become more primitive and emotional thanks to the anonymity provided by the group [2]. In the following decades, theories of crowd behavior such as the convergence theory, the emergent norm theory or the social identity theory emerged. These theories shifted away from Le Bon’s original ideas, introducing rationality, collective norms and social identities as building blocks of the crowd [3, 4].

The classical view of crowds as an irrational horde led researchers to focus on the study of crowds as something inherently violent, and thus to seek a better understanding and prediction of violence eruption or, at least, to develop some strategies to handle them [5]. However, the information era has created a new kind of crowd, as it is no longer necessary to be in the same place to communicate and take part in collective actions. Indeed, open source and “wiki” initiatives, as well as crowdsourcing and crowdworking, are some examples of how crowds can collaborate online in order to achieve a particular objective [6, 7]. Although this offers a plethora of opportunities, caution has to be taken because, as research on the psychology of crowds has shown, the group is not just the simple addition of individuals [8]. For example, it has been observed that the group performance can be less efficient than the sum of the individual performances that would have been obtained had the individuals acted separately [9]. Under which conditions this happens and whether the group is more than the individuals composing it are two current challenges of utmost importance if, for instance, one wants to use crowds as a working force.

To be able to unlock the potential of collective intelligence, a deeper understanding of the functioning of these systems is needed [10]. Examples of scenarios that can benefit from further insights into crowd behavior include new ways to reach group decisions, such as voting, consensus making or opinion averaging, as well as finding the best strategies to motivate the crowd to perform some task [11]. Regarding the latter, as arbitrary tasks usually are not intrinsically enjoyable, some sort of financial compensation is used in order to systematically execute crowdsourcing jobs [12]. This, however, implies dealing with new challenges, since many experiments have demonstrated that financial incentives might undermine the intrinsic motivation of workers or encourage them to seek only the results that are being measured, either by focusing only on them or by free-riding [13, 14, 15]. A relevant case is given by platforms such as Amazon’s Mechanical Turk, which allow organizations to pay workers who perform micro-tasks for them, and which have already given rise to interesting questions about the future of crowd work [16]. In particular, its validity for crowdsourcing behavioral research has recently been called into question [17].

Notwithstanding the previous observations, it is possible to find tasks that are intrinsically enjoyable for the crowd due to their motivational nature, which is ultimately independent of the reward [15]. This is one of the bases of online citizen science. In these projects, volunteers help to analyze and interpret large datasets which are later used to solve scientific problems [18]. To increase the motivation of the volunteers, some of these projects are shaped as computer games [19]. Examples range from the study of protein folding [20] to annotating people within social networks [21] or identifying the presence of cropland [22].

It is thus clear that to harness the full potential of crowds in the new era, we need a deeper understanding of the mechanisms that drive and govern the dynamics of these complex systems. To this aim, here we study an event that took place in February 2014 known as Twitch Plays Pokémon (TPP). During this event, players were allowed to control simultaneously the same character of a Pokémon game without any kind of central authority. This constituted an unprecedented event because in crowd games each user usually has their own avatar and it is the common action of all of them that produces a given result [23]. Due to its novelty, in the following years it spawned similar crowd controlled events such as The Button in 2015 [24] or Reddit r/place in 2017 [25, 26]. Similarly to those which came after it, TPP was a completely crowd controlled process in which thousands of users played simultaneously for 17 days, with more than a million different players [27]. TPP is especially interesting because it represents an out-of-the-lab social experiment that became extremely successful based only on its intrinsic enjoyment and, given that it was run without any scientific purpose in mind, it represents a natural, unbiased (i.e., not artificially driven) opportunity to study the evolution and organization of crowds.

2 Description of the event

On February 12, 2014, an anonymous developer started to broadcast a game of Pokémon Red on the streaming platform Twitch [27]. Pokémon Red was the first installment of the Pokémon series, which is the most successful role playing game (RPG) franchise of all time [28]. The purpose of the game was to capture and train creatures known as Pokémons in order to win increasingly difficult battles based on classical turn-based combats. However, as Pokémon Go showed in the summer of 2016, the power of the Pokémon franchise goes beyond the classical RPG games and is still able to attract millions of players [29].

On the other hand, Twitch is an online service for watching and streaming digital video broadcasts. Its content is mainly related to video games: from e-sports competitions to professional players’ games or simply popular individuals who tend to gather large audiences to watch them play, commonly known as streamers. Due to the live nature of the streaming and the presence of a chat window where viewers can interact with each other and with the streamer, the relationship between the media creator and the consumer is much more direct than in traditional media [30]. Back in February 2014, Twitch was the 4th largest source of peak internet traffic in the US [31] and nowadays, with over 100 million unique users, it has become the home of the largest gaming community in history [32].

The element that distinguished this stream from the rest was that the streamer did not play the game. Instead, he set up a bot in the chat window that accepted some predefined commands and forwarded them to the input system of the video game. Thus, anyone could join the stream and control the character by just writing one of those commands in the chat. Although all commands were sent to the video game sequentially, it could only perform one at a time. As a consequence, all commands that arrived while the character was performing a given action (which takes less than a second) did not have any effect. Thus, it was a completely crowd controlled game without any central authority or coordination system in place. This was not a multiplayer game; it was something different, something new [33].

Due to its novelty, during the first day the game was largely unknown, with only a few tens of viewers/players, and as a consequence little is known about the game events of that day [34]. However, on the second day it started to gain viewers and quickly went viral, see Fig. 1. Indeed, it ramped up from 25,000 new players on day 1 (note that the time was recorded starting from day 0 and thus day 1 in game time actually refers to the second day in real time) to almost 75,000 on day 2 and an already stable base of nearly 10,000 continuous players. Even though there was a clear decay in the number of new users after day 5, the event was able to retain a large user base for over two weeks. This huge number of users posed a challenge to the technical capabilities of the system, which translated into a delay of between 20 and 30 seconds between the stream and the chat window. That is, users had to send their commands based on where the character was up to 30 seconds earlier.

Figure 1

New users per day. The histogram is fitted to a gamma distribution of parameters \(\alpha =2.66\) and \(\beta =0.41\). Note that this reflects those users who inputted at least one command, not the number of viewers. In the inset we show the total number of users who sent at least 1 message each hour, regardless of whether they were new players or not.

Although simple in comparison to modern video games, Pokémon Red is a complex game in which one cannot progress effectively at random. In fact, a single player needs, on average, 26 hours to finish the game [35]. Nevertheless, only 7 commands are needed to complete the game. There are 4 movement commands (up, right, down and left), 2 action commands (a and b, accept and back/cancel) and 1 system button (start, which opens the game’s menu). As a consequence the gameplay is simple. The character is moved around the map using the four movement commands. If you encounter a wild Pokémon you will have to fight it with the possibility of capturing it. Then, you will have to face the Pokémons of trainers controlled by the machine in order to obtain the 8 medals needed to finish the game. The combats are all turn-based so that time is not an important factor. In each turn of a combat the player has to decide which action to take, for which the movement buttons along with a and b are used. Once the 8 medals have been collected there is a final encounter after which the game is finished. This gameplay, however, was much more complex during TPP due to the huge number of players sending commands at the same time and the lag present in the system.

A remarkable aspect of the event is that actions that would usually go unnoticed, such as selecting an object or nicknaming a Pokémon, yielded unexpected outcomes due to the messy nature of the gameplay. The community embraced these outcomes and created a whole narrative around them in the form of jokes, fan art and even a religion-like movement based on the Judeo-Christian tradition [36], both in the chat window and in related media such as Reddit. Although these characteristics of the game are outside the scope of the paper, we believe that it would be interesting to address them as an example of the evolution of naming conventions and narrative consensus [37].

To conclude this description, we need to address the changes introduced on the sixth day as they made these already interesting dynamics even richer. After the crowd had been stuck in a movement based puzzle for almost 24 hours, the developer took down the stream to change the code. Fifteen minutes later the stream was back online but this time commands were not executed right away. Instead, they were added up and every 5 seconds the most voted command was executed. However, the crowd did not like this system and started to protest by sending start9, which would open and close the menu repeatedly, impeding any movement. This riot, as it was called, forced the developer to revert the stream to its original rules (see Fig. 2). However, two hours later the system was modified again. Two new commands were added: democracy and anarchy, which controlled a tug-of-war voting system over which rules to use. If the fraction of people voting for democracy went over a given threshold, the game would start to tally up votes about which action to take next. If not, the game would be played using the old rules. This system split the community into “democrats” and “anarchists” who would then fight for control of the game while trying to progress.
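The tallying rule described above (commands accumulated over a fixed window, then the most voted one executed) can be illustrated with a minimal sketch. This is not the code used in the stream: the function name `tally_windows`, the `(timestamp, command)` input format and the tie-breaking rule are assumptions made only for illustration.

```python
from collections import Counter

def tally_windows(messages, window=5.0):
    """Return the most voted command of each non-empty time window.

    `messages` is an iterable of (timestamp_in_seconds, command) pairs.
    Hypothetical helper: the input format is an assumption, not the stream's.
    """
    buckets = {}
    for t, cmd in messages:
        # Assign each message to the window it falls in.
        buckets.setdefault(int(t // window), Counter())[cmd] += 1
    # Ties are broken by first arrival within the window (an arbitrary choice;
    # how the real system resolved ties is not stated in the text).
    return [buckets[w].most_common(1)[0][0] for w in sorted(buckets)]

log = [(0.2, "up"), (1.1, "up"), (3.9, "start"),
       (5.5, "left"), (7.0, "a"), (9.9, "left")]
print(tally_windows(log))  # ['up', 'left']
```

Under this rule a single dissenting command inside a window has no effect, which is the main difference with the original rules, where any command could be the one executed.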

Figure 2

Command distribution after the first introduction of the voting system. Once the system was back online votes would tally up over a period of 10 seconds. After 15 minutes the system was brought down to reduce this time to 5 seconds. This, however, did not please the crowd and it started to protest. The first start9 was sent at 5d9h8m but went almost unnoticed. A few minutes later it was sent again, but this time it got the attention of the crowd. In barely 3 minutes it went from 4 start9 per minute to over 300, which stalled the game for over 8 minutes. The developer brought down the system again and removed the voting system, introducing the anarchy/democracy system a few hours later.

3 Results

Pokémon Red is a fairly complex game with puzzles, quests and a turn-based combat system underlying the whole game. Moreover, the introduction of the anarchy/democracy modes completely reshaped the whole dynamics of the game. As such, it is not possible to properly study the behavior of the crowd during the whole game without taking into account the particular challenge it faced at each moment. For this reason, in the following, we will focus on two particular aspects of the game. In the first subsection, we will study the event known as “the ledge”. This is the name the crowd gave to a small area that is extremely easy to finish for a single player but that represented a hard challenge for them, needing 15 hours to finish it. Furthermore, in this area, the optimal strategy is quite clear. These two factors, namely, a clear strategy and the long time they were stuck there, added to the fact that this took place before the introduction of the anarchy/democracy modes, make this area a great case study of the collective behavior of the crowd at short timescales. Then, in the second subsection, we will analyze the political movements that evolved within the game once the anarchy/democracy modes were introduced, giving us information about the behavior of the crowd at longer timescales.

3.1 The ledge

On the third day of the game, the character arrived at the area depicted in Fig. 3 (note that the democracy/anarchy system had not been introduced yet). Each node of the graph represents a tile of the game. The character starts on the yellow node on the left part of the network and has to exit through the right part, an event that we will define as getting to one of the yellow nodes on the right. The path is simple for an average player but it represented a challenge for the crowd due to the presence of the red nodes. These nodes represent ledges which can only be traversed going downwards, effectively working as a filter that allows flux only downwards. Thus, one good step will not cancel a bad step, as the character would be trapped down the ledge and would have to find a different path to go up again. For this reason, this particular region is highly vulnerable to actions deviating from the norm, either caused by mistake or performed intentionally by griefers, i.e., individuals whose only purpose is to annoy other players and who do so by using the mechanisms provided by the game itself [38, 39] (note that in social contexts these individuals are usually called trolls [40]). Indeed, there are paths (see blue nodes in Fig. 3) where only the command right is needed and which are next to a ledge so that the command down, which is not needed at all, will force the crowd to go back and start the path again. Additionally, the existence of the lag described in Sect. 2 made this task even more difficult.

Figure 3

Network representation of the ledge area. It is possible to go from a node to the ones surrounding it using the commands up, right, down and left. The only exceptions are the red nodes, which are ledges. If the character tries to step on one of those nodes it will be automatically sent to the node right below it, a characteristic that is represented by the curved links connecting nodes above and below ledges. Yellow nodes mark the entrance and exit of the area and blue nodes highlight the most difficult part of the path. Note that as the original map was composed of square tiles this network representation is not an approximation but the exact shape of the area.

In Fig. 4(a) we show the time evolution of the number of messages containing each command (the values have been normalized to the total number of commands sent each minute) from the beginning of this part until they finally exited. First, we notice that it took the crowd over 15 hours to finish an area that can be completed by an optimal walk in less than 2 minutes. Then, we can clearly see a pattern from 2d18h30m to the first time they were able to reach the nodes located right after the blue ones, approximately 3d01h10m: when the number of rights is high the number of lefts is low. This is a signature of the character trying to go through the blue nodes by going right, falling down the ledge, and going left to start over. Once they finally reached the nodes after the blue path (first arrival) they had to fight a trainer controlled by the game. They lost this combat and, as a consequence, the character was transported outside of the area, so they had to enter and start again from the beginning. Again, we can see a similar left-right pattern until they got over that blue path for the second time, which in this case was definitive.

Figure 4

Study of the Ledge event. (a) Time evolution of the fraction of commands sent each minute. Note that a single player should be able to finish this area in a few minutes, but the crowd needed 15 hours. The time series has been smoothed using moving averages. (b) Hierarchical clustering of the time series of each group of users (see main text for details). (c) Left: Mean time needed to exit the area according to our simulations as a function of the fraction of griefers in the system and the noise in it. Right: 1% quantile of the time needed to exit the area, note that the y axis is given in minutes instead of hours

The ledge is a great case study of the behavior of the crowd because the mechanics needed to complete it are very simple (just moving from one point to another), which facilitates the analysis. At the same time, it took the players much longer to finish this area than what is expected for a single player. To address all these features, we propose a model aimed at mimicking the behavior of the crowd. Specifically, we consider an nth-order Markov chain so that the probability of going from state \(x_{m}\) to \(x_{m+1}\) depends only on the state \(x_{m-n}\), thus accounting for the effect of the lag on the dynamics. Furthermore, the probabilities of going from one state to another will be set according to the behavior of the players in the crowd.

To define these probabilities, we first classify the players in groups according to the total number of commands they sent in this period: G1, users with 1 or 2 commands (46% of the users); G2, 3 or 4 commands (18%); G3, between 5 and 7 commands (13%); G4, between 8 and 14 commands (12%); G5, between 15 and 25 commands (6%); and G6, more than 25 commands (5%). These groups were defined so that the total number of messages sent by each of the first three is close to 50,000 and close to 100,000 for each of the other three. Interestingly, the time series of the inputs of each of these groups are very similar (see Additional files 1–7). Actually, if we remove the labels of the 42 time series (6 groups times 7 commands) and cluster them using the Euclidean distance, we obtain 7 clusters, one for each command. Moreover, the time series of each of the commands are clustered together, Fig. 4(b). In other words, the behaviors of users with medium and large activity are not only similar to each other, but they are also equivalent to the one coming from the aggregation of the users who only sent 1 or 2 commands. This allows us to infer the behavior of the whole crowd by looking at the behavior of the most active players, group 6.
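The grouping by activity can be written down in a few lines. The following is a minimal sketch using the thresholds quoted above; the names `BOUNDS`, `activity_group` and `group_shares`, as well as the toy input, are hypothetical and only serve to illustrate the binning.

```python
from collections import Counter

# Inclusive upper bounds on the number of commands for G1..G5, taken from
# the text; any user above the last bound belongs to G6.
BOUNDS = [2, 4, 7, 14, 25]

def activity_group(n_commands):
    """Map the number of commands a user sent (>= 1) to its group label."""
    for i, upper in enumerate(BOUNDS, start=1):
        if n_commands <= upper:
            return f"G{i}"
    return "G6"

def group_shares(commands_per_user):
    """Fraction of users in each group, given a dict user -> command count."""
    counts = Counter(activity_group(n) for n in commands_per_user.values())
    total = sum(counts.values())
    return {g: counts[g] / total for g in sorted(counts)}

# Toy input (made-up users, not data from the event).
toy = {"u1": 1, "u2": 2, "u3": 4, "u4": 30}
print(group_shares(toy))  # {'G1': 0.5, 'G2': 0.25, 'G6': 0.25}
```

Once every user has a label, the per-minute command fractions of each group give the 42 time series that are compared in the clustering step.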

In our Markovian model, if we set the probabilities so that the next state in the transition is always the one that gets you closer to the exit but with 25 seconds of delay (that is, the probability of going from state \(x_{m}\) to \(x_{m+1}\) is the probability of going from \(x_{m-n}\) to the state which would follow the optimal path at \(x_{m-n+1}\)), the system gets stuck in a loop and is never able to reach the exit. However, that would require all players to be sending exactly the same command at the same time, something that is neither seen in the data nor expected in an (uncontrolled) crowd. Thus, we consider that at each time step there are 100 users with different behaviors introducing commands. In particular, we consider variable quantities of noisy users who play completely at random, griefers who only press down to annoy the rest of the crowd and the herd, which always sends the optimal command to get to the exit. The results, Fig. 4(c), show that the addition of noise to the herd breaks the loops and allows the crowd to get to the exit (similar results are obtained for either 20 or 30 seconds of lag, see Additional file 8). In particular, for the case with no griefers we find that with 1 percent of users adding noise to the input the mean time needed to finish this part is almost 3,000 hours. However, as we increase the noise, the time is quickly reduced, with an optimal noise level of around 40% of the crowd. Conversely, the introduction of griefers in the model, as expected, increases the time needed to finish this part in most cases. Interestingly though, for low values of the noise, the addition of griefers can actually be beneficial for the crowd. Indeed, by breaking the herding effect, these players are unintentionally helping the crowd to reach its goal.
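The mechanism can be illustrated with a toy example. The sketch below is not the model used to produce Fig. 4(c) and does not use the real map: it assumes a one-dimensional corridor of columns 0 to k, with the exit above column k, a ledge past column k and a ledge below the whole corridor, and it simplifies any fall to a restart at column 0. It also assumes that, at each step, the executed command comes from a single user drawn at random, so that the fractions of noisy users and griefers act as probabilities. All names and parameter values are hypothetical.

```python
import random
from collections import deque

COMMANDS = ("up", "right", "down", "left")

def optimal(col, k):
    """Best command for a character standing at column `col`."""
    return "up" if col == k else "right"

def step(col, cmd, k):
    """Apply one command; return (new_col, exited)."""
    if cmd == "up":
        return col, col == k          # the exit is above column k only
    if cmd == "down":
        return 0, False               # ledge below the corridor: start over
    if cmd == "right":
        # Moving right past column k falls off a ledge (restart).
        return (col + 1, False) if col < k else (0, False)
    return max(col - 1, 0), False     # left, with a wall at column 0

def simulate(k=5, lag=3, noise=0.0, griefers=0.0, max_steps=100_000, seed=0):
    """Return the number of steps needed to exit, or None if max_steps is hit."""
    rng = random.Random(seed)
    # seen[0] is the position `lag` steps ago, i.e. what the users react to.
    seen = deque([0] * (lag + 1), maxlen=lag + 1)
    col = 0
    for t in range(1, max_steps + 1):
        r = rng.random()
        if r < noise:
            cmd = rng.choice(COMMANDS)   # noisy user: random command
        elif r < noise + griefers:
            cmd = "down"                 # griefer: always presses down
        else:
            cmd = optimal(seen[0], k)    # herd: optimal for stale position
        col, exited = step(col, cmd, k)
        if exited:
            return t
        seen.append(col)
    return None

print(simulate(lag=0))             # herd only, no lag
print(simulate(lag=3))             # herd only, with lag
print(simulate(lag=3, noise=0.4))  # herd plus noisy users
```

With these values, the herd alone exits in 6 steps when there is no lag, whereas with a lag of 3 steps it overshoots the exit column, falls, and repeats the same cycle forever, so the function returns None. Adding noisy users lets an up command occasionally arrive while the character is at column k, which breaks the cycle; this is only a qualitative analogue of the loop-breaking effect described above.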

Whether the individuals categorized as “noise” were producing it unintentionally or doing it on purpose to disentangle the crowd (an unknown fraction of users were aware of the effects of the lag and they tried to disentangle the system [41]) is something we cannot analyze because, unfortunately, the resolution of the chat log in this area is in minutes and not in seconds. We can, however, approximate the fraction of griefers in the system thanks to the special characteristics of this area. Indeed, as most of the time the command down is not needed (on the contrary, it would destroy all progress), we can categorize those players with an abnormal number of downs as griefers. To do so, we take the users that belong to G6 (the most active ones) and compare the fraction of their inputs that corresponds to down with each other. We find that 7% have a behavior that could be categorized as outlier (the fraction of their input corresponding to down is higher than 1.5 times the interquartile range). More restrictively, for 1% of the players, the command down represents more than half of their inputs. Both these values are compatible with the observed time according to our model, even more so if we take into account that the model is more restrictive, as we consider that griefers continuously press down (not only near the blue nodes). Thus, we conclude that users deviating from the norm, regardless of being griefers, noise or even very smart individuals, were the ones that made finishing this part possible.
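The two criteria above can be sketched as follows. The text does not spell out the exact outlier rule, so this sketch assumes the common convention of flagging values above the third quartile plus 1.5 times the interquartile range; the function names and the toy data are hypothetical.

```python
from statistics import quantiles

def down_outliers(down_fraction):
    """Users whose share of 'down' commands is an upper outlier.

    `down_fraction` maps each user to the fraction of their inputs that
    were 'down'. Assumption: outlier means above Q3 + 1.5 * IQR.
    """
    q1, _, q3 = quantiles(list(down_fraction.values()), n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return {user for user, f in down_fraction.items() if f > fence}

def mostly_down(down_fraction):
    """Stricter criterion: 'down' is more than half of the user's inputs."""
    return {user for user, f in down_fraction.items() if f > 0.5}

# Toy data (made-up users, not taken from the chat log).
fractions = {"u1": 0.10, "u2": 0.12, "u3": 0.11, "u4": 0.09,
             "u5": 0.13, "u6": 0.10, "u7": 0.12, "u8": 0.80}
print(down_outliers(fractions))  # {'u8'}
print(mostly_down(fractions))    # {'u8'}
```

Dividing the size of each returned set by the number of users in the group gives the kind of percentages quoted in the paragraph.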

3.2 Anarchy vs. democracy

As already described, on the sixth day of the game the input system was modified. This resulted in the start9 riot that led to the introduction of the anarchy/democracy system. From this time on, if the fraction of users sending democracy, out of the total amount of players sending the commands anarchy or democracy, went over 0.75 (later modified to 0.80) the game would enter into democracy mode and commands would be tallied up for 5 seconds. Then, the meter had to go below 0.25 (later modified to 0.50) to enter into anarchy mode again. Note that these thresholds were set by the creator of the experiment.
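Because the threshold to enter democracy differs from the one to leave it, the mode switch behaves as a hysteresis loop. The following minimal sketch illustrates the rule with the initial thresholds; the function name `update_mode` and its signature are hypothetical, and how the stream actually aggregated votes into the meter over time is not specified here.

```python
def update_mode(mode, democracy_votes, anarchy_votes,
                enter_democracy=0.75, enter_anarchy=0.25):
    """Return the new game mode given the current one and the vote counts.

    The meter is the fraction of 'democracy' among all anarchy/democracy
    votes. Thresholds default to the initial values quoted in the text.
    """
    total = democracy_votes + anarchy_votes
    if total == 0:
        return mode  # no votes: keep the current mode (an assumption)
    meter = democracy_votes / total
    if mode == "anarchy" and meter > enter_democracy:
        return "democracy"
    if mode == "democracy" and meter < enter_anarchy:
        return "anarchy"
    return mode

print(update_mode("anarchy", 80, 20))    # democracy
print(update_mode("democracy", 50, 50))  # democracy
print(update_mode("anarchy", 50, 50))    # anarchy
```

The last two calls show the hysteresis: the same meter value of 0.5 leaves the game in whichever mode it was already in. With the later thresholds (0.80 and 0.50) the band in which the mode is preserved becomes narrower.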

The introduction of the voting system was mainly motivated by a puzzle where the crowd had been stuck for over 20 hours with no progress. Nonetheless, even in democracy mode, progress was complex as it was necessary to retain control of the game mode and to take the lag into account when deciding which action to take. Actually, the tug-of-war system was introduced in the middle of day 5, yet the puzzle was not fully completed until the beginning of day 6, over 40 hours after the crowd had originally arrived at the puzzle. One of the reasons why it took so long to finish it even after the introduction of the voting system is that it was very difficult to enter into democracy mode. Democracy was only “allowed” by the crowd when they were right in front of the puzzle and they would go into anarchy mode quickly after finishing it. Similarly, the rest of the game was mainly played under anarchy mode. Interestingly, though, we find that there were more “democrats” in the crowd (players who only voted for democracy) than “anarchists” (players who only voted for anarchy). Out of nearly 400,000 players who participated in the tug-of-war throughout the game, 54% were democrats, 28% anarchists and 18% voted at least once for both of them. Therefore, the introduction of this new system not only split the crowd into two polarized groups with, as we shall see, their own norms and behaviors, but also created nontrivial dynamics between them.

To explore the dynamics of these two groups, we next compare two different days: day 6 and day 8. Day 6 was the second day after the introduction of the anarchy/democracy dynamics and there were no extremely difficult puzzles or similar areas where democracy might have been needed. On the other hand, day 8 was the day when the crowd arrived at the safari zone, which certainly needed democracy mode since the available number of steps in this area is limited (see description of Additional file 9). We must note that, contrary to what we observed in Sect. 3.1, in this case commands coming from low activity users are not equivalent to the ones coming from high activity users. In particular, low activity users tend to vote much more for democracy (see Additional files 9 and 10). As such, it would not be adequate to remove them from the analysis. Our results are summarized in Fig. 5.

Figure 5

Politics of the crowd. Days 6 (top) and 8 (bottom). In every plot the gray color represents when the game was played under anarchy rules and the blue color when it was played under democracy rules. The polar plots represent the evolution of the fraction of votes corresponding to anarchy/democracy while distinguishing if the user previously voted for anarchy or democracy: first quadrant, votes for anarchy coming from users who previously voted for anarchy (\({A} \rightarrow {A}\)); second quadrant, votes for democracy coming from anarchy (\({A}\rightarrow {D}\)); third quadrant, votes for democracy coming from democracy (\({D}\rightarrow {D}\)); fourth quadrant, votes for anarchy coming from democracy (\({D}\rightarrow {A}\)). In the other plots we show the evolution of the total number of votes for anarchy or democracy as a function of time normalized by its maximum value (pink) as well as the position of the tug-of-war meter (black). When the meter goes above 0.75 the system enters into democracy mode (blue) until it reaches 0.25 (these thresholds were later changed to 0.80 and 0.50 respectively) when it enters into anarchy mode (gray) again. The gap in the pink curve of picture (d) is due to the lack of data in that period (see Availability of data and materials)

One of the most characteristic features of groups is their polarization [42, 43]. The problem in the case we are studying is that as players were leaving the game while others were constantly coming in, it is not straightforward to measure polarization. The fact that the number of votes for democracy could increase at a given moment did not mean that anarchists changed their opinion; it could be that new users were voting for democracy or simply that players who voted for anarchy stopped voting. Then, to properly measure polarization we consider 4 possible states for each user. They are defined by both the current vote of the player and the immediately previous one (note that we have removed players who only voted once, but this does not affect the measure of the position of the meter, see Additional file 9): \(A\rightarrow A\), first anarchy then anarchy; \(A \rightarrow D\), first anarchy then democracy; \(D \rightarrow D\), first democracy then democracy; \(D \rightarrow A\), first democracy then anarchy. As we can see in Figs. 5(A) and 5(C) the communities are very polarized, with very few individuals changing their votes. The fraction of users changing from anarchy to democracy is always lower than 5%, which indicates that anarchists form a very closed group. Similarly, the fraction of users changing from democracy to anarchy is also very low, although there are clear bursts when the crowd exits the democracy mode. This reflects that those who changed their vote from anarchy to democracy did so to achieve a particular goal, such as going through a maze, and once they achieved the target they instantly lost interest in democracy.
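The four states can be computed directly from a time-ordered vote log. The sketch below aggregates over the whole log rather than over time windows, which is a simplification with respect to the polar plots of Fig. 5; the function name and the `(user, vote)` input format are hypothetical.

```python
from collections import Counter

STATES = ("A->A", "A->D", "D->D", "D->A")

def transition_fractions(votes):
    """Fraction of each vote transition in a time-ordered vote log.

    `votes` is an iterable of (user, vote) pairs with vote in {"A", "D"}.
    Users who voted only once produce no transition, so they are
    effectively excluded, as in the text.
    """
    last_vote = {}
    counts = Counter()
    for user, vote in votes:
        if user in last_vote:
            counts[f"{last_vote[user]}->{vote}"] += 1
        last_vote[user] = vote
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: counts[s] / total for s in STATES}

# Toy log (made-up users and votes).
log = [("u1", "A"), ("u2", "D"), ("u1", "A"), ("u2", "D"),
       ("u3", "D"), ("u2", "A"), ("u1", "A")]
print(transition_fractions(log))
# {'A->A': 0.5, 'A->D': 0.0, 'D->D': 0.25, 'D->A': 0.25}
```

Applying the same function to consecutive time windows of the log would give the time evolution of the four fractions.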

With such a degree of polarization, the next question is how it was possible for the crowd to change from one mode to the other. To answer it, we shift our attention to the number of votes. In Fig. 5(B) we can see that every time the meter gets above the democracy threshold it is preceded by an increase in the total number of votes. Then, once under democracy mode the total number of votes decays very fast. Finally, there is another increment before entering again into anarchy mode. Thus, it is clear that every time democrats were able to enter into their mode they stopped voting and started playing. This let anarchists regain control even though they were fewer users, leading to a sharp decay of the tug-of-war meter. Once they exited democracy mode, democrats started to vote again to try to set the game back into democracy mode. In Fig. 5(D) we can see initially a similar behavior in the short periods when democracy was installed. However, there is a wider area where the crowd accepted democracy; this marks the safari zone mentioned previously. Interestingly, we can see how democrats learned how to keep their mode active. Initially there was the same drop in users voting and in the position of the meter seen in the other attempts. This forced democrats to keep voting instead of playing, which allowed them to retain control for longer. A few minutes later the number of votes decays again but in this case the position of the meter is barely modified, probably due to anarchists finally accepting that they needed democracy mode to finish this part. Even though they might have implicitly accepted democracy, it is worth noting that the transitions \(A \rightarrow D\) are minimal (Fig. 5(C)). Finally, once the mission for which the democracy mode was needed finished, there is a sharp increment in the fraction of transitions \(D \rightarrow A\).

4 Discussion

Probably the first thing that comes to one's mind when thinking about how progress was possible in a scenario like this is the famous experiment by Francis Galton in which he asked a crowd to guess the weight of an ox. He found that the average of all estimates of the crowd was just 0.8% higher than the real weight [44]. Indeed, when lots of users were playing, the extreme answers would cancel each other and the character would tend to move towards the most common command sent by the crowd. Note, however, that as most of the time they were not voting, actions deviating from the mean could also be performed by pure chance, as we saw in Sect. 3.1.

It is worth stressing that, to form a classical wise crowd, some important elements are needed, such as independence [45]. That is, the answer of each individual should not be influenced by other people in the crowd. In our case, this was not true, as the character was continuously moving. Indeed, one of the main features of this crowd event is that opinions had an effect in real time, and hence people could see the tendency of the crowd and change their behavior accordingly. Theoretical [46] and empirical studies [47] have shown that a minority of informed individuals can lead a naïve group of animals or humans to a given target in the absence of direct communication. This was observed in Sect. 3.1, as it was strictly necessary to introduce individuals who would destroy the herding effect in order to correctly direct the crowd. Besides, even in the case of conflict in the group, the time taken to reach the target is not increased significantly [47], which would explain why, even with the presence of griefers, it only took the crowd 10 times longer to finish the game than the average person. Although this ratio may appear to be high, it is not: the crowd got stuck in some parts of the game for over a day, increasing the time to finish. If those parts are excluded, the game progress can be considered remarkably fast, despite the messy nature of the gameplay.

As a matter of fact, the movement of the character on the map can probably be better described as that of a swarm rather than a crowd. Classical collective intelligence, such as the opinions of a crowd obtained via polls or surveys, has the property of independence stated previously and, in addition, is asynchronous. Moreover, it has been shown that when users can influence each other but still in an asynchronous way, the group decisions are distorted by social biasing effects [48]. Recently, it has been proposed that the use of structures similar to natural swarms can correct some of these problems [49]. Indeed, by allowing users to participate in decision making processes in real time with feedback about what the rest are doing, in some sort of human swarm, it is possible to explore the decision space more efficiently and reach more accurate predictions than with simple majority voting [50]. In fact, it has recently been suggested that online crowds might be better described as swarms, that is, as something in-between crowds and networks [51].

Regarding the results of Sect. 3.1, we have not yet discussed the reasons why the behavior of individuals with few messages was so similar to that of the most active players. In this context we could argue that users with few messages tend to act intuitively as they soon lose interest. According to the social heuristics hypothesis [52], fast decisions tend to increase cooperation, which in this case would mean trying to get out of the area as fast as possible. Similarly, experiments have shown that people with prosocial predispositions tend to act that way when they have to make decisions quickly [53]. Thus, users that send few commands might tend to send the ones that get the character closer to the exit, which would explain why, without being aware of it, they behave like those users that tried to progress for longer. However, we saw that coordination might not be so desirable on this occasion. The problem with players conforming to the majority direction or mimicking each other is that they can be subject to herding effects [54, 55], which in this particular setting were catastrophic due to the lag present in the system. Indeed, as we have shown, the addition of players not following the actions of the group, regardless of their motivations to do so, was the key element that allowed the crowd to get past the ledge.

Another interesting aspect of the game was the introduction of the anarchy/democracy dynamics. One of the first questions that arises is what might have motivated players to join one group or the other. From a broad perspective, it has been proposed that one of the key ingredients behind video game enjoyment is the continuous perception of one’s causal effects on the environment, also known as effectance [56], thanks to the immediate response of games to player inputs. In contrast, it has been observed that a reduction of control, defined as being able to influence the dynamics according to one’s goals, does not automatically lower enjoyment [57]. This might explain why some people preferred anarchy. Under its rules, players saw that the game was continuously responding to inputs, even if they were not exactly the ones they sent. On the other hand, with democracy, control was higher at the expense of effectance, as the game would only advance once every 5 seconds. The fact that some people might have preferred this mode is not surprising, as it is well known that different people might enjoy different aspects of a game [58]. In the classical player classification proposed by Bartle [59] for the context of MUDs (multi-user dungeon, which later evolved into what we know today as MMORPGs, massively multiplayer online role-playing games) he already distinguished four types of players: achievers, who focus on finishing the game (who in our context could be related to democrats); explorers, who focus on interacting with the world (anarchists); socializers, who focus on interacting with other players (those players who focused on making fan art and developing narratives); and killers, whose main motivation is to kill other players (griefers). Similarly, it has been seen in the context of FPSs (first person shooters) that player-death events, i.e., losing a battle, can be pleasurable for some players (anarchists) while not for others (democrats) [60].

However, when addressing the subject of video game entertainment, it is always assumed that the player has complete control over the character, regardless of whether it is a single player game or a competitive/cooperative game. TPP differs from those cases in the fact that everyone controlled the same character. As a consequence, enjoyment is no longer related to what a player, as a single individual, has done but rather to what they, as a group, have achieved. From the social identity approach perspective this can be described as a shift from the personal identity to the group identity. This shift would increase conformity to the norms associated with each group, but as the groups were unstructured their norms would be inferred from the actions taken by the rest of the group [3]. New group members would then perform the actions they saw as appropriate for them as members of the group, even if they might be seen as antinormative from an outside perspective [61]. This can be clearly seen in the behavior of the anarchists. Indeed, every time the game entered democracy mode, anarchists started to send start9 as a form of protest, hijacking the democracy. Interestingly, this kept happening even though most of the players who were in the original protest did not play anymore (see Fig. 6). Thus, newcomers adopted the identity of the group even if they had not participated in its conception. Moreover, stalling the game might have been regarded as antisocial behavior from the anarchists' own point of view when they were playing under anarchy rules, but when the game entered democracy mode it suddenly turned into an acceptable behavior, as predicted by the theory.

Figure 6

start9 protests throughout the game. Left: fraction of input corresponding to the start9 command. Right: fraction of users sending start9 who were in the original start9 riot (inset, total number of protesters each day). There were start9 protests 10 days after the first one even though fewer than 10% of the protesters had been part of the first one

Taking into account all the aspects of this event that are completely different from other online experiences, we would like to conclude by discussing the reasons why this game might have attracted so many people. The game was disordered, progress was much slower than if played individually (see Sect. 3.1) and often really bad actions were performed. An example of one of those actions is what came to be known as the bloody Sunday. On Sunday 23 of February (day 10 in game time) the crowd captured one of the strongest Pokémon. However, in order to add it to the team they had to go to a special place where one can withdraw and release Pokémon. Unfortunately, even though they were able to withdraw said Pokémon, they ended up releasing 12, some of them being the strongest ones in the team. The frustration this caused in the crowd can be clearly seen in Fig. 7(A), where we plot the average number of o in the word no in the messages sent by the crowd as a function of time. There is a clear increment in the average number of o on day 10 which lasts for the rest of the event. Moreover, it is also possible to see bursts of o at certain points of the game, which tells us the players were continuously facing frustrating events. Similarly, if we look at the number of messages containing why per hour, it is clear that some of the actions taken by the crowd were not understood by some of its members. All these examples can be considered as indirect empirical measures of the amount of frustration held in the crowd.

Figure 7

Measures of frustration. (A) Players expressed their frustration by repeating the letter o more times when they wanted to say no. Even though frustration was present throughout the event, it increased after the events of bloody Sunday. (B) Distribution of the number of o. Interestingly, the relationship is not linear as the word noo tends to appear less than nooo or noooo, which indicates that when players were frustrated they overexpressed it. (C) Number of messages containing the word “why” per hour. This indicates that many players did not understand the movements of the crowd, which probably made them feel frustrated
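The frustration measures above (number of o in the word no, and messages containing why) can be sketched as follows. This is only an illustration under stated assumptions: the regular expressions, the function names `o_counts`, `mean_o_length` and `contains_why`, and the choice to match whole words case-insensitively are ours, not necessarily the procedure used to produce Fig. 7.

```python
import re
from collections import Counter

# Assumption: "no" is counted only as a whole word, ignoring case,
# so that words such as "know" or "nothing" are not matched.
NO_PATTERN = re.compile(r"\bn(o+)\b", re.IGNORECASE)
WHY_PATTERN = re.compile(r"\bwhy\b", re.IGNORECASE)

def o_counts(message):
    """Number of o's in each occurrence of no, noo, nooo, ... in a message."""
    return [len(match.group(1)) for match in NO_PATTERN.finditer(message)]

def mean_o_length(messages):
    """Average number of o's per occurrence of "no" over a batch of messages."""
    lengths = [n for message in messages for n in o_counts(message)]
    return sum(lengths) / len(lengths) if lengths else 0.0

def contains_why(message):
    """True if the message contains the word "why"."""
    return WHY_PATTERN.search(message) is not None

# Toy messages (made up for illustration only).
messages = ["no", "NOOOO why", "nooo", "I know nothing"]
distribution = Counter(n for message in messages for n in o_counts(message))
```

Grouping the messages by time bin before calling `mean_o_length` would give a time series analogous to panel (A), while `distribution` corresponds to the kind of histogram shown in panel (B).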

Even though frustration usually has a negative connotation, in the context of games it has been observed that frustration and stress can be pleasurable as they motivate players to overcome new challenges [62]. Actually, there is a whole game genre known as “masocore” (a portmanteau of masochism and hardcore) which consists of games with extremely challenging gameplay built with the sole purpose of generating frustration in the players [63]. Similarly, there are games which might be simpler but that have really difficult controls and strange physics, such as QWOP, Surgeon Simulator or Octodad, which are also built with the sole aim of generating frustration [64]. Thus, the mistakes made by the crowd might not have been a source of dissatisfaction but quite the opposite: they might have been the reason why this event was so successful [65].



Abbreviations

TPP: Twitch Plays Pokémon

RPG: Role Playing Game

MUD: Multi-User Dungeon

MMORPG: Massively Multiplayer Online Role Playing Games

FPS: first person shooter


References

  1. Abrams D, Hogg MA (2001) Collective identity: group membership and self-conception. In: Blackwell handbook of social psychology: group processes, pp 425–460
  2. Le Bon G (1895) The crowd: a study of the popular mind
  3. Reicher SD (2008) The psychology of crowd dynamics. In: Blackwell handbook of social psychology: group processes
  4. La Macchia ST, Louis WR (2016) Crowd behaviour and collective action. In: Understanding peace and conflict through social identity theory. Springer, Cham, pp 89–104
  5. Reicher S, Stott C, Cronin P, Adang O (2004) An integrated approach to crowd psychology and public order policing. Policing, Int J Police Strateg Manag 27(4):558–572
  6. Kozinets RV, Hemetsberger A, Schau HJ (2008) The wisdom of consumer crowds: collective innovation in the age of networked marketing. J Macromark 28(4):339–354
  7. von Ahn L, Maurer B, McMillen C, Abraham D, Blum M (2008) reCAPTCHA: human-based character recognition via web security measures. Science 321(5895):1465–1468
  8. Baumeister RF, Ainsworth SE, Vohs KD (2016) Are groups more or less than the sum of their members? The moderating role of individual identification. Behav Brain Sci 39:137
  9. Latané B (1981) The psychology of social impact. Am Psychol 36(4):343
  10. Quinn AJ, Bederson BB (2011) Human computation: a survey and taxonomy of a growing field. In: Proceedings of the SIGCHI conference on human factors in computing systems. CHI ’11. ACM, New York, pp 1403–1412
  11. Malone TW, Laubacher R, Dellarocas C (2010) The collective intelligence genome. MIT Sloan Manag Rev 51(3):21
  12. Mason W, Watts DJ (2010) Financial incentives and the performance of crowds. ACM SIGKDD Explor Newsl 11(2):100–108
  13. Prendergast C (1999) The provision of incentives in firms. J Econ Lit 37(1):7–63
  14. Heyman J, Ariely D (2004) Effort for payment: a tale of two markets. Psychol Sci 15(11):787–793
  15. Gneezy U, Rustichini A (2000) Pay enough or don’t pay at all. Q J Econ 115(3):791–810
  16. Kittur A, Nickerson JV, Bernstein M, Gerber E, Shaw A, Zimmerman J, Lease M, Horton J (2013) The future of crowd work. In: Proceedings of the 2013 conference on computer supported cooperative work. CSCW ’13. ACM, New York, pp 1301–1318
  17. Peer E, Brandimarte L, Samat S, Acquisti A (2017) Beyond the turk: alternative platforms for crowdsourcing behavioral research. J Exp Soc Psychol 70:153–163
  18. Cox J, Oh EY, Simmons B, Lintott C, Masters K, Greenhill A, Graham G, Holmes K (2015) Defining and measuring success in online citizen science: a case study of zooniverse projects. Comput Sci Eng 17(4):28–41
  19. von Ahn L (2006) Games with a purpose. Computer 39(6):92–94
  20. Khatib F, Cooper S, Tyka MD, Xu K, Makedon I, Popović Z, Baker D, Players F (2011) Algorithm discovery by protein folding game players. Proc Natl Acad Sci USA 108(47):18949–18953
  21. Bernstein M, Tan D, Smith G, Czerwinski M, Horvitz E (2009) Collabio: a game for annotating people within social networks. In: UIST ’09
  22. Salk CF, Sturn T, See L, Fritz S, Perger C (2016) Assessing quality of volunteer crowdsourcing contributions: lessons from the Cropland Capture game. Int J Digit Earth 9(4):410–426
  23. Birke A, Schoenau-Fog H, Reng L (2012) Space bugz!: a smartphone-controlled crowd game. In: Proceeding of the 16th international academic MindTrek conference. MindTrek ’12. ACM, New York, pp 217–219
  24. Description of The Button event (2019)
  25. Müller TF, Winters J (2018) Compression in cultural evolution: homogeneity and structure in the emergence and evolution of a large-scale online collaborative art project. PLoS ONE 13(9):0202019
  26. Rappaz J, Catasta M, West R, Aberer K Latent structure in collaboration: the case of Reddit R/place. arXiv:1804.05962
  27. Guinness World Records 2015 Gamer’s Edition (2014) Guinness book
  28. Pokémon passes 300 million games sold (2018)
  29. Althoff T, White RW, Horvitz E (2016) Influence of Pokémon Go on physical activity: study and implications. J Med Internet Res 18(12):315
  30. Sjöblom M, Hamari J (2017) Why do people watch others play video games? An empirical study on the motivations of Twitch users. Comput Hum Behav 75:985–996
  31. Twitch is 4th in Peak US Internet Traffic (2018)
  32. Churchill BCB, Xu W (2016) The modem nation: a first study on Twitch.TV social structure and player/game relationships. In: 2016 IEEE international conferences on big data and cloud computing (BDCloud), social computing and networking (SocialCom), sustainable computing and communications (SustainCom) (BDCloud-SocialCom-SustainCom), pp 223–228
  33. Flynn-Jones E (2015) Well played, vol 3. ETC Press, Pittsburgh
  34. Twitch Plays Pokémon timeline (2018)
  35. Time needed to finish Pokémon Red (2018)
  36. Lindsey M-V (2015) Religion in digital games reloaded, vol 7. Institute for Religious Studies, University of Heidelberg, Heidelberg, Chap. 6
  37. Centola D, Baronchelli A (2015) The spontaneous emergence of conventions: an experimental study of cultural evolution. Proc Natl Acad Sci USA 112(7):1989–1994
  38. Kirman B, Lineham C, Lawson S (2012) Exploring mischief and mayhem in social computing or: how we learned to stop worrying and love the trolls. In: CHI ’12 extended abstracts on human factors in computing systems. ACM, New York, pp 121–130
  39. Paul HL, Bowman ND, Banks J (2015) The enjoyment of griefing in online games. J Gaming Virtual Worlds 7(3):243–258
  40. Buckels EE, Trapnell PD, Paulhus DL (2014) Trolls just want to have fun. Pers Individ Differ 67:97–102
  41. A strategy to traverse the ledge (2019)
  42. Sunstein CR (1999) The law of group polarization. J Polit Philos 10(2):175–195
  43. Conover M, Ratkiewicz J, Francisco MR, Gonçalves B, Menczer F, Flammini A (2011) Political polarization on Twitter. ICWSM 133:89–96
  44. Galton F (1907) Vox populi (the wisdom of crowds). Nature 75(7):450–451
  45. Surowiecki J (2004) The wisdom of crowds: why the many are smarter than the few and how collective wisdom shapes business, economies, societies and nations, vol 296. Anchor Books, New York
  46. Couzin ID, Krause J, Franks NR, Levin SA (2005) Effective leadership and decision-making in animal groups on the move. Nature 433(7025):513
  47. Dyer JR, Ioannou CC, Morrell LJ, Croft DP, Couzin ID, Waters DA, Krause J (2008) Consensus decision making in human crowds. Anim Behav 75(2):461–470
  48. Muchnik L, Aral S, Taylor SJ (2013) Social influence bias: a randomized experiment. Science 341(6146):647–651
  49. Rosenberg LB (2015) Human swarming, a real-time method for parallel distributed intelligence. In: 2015 swarm/human blended intelligence workshop (SHBI), pp 1–7
  50. Rosenberg L, Baltaxe D, Pescetelli N (2016) Crowds vs swarms, a comparison of intelligence. In: 2016 swarm/human blended intelligence workshop (SHBI), pp 1–4
  51. Lee RLM (2017) Do online crowds really exist? Proximity, connectivity and collectivity. Distinktion J Soc Theory 18(1):82–94
  52. Rand DG, Peysakhovich A, Kraft-Todd GT, Newman GE, Wurzbacher O, Nowak MA, Greene JD (2014) Social heuristics shape intuitive cooperation. Nat Commun 5:3677
  53. Yamagishi T, Matsumoto Y, Kiyonari T, Takagishi H, Li Y, Kanai R, Sakagami M (2017) Response time in economic games reflects different types of decision conflict for prosocial and proself individuals. Proc Natl Acad Sci USA 114:6394–6399
  54. Kameda T, Tsukasaki T, Hastie R, Berg N (2011) Democracy under uncertainty: the wisdom of crowds and the free-rider problem in group decision making. Psychol Rev 118(1):76–96
  55. Hung AA, Plott CR (2001) Information cascades: replication and an extension to majority rule and conformity-rewarding institutions. Am Econ Rev 91(5):1508–1520
  56. White RW (1959) Motivation reconsidered: the concept of competence. Psychol Rev 66(5):297–333
  57. Klimmt C, Hartmann T, Frey A (2007) Effectance and control as determinants of video game enjoyment. CyberPsychol Behav 10(6):845–848
  58. Mekler ED, Bopp JA, Tuch AN, Opwis K (2014) A systematic review of quantitative studies on the enjoyment of digital entertainment games. In: Proceedings of the 32nd annual ACM conference on human factors in computing systems. ACM, New York, pp 927–936
  59. Bartle R (1996) Hearts, clubs, diamonds, spades: players who suit muds
  60. van den Hoogen W, Poels K, IJsselsteijn W, de Kort Y (2012) Between challenge and defeat: repeated player-death and game enjoyment. Media Psychol 15(4):443–459
  61. Spears R, Postmes T (2015) Group identity, social influence and collective action online: extensions and applications of the SIDE model. In: Sundar S (ed) Handbooks in communication and media. Wiley–Blackwell, Chichester, pp 23–46
  62. Nylund A, Landfors O (2015) Frustration and its effect on immersion in games: a developer viewpoint on the good and bad aspects of frustration. DIVA
  63. The rise of masocore gaming (2018)
  64. Games with Busted Physics and Controls (2018)
  65. Ramirez D, Saucerman J, Dietmeier J (2014) Twitch plays pokemon: a case study in big g games. In: Proceedings of DiGRA 2014, Snowbird, UT
  66. Chat logs and videos of the whole event (2014)


We thank K.C. Bathina, J. Bollen, J.P. Gleeson, and M. Quayle for helpful comments and suggestions. A.A. acknowledges the support of the FPI doctoral fellowship from MINECO and its mobility scheme. Y.M. acknowledges partial support from the Government of Aragón, Spain through grant E36-17R, by MINECO and FEDER funds (grant FIS2017-87519-P) and by Intesa Sanpaolo Innovation Center. The funders had no role in study design, data collection and analysis, or preparation of the manuscript.

Availability of data and materials

All the data regarding the event is publicly available for researchers in the Internet Archive [66]. The chat logs can have either second (YYYY-MM-DD HH:MM:SS) or minute (YYYY-MM-DD HH:MM) resolution. As discussed in the main text, the game went viral after the first day. Thus, even though the game started on February 12, 2014 at 23:16:01 UTC, the first logs recorded correspond to February 14, 2014 at 08:16:19 GMT+1. Besides, the data between February 21, 2014 at 04:25:54 GMT+1 and 07:59:22 GMT+1 is missing, as can be seen in Fig. 5(D). Additionally, the whole event was recorded on video and is also available in the Internet Archive [66]. The position of the tug-of-war meter as well as the game mode active at each time were extracted from those videos using optical character recognition techniques.
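For readers working with the logs, a minimal sketch of how the two timestamp resolutions could be handled is given below. It assumes that log timestamps are expressed in GMT+1, as the dates quoted above suggest; the names `parse_log_timestamp` and `in_missing_interval` are hypothetical helpers, not part of the published data set.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all log timestamps are in GMT+1.
LOG_TZ = timezone(timedelta(hours=1))

# Second resolution is tried first, then minute resolution.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M")

def parse_log_timestamp(text):
    """Parse a log timestamp with either second or minute resolution."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).replace(tzinfo=LOG_TZ)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")

# Interval with missing data, as stated in the text.
GAP_START = parse_log_timestamp("2014-02-21 04:25:54")
GAP_END = parse_log_timestamp("2014-02-21 07:59:22")

def in_missing_interval(timestamp):
    """True if the timestamp falls inside the interval with no recorded data."""
    return GAP_START < timestamp < GAP_END

first_log = parse_log_timestamp("2014-02-14 08:16:19")
first_log_utc = first_log.astimezone(timezone.utc)
```

Attaching the time zone explicitly makes it straightforward to compare log times with the game start, which is quoted in UTC.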

Author information

Authors and Affiliations



AA and YM designed the research. AA performed the research. AA and YM analyzed the data and wrote the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alberto Aleta.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

Below are the links to the electronic supplementary material.

Command a. Time series of the fraction of a in the input of each group. (PDF 165 kB)

Command b. Time series of the fraction of b in the input of each group. (PDF 168 kB)

Command start. Time series of the fraction of start in the input of each group. (PDF 164 kB)

Command up. Time series of the fraction of up in the input of each group. (PDF 175 kB)

Command right. Time series of the fraction of right in the input of each group. (PDF 171 kB)

Command down. Time series of the fraction of down in the input of each group. (PDF 163 kB)

Command left. Time series of the fraction of left in the input of each group. (PDF 165 kB)


Effect of lag. Mean time needed to exit the area according to our model with (A) 20 seconds of delay between the input and the image or (B) 30 seconds of delay between the input and the image. As stated in the main text, there was a delay in the system that ranged between 20 and 30 seconds. The amount varied depending on the number of players, the load of Twitch servers, the effect of some technical corrections that were made, etc. Thus, the particular lag at every moment is undetermined. In the main text we have used the intermediate value of 25 seconds, although as we can see in Additional file 8 the results are equivalent if the lag is set to 20 or 30 seconds. (PDF 275 kB)


Tug of war commitment, 2 votes. Meter position of the political tug of war if only votes from committed players (those who sent at least 2 votes throughout the whole game) are taken into account (blue) and if only votes from visitors (those players who only participated once in the voting) are taken into account (pink). The introduction of the voting system was mainly motivated by a puzzle where the crowd had been stuck for over 20 hours with no progress, but it was not the only reason. It was known that later in the game the character would need to get through a special area, the safari zone, where the number of steps one can take is limited to 500. If the character goes over that limit, it is automatically teleported outside of the area. Voting seemed the best way to go through this area because, even though progress can be slow, it is only necessary for half of the crowd, at most, to be coordinated. The democracy system also added a new element, the possibility of sending compound commands. Indeed, under democracy mode it was possible to concatenate up to 9 commands, either by writing one after the other or by adding a number which would repeat the previous command that number of times. For example, aleftright would be equivalent to executing a, then left and lastly right. Similarly, start9 was equivalent to pressing start nine times in a row. However, the relevance of these commands in the results was limited. First, because most of the time the game was played under anarchy mode. And second, because the total number of messages containing those commands is much lower than the number containing simple ones. As briefly discussed in the main text, in contrast with the ledge event, in this case the behavior of users who sent few commands differs from that of users who sent several commands.
In Additional file 9 we show the hypothetical position of the meter if only users who sent just 1 command are taken into account and if only users with 2 or more commands are taken into account. The position of the meter matches perfectly with this second set of users. If we increase the threshold and consider only users with fewer than 10 commands as visitors and those with 10 or more as committed players, the position of the meter still resembles the results of this second group, even though differences are now noticeable. Interestingly, though, the visitors were clearly in favor of democracy, as can be noticed from the position of their votes and the slight downward shift of the position of the committed players' votes, Additional file 10. (PDF 816 kB)


Tug of war commitment, 10 votes. Meter position of the political tug of war if only votes from committed players (those who sent at least 10 votes throughout the whole game) are taken into account (blue) and if only votes from visitors (those players who sent less than 10 votes) are taken into account (pink). (PDF 987 kB)
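The compound commands of democracy mode described under Additional file 9 (for example aleftright or start9) can be expanded with a short sketch like the one below. The set of buttons is taken from the commands listed in the supplementary files; the function name `expand_command`, the choice to reject any message that is not entirely made of valid tokens, and the interpretation of the limit of 9 as a cap on the total number of expanded button presses are assumptions made for illustration, not the actual parser used by the stream.

```python
import re

# Buttons listed in the supplementary files (assumed to be the full set here).
BUTTONS = ("start", "right", "down", "left", "up", "a", "b")

# One token: a button name optionally followed by a single repeat digit.
TOKEN = re.compile(r"(" + "|".join(BUTTONS) + r")([1-9]?)")

# Assumption: the limit of 9 applies to the expanded sequence of presses.
MAX_PRESSES = 9

def expand_command(message):
    """Expand a compound command into a list of button presses.

    Returns None if the message is not a valid compound command
    under the assumptions stated above.
    """
    text = message.strip().lower()
    presses = []
    position = 0
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            return None  # unknown text: not a command
        repeat = int(match.group(2) or 1)
        presses.extend([match.group(1)] * repeat)
        position = match.end()
    if not presses or len(presses) > MAX_PRESSES:
        return None
    return presses

example = expand_command("aleftright")
protest = expand_command("start9")
```

Under these assumptions `example` is the sequence a, left, right, and `protest` is start repeated nine times, matching the two cases given in the text.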

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Aleta, A., Moreno, Y. The dynamics of collective social behavior in a crowd controlled game. EPJ Data Sci. 8, 22 (2019).
