An Experimental Study of Segregation Mechanisms

Segregation is widespread in all realms of human society. Several influential studies have argued that intolerance is not a prerequisite for a segregated society, and that segregation can arise even when people generally prefer diversity. We investigated this paradox experimentally, by letting groups of high-school students play four different real-time interactive games. Incentives for neighbor similarity produced segregation, but incentives for neighbor dissimilarity and neighborhood diversity prevented it. The participants continued to move while their game scores were below optimal, but their individual moves did not consistently take them to the best alternative position. These small differences between human and simulated agents produced different segregation patterns than previously predicted, thus challenging conclusions about segregation arising from these models.


Introduction
Even if people are generally tolerant, they can end up living in a segregated society if they have weak preferences to be among similar others. This was the counter-intuitive conclusion derived from Schelling's model of segregation [, ], one of the most influential models of tipping dynamics and unintended consequences [-]. The most common version of the model is a simulation in which agents of two different colors are placed on a grid. On each time step of the simulation, the agents move position depending on the color of their eight nearest neighbors. For example, the agents move if less than % of their neighbors have the same color as themselves. Even when everyone has these relatively mild preferences for similar neighbors and no one prefers a segregated local neighborhood, the model dynamics result in extreme segregation at the population level.
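The move rule described above can be illustrated with a minimal Python sketch. It is not the original model code: the grid representation (a list of lists with `None` for empty cells), the function names, and the treatment of agents without neighbors are assumptions made for illustration, and the `threshold` argument is a free parameter because the exact percentage is not restated here.

```python
def same_color_fraction(grid, r, c):
    """Fraction of occupied cells among the eight neighbors of (r, c) that share its color."""
    color = grid[r][c]
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the agent's own cell
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < len(grid) and 0 <= cc < len(grid[0])
            if inside and grid[rr][cc] is not None:
                total += 1
                same += grid[rr][cc] == color
    # Assumption: an agent with no neighbors at all is treated as content.
    return same / total if total else 1.0


def wants_to_move(grid, r, c, threshold):
    """Threshold rule: move if the same-color share falls below `threshold` (hypothetical parameter)."""
    return same_color_fraction(grid, r, c) < threshold
```

Calling `wants_to_move(grid, r, c, threshold)` for every agent on each time step gives the set of agents that would relocate under this rule.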
Subsequent theoretical research has shown that even when individuals actively seek diversity, segregation may still be a likely outcome [-]. These models assume that a minimum proportion of same-color neighbors is still desirable but above that, the larger the proportion, the less satisfied the agents are and the more likely they are to move. Despite diversity providing the highest utility, simulations of these models predict a surprisingly high level of segregation.
When considering questions of public policy, both economists and sociologists have emphasized the importance of these modeling results. For example, [] conclude that 'segregation can emerge... even among multiculturalists who actively seek diversity, so long as they are also sensitive to small changes in ethnic composition.' Moreover, [] suggest that the 'welfare effect of educating people to have preferences for integration might be adverse' because the 'segregated outcome will be unsatisfying for the majority of people.' The practical implications are clear: if the theoretical predictions are true, public policies that promote openness and tolerance are futile because they cannot improve integration.
Despite their importance for policy, the predictions of the segregation models have not been extensively tested empirically before. Most of the previous work has investigated what kind of preferences for neighborhood composition individuals from different racial and ethnic groups hold and what level of segregation these preferences produce if plugged into the models [, -]. The question whether particular preferences indeed cause the predicted outcomes has received less attention. Answering this question with observational data is extremely difficult since there are multiple correlated factors that influence a person's moving decision []. For example, in a residential setting, the decision to move house is affected not only by the ethnic composition of the current and future neighborhood, but also by the household's income, the price of real estate, proximity to the workplace, etc. []. In a conference room or a lecture hall, the decision to pick a particular seat may be influenced by the characteristics of the seating neighbors but may also have to do with when one enters the room or one's personality type [].
The best method to isolate the effect of a behavioral factor is to conduct a controlled experiment []. So far there has been only one small-scale experimental test of a one-dimensional version of the Schelling model []. We therefore designed, conducted, and analyzed an experiment to test the predictions of the two-dimensional segregation model for four different utility functions, representing different preferences for local similarity, diversity, and difference. For simplicity, we refer to the corresponding games as 'Same,' 'Diverse,' 'Same and Diverse,' and 'Same or Different' (Figure ). The threshold utility function in the Same game, originally posited by Schelling [, ], models an intrinsic preference to be among similar others []. The utility function in the Diverse game represents a scenario where individuals strive for maximum diversity. The utility function in the Same and Diverse game represents a preference for similar others up to a certain level combined with a preference for some diversity. This utility function has practical implications, as it is believed that while some preference for similar others cannot be eradicated, preference for diversity can be taught in schools and incentivized through appropriate policies []. Finally, the Same or Different game models a world where individuals prefer either extreme of full or no diversity. This last utility function is not particularly meaningful in terms of policy applications, but nevertheless offers insights into the importance of individual incentives for solving the problem of segregation.

Materials and methods
The experiments were conducted as part of an interactive demonstration on mathematical modeling that we presented to high school students. An important novelty of our experimental design is that it was performed in a naturally occurring social environment. Most game experiments reported in the literature place anonymous subjects in isolated booths in computer laboratories, do not allow communication, and incentivize performance with monetary payments. In contrast, we approached an age group that is technology-savvy, almost universally familiar with computer games, and particularly susceptible to social influence. We then used fast and competitive game dynamics and social comparison to motivate them. Our innovative experimental setup helps reduce research costs; it also enables studying behaviors that are dominated by social incentives.
Figure 1 Utility functions, representative simulation outcomes, and representative experiment outcomes for the four games. The four columns correspond to the Same game, the Diverse game, the Same and Diverse game, and the Same or Different game. The first row shows the utility functions used in the simulations (and equivalently, the scoring rules in the experiment). The second row shows a typical outcome in the simulation for a group size of 20. The third row shows the outcomes in one of the experimental groups with 20 participants. In the simulation, at every time period, one of the 20 agents is selected randomly and given the opportunity to relocate. The agent evaluates the nearest four available locations in the up, down, left, and right directions, as well as its current location. If the agent identifies a single strictly best available location among these, it moves to (or stays at) it; if it identifies multiple best locations, it randomly chooses to relocate to one of them.
We conducted the experiments with  high school classes in Sweden. The classes had between  and  students, providing a total of  participants, aged between  and . To recruit the classes, we identified mathematics teachers in three different regions of Sweden (east coast, middle, and west coast) and contacted them with the offer to lead an hour-long interactive demonstration on mathematical modeling of social processes during one of their class sessions. The experimental sessions were conducted at the beginning of the demonstrations and introduced as a game without any reference to segregation. Each participant was given a tablet computer to use as a game controller and assigned a number and a color for their avatar [Figure ]. The games were designed to be as simple, intuitive, and engaging for participants as possible. They resembled the standard two-dimensional Schelling game [, ] with the exception that play was in real time, movement was restricted, and global communication was allowed. The game field was a six-by-six square grid. Participants observed and played the games in real time. They could move their avatars up, down, left, or right to the next available empty spot, if such a spot existed. This meant that a participant could potentially be unable to move if all empty spots were diagonally situated. During the experiment, participants could communicate with each other. Communication was allowed in order to increase the participants' engagement and motivation in the game. The data show that the ability to communicate did not significantly de-anonymize participants' interactions [Figure S in Additional file ]. We expected that communication would help resolve the coordination problems in the Same and Diverse, the Diverse, and the Same or Different games. However, as we discuss later, the results we observe are not dictated by global coordination, as very few groups managed to achieve it.
Participants played four games with scoring rules equivalent to the four utility functions (Figure (a)-(d)), in randomized order. Participants could observe their current score in real time throughout the game, but only the score at the end of the game counted. We did not offer monetary incentives but the games were very dynamic (. moves per second on average) and caused a lot of excitement and exclamations during game play. We provided a social incentive to perform well by informing participants in advance that their total score from the four games would be projected on the screen at the end of the experiment. Each game was stopped when all participants indicated that they did not want to move any further, or after  minutes of play.

Results
In the Same game, participants produced slightly higher levels of segregation and average scores than the model predictions (Figures (a), (a)). In  out of the  trials the participants separated into two distinct groups at either side or corner of the grid (as in Figure (i)). The Same game was the easiest game for the participants to complete: nearly all groups converged to the end-game segregation and average scores in less than  seconds of play [Figure S in Additional file ]. This was because it quickly became apparent which side 'belonged' to which color.
For the Diverse game, the segregation was low and similar to the model predictions. The participants continued to move for the entire two minutes of the experiment [Figure S in Additional file ] but they eventually achieved the expected high levels of integration (Figure (b)) and high average scores (Figure (b)).
The experimental results for the Same and Diverse game did not reveal marked levels of segregation, contradicting model predictions. In most cases, participants actually achieved a good degree of integration (Figure (c)). Yet, their scores were not always lower than the simulated agents' scores (Figure (c)). The high level of segregation in the simulation was due to the fact that a small number of agents quickly formed a 'front line' with an optimal mix of own- and other-color neighbors. The agents not on this front line then moved to their own-color side of it, which provided higher utility than the other-color side (Figure (g)). In contrast, experiment participants achieved a more uniform mix (Figure (k)), similar to the pattern in the Diverse game (Figure (j)).
The experimental results for the Same or Different game also deviated from the model. The students tended to have higher levels of segregation (Figure (d)). Further, based on the average scores, it is clear that the participants did worse in the game than the simulation agents (Figure (d)). A priori, we expected that the participants would recognize that the 'same' strategy was much easier to coordinate on and hence, they would converge to high levels of segregation, as they easily did in the Same game. During the experiments, students made repeated references to such a strategy, shouting things like 'all the yellows up to the top.' Four groups attempted to carry out this approach but it took only one or two contrarians to upset the pattern [Figure SD in Additional file ]. Despite decreasing the scores of their neighbors, these contrarians acted rationally, since they had all non-same neighbors and thus maximum utility. As a result, no group managed to achieve the simpler one-color-neighborhood solution. The students also made suggestions about checkerboard solutions like 'we should alternate... blue then yellow' or 'we'll stand in groups of four two of each with one square distance between.' Three groups managed to coordinate (or nearly coordinate) to create these more difficult checkerboard solutions. The rest of the groups failed to get close to a mutually beneficial configuration within the allotted time.
In the simulation, agents are situated on a two-dimensional 6 × 6 grid. At every time period, an agent is selected uniformly at random from the population and given the opportunity to relocate. The agent evaluates the nearest four available locations in the up, down, left and right directions, as well as its current location. If the agent identifies a single strictly best available location among these, it moves to (or stays at) it; if it identifies multiple best locations, it randomly chooses to relocate to one of them. Simulations were run for 100,000 time periods and replicated 1000 times.
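The simulation procedure given in the figure captions (uniform random agent selection, evaluation of the current cell and the nearest empty cell in each of the four directions, and a best-response move with random tie-breaking) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. The eight-cell neighborhood, the helper names, and `same_utility` with its `threshold` of 0.5 are assumptions; the actual scoring rules are those shown in Figure 1.

```python
import random

SIZE = 6  # six-by-six grid, as stated in the text


def neighbor_colors(grid, r, c):
    """Colors of the occupied cells among the eight cells around (r, c) (assumed neighborhood)."""
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < SIZE and 0 <= cc < SIZE and grid[rr][cc] is not None:
                out.append(grid[rr][cc])
    return out


def same_utility(color, nbrs, threshold=0.5):
    """Hypothetical threshold utility: 1 if the same-color share reaches `threshold`, else 0."""
    if not nbrs:
        return 0.0
    share = sum(n == color for n in nbrs) / len(nbrs)
    return 1.0 if share >= threshold else 0.0


def nearest_empty(grid, r, c):
    """Nearest empty cell in each of the up, down, left, right directions, skipping occupied cells."""
    found = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        while 0 <= rr < SIZE and 0 <= cc < SIZE:
            if grid[rr][cc] is None:
                found.append((rr, cc))
                break
            rr, cc = rr + dr, cc + dc
    return found


def best_response_step(grid, utility, rng):
    """One time period: a random agent moves to (or stays at) a best location, ties broken at random."""
    agents = [(r, c) for r in range(SIZE) for c in range(SIZE) if grid[r][c] is not None]
    r, c = rng.choice(agents)
    color = grid[r][c]
    options = [(r, c)] + nearest_empty(grid, r, c)
    grid[r][c] = None  # lift the agent so it is not counted as its own neighbor
    scores = [utility(color, neighbor_colors(grid, rr, cc)) for rr, cc in options]
    best = max(scores)
    rr, cc = rng.choice([o for o, s in zip(options, scores) if s == best])
    grid[rr][cc] = color
    return (r, c), (rr, cc)
```

Repeating `best_response_step(grid, same_utility, random.Random(seed))` for the desired number of time periods, and repeating the whole run with different seeds, mirrors the replication scheme described in the captions.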
The vast majority of moves made by the participants were consistent with an attempt to maximize the utility functions provided to them. Figure  shows the average latency until a move is made as a function of neighbor types for the four games. Movement patterns differed greatly between games, but were consistent across trials within a game. The timing of the moves reflected a strong tendency towards higher scoring configurations. For example, in the Same game individuals with zero same neighbors typically moved after less than  seconds, while those with  or more same neighbors would remain stationary for more than  seconds. The motivation to get high scores was also reflected in the students' discussions and exclamations during gameplay. These were primarily about maximizing points, and at no time in any of the trials did any of the students make any wider reference to segregation. Both the actions (in terms of movements made in the game) and verbal expressions were thus consistent with our assumption that the participants saw the game in terms of utility maximization.
Figure 3 The average score achieved at the end of each game in the experiment compared to the predictions from the simulations. The solid circles show the average score achieved by the groups in the experiment and the gray areas show the proportion of the simulation replications that predicted that particular average score for each group size. In the simulation, agents are situated on a two-dimensional 6 × 6 grid. At every time period, an agent is selected uniformly at random from the population and given the opportunity to relocate. The agent evaluates the nearest four available locations in the up, down, left and right directions, as well as its current location. If the agent identifies a single strictly best available location among these, it moves to (or stays at) it; if it identifies multiple best locations, it randomly chooses to relocate to one of them. Simulations were run for 100,000 time periods and replicated 1000 times.
Although the participants followed the incentives we assigned them, they produced different outcomes than seen in models. Previous research suggests that this could be due to high levels of behavioral noise []. Our participants inadvertently committed errors, but the errors were not the driving mechanism. Instead, it appears that the participants used a strategy that differed from the one implemented in the simulation models. In line with previous work [-], our model assumes the best-response strategy, according to which individuals change their position only if it increases their utility. This assumption implies that individuals are able not only to identify better positions but also to recognize when no better positions exist. However, the participants in our experiment differed in two important ways from the simulation. Firstly, they were usually unwilling to 'satisfice': the participants moved whenever they did not obtain the perfect score. This is evident from the high mobility in less-than-optimal positions in Figure (b)-(d). Secondly, when they moved, the participants did not necessarily move to better positions but rather appeared to choose their new location randomly [Figure S in Additional file ]. Given the fast game dynamics, this choice of random directions is probably a consequence of the cognitive limitations in identifying the optimal locations. Together, these two behavioral rules led to unpredictability, and no stable equilibrium arrangement was achieved. The participants' tendency to make a random move whenever they found themselves in a sub-optimal position explains the mismatch with the predictions of the best-response simulation model. In the Same game, the participants obtained more segregated outcomes because they wanted to avoid being on the frontier between the two neighborhoods, as this made them more vulnerable to other participants' moves. In the Same and Diverse game, the participants avoided segregation because they could not be satisfied with being at the periphery of their own-color neighborhood, as these positions entailed lower scores. In the Same or Different game, the participants failed to coordinate on the common-sense solution based on two groupings of yellow and blue because their scores could be easily lowered by one individual of the opposite color infiltrating a mono-color block.
To further test our explanation for the experimental results, we replicated the simulation without the best-response assumption. Instead, we assumed random relocation and no satisficing. In the new model, agents decide to move whenever their utility is less than the maximum. They then move to one of the nearest four available locations in the up, down, left and right directions, chosen at random. The predictions from this model match the observed outcomes better, particularly for the Same and Diverse game and the Same or Different game (Figure ).
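The revised rule (move whenever utility is below the maximum, then pick one of the nearest available locations at random) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the eight-cell neighborhood, the helper names, the `same_utility` threshold of 0.5, the `max_utility` argument, and the uniform random choice of the acting agent in `simulate` are all hypothetical.

```python
import random

SIZE = 6  # six-by-six grid, as stated in the text


def neighbor_colors(grid, r, c):
    """Colors of the occupied cells among the eight cells around (r, c) (assumed neighborhood)."""
    return [grid[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < SIZE and 0 <= c + dc < SIZE
            and grid[r + dr][c + dc] is not None]


def same_utility(color, nbrs, threshold=0.5):
    """Hypothetical threshold utility: 1 if the same-color share reaches `threshold`, else 0."""
    if not nbrs:
        return 0.0
    return 1.0 if sum(n == color for n in nbrs) / len(nbrs) >= threshold else 0.0


def nearest_empty(grid, r, c):
    """Nearest empty cell in each of the up, down, left, right directions, skipping occupied cells."""
    found = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        while 0 <= rr < SIZE and 0 <= cc < SIZE:
            if grid[rr][cc] is None:
                found.append((rr, cc))
                break
            rr, cc = rr + dr, cc + dc
    return found


def random_relocation_step(grid, pos, utility, max_utility, rng):
    """Agent at `pos` stays only with maximal utility; otherwise it jumps to a random reachable cell."""
    r, c = pos
    color = grid[r][c]
    if utility(color, neighbor_colors(grid, r, c)) >= max_utility:
        return pos  # no satisficing: only the maximum score stops movement
    options = nearest_empty(grid, r, c)
    if not options:
        return pos  # no empty cell in the agent's row or column
    rr, cc = rng.choice(options)
    grid[r][c], grid[rr][cc] = None, color
    return (rr, cc)


def simulate(grid, utility, max_utility, steps, rng):
    """Run `steps` time periods, choosing the acting agent uniformly at random (assumption)."""
    for _ in range(steps):
        agents = [(r, c) for r in range(SIZE) for c in range(SIZE) if grid[r][c] is not None]
        random_relocation_step(grid, rng.choice(agents), utility, max_utility, rng)
    return grid
```

Unlike a best-response agent, an agent here never compares the candidate cells: any score below `max_utility` triggers a move, and the destination is drawn without regard to its payoff.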

Discussion
Segregation can occur in a whole range of social situations with very different time and spatial scales. Our experiments best capture small-scale rapidly evolving social situations, such as deciding to whom to talk at a social mingle, choosing a work desk in a 'no fixed desk' office or classroom environment, and participating in online content communities. In these scenarios, movement between different positions can occur on a time scale of seconds, minutes, or hours. The two behavioral rules our experiment supports are: (1) individuals aim to obtain the optimal situation rather than satisficing and (2) when at a suboptimal position, individuals make 'random' positional changes in an attempt to increase their utility. At a social mingle, individuals implementing these rules would frequently move, continually changing group composition. In such social situations, in the western world at least, it is plausible that many people's utility function is similar to that for our Same and Diverse game: individuals value some level of diversity but do not want to be in a clear minority. Our experimental results and our revised model based on the observed behavioral rules suggest that even when the participants of a mingle have some preference for socializing with similar individuals, we would still expect a high level of diversity and little segregation. Similarly, under the same assumptions, we would not expect high levels of segregation within flexible workplace practices or free-seated classrooms.
Residential and work segregation takes place on longer time scales, over wider areas, and with much larger costs and benefits than in the experiments we have carried out. We should therefore be careful how we interpret our experimental results in a wider context. Nevertheless, it is important to note that experiments on humans, even if they do not perfectly map onto specific life situations, are likely to be more relevant to policy making than simulation results alone. The strong conclusions of [] and [], quoted in the introduction, which suggest that weak preferences for similar neighbors can produce very strong segregation, are not supported by our experiment. In the Same and Diverse experiment, segregation is low and those individuals scoring poorly had inadvertently moved into areas with slightly too high concentrations of dissimilar neighbors. On this basis, these models should not be used as an argument against the need to educate people in the benefits of diversity and we should not conclude that real-world segregation is an unavoidable consequence of weak preferences.
To find out more about the behavioral rules adopted by people moving to a new house, choosing a new school for their child, or changing their job, a greater understanding is needed of what information is gathered by individuals. Relocation decisions that are costly to make may involve more intensive information search, but also involve some degree of satisficing. Relocation decisions that are easy to implement are likely to be more exploratory and more common. Humans would often stick to simple heuristic rules or succumb to biases, even though basic forward-looking reasoning and communication may allow them to find better solutions []. We saw this most clearly in the Same or Different game, where we a priori expected groups to coordinate on the straightforward high-segregation solution. This never occurred, despite the fact that participants could freely communicate and, in fact, even discussed the potential solution. This observation reminds us that simplified models and empirical testing, or, in other words, social science, are still more valuable for explaining behavior and making predictions than common sense or intuition alone.
The differences between our experiment and the previous modeling work on segregation show how aspects of human psychology that are not captured in simplified simulation models may affect outcomes at the group or society level. The participants in our experiment behaved rationally in the sense that they acted to maximize their utility. Yet, the patterns they created were characteristic of humans rather than simulated agents. Because our experimental games allowed for asynchronous and repeated interactions in a social setting, they were sufficiently flexible to reveal these patterns. The results show that even in a simple setting, humans can act according to the incentives we give them while simultaneously defying our models of what they will do.

Additional material
Additional file 1: Supporting Appendix. Additional information, additional analyses, and experiment protocol (pdf)