Win-stay lose-shift strategy in formation changes in football

Managerial decision making is likely to be a dominant determinant of the performance of teams in team sports. Here we use Japanese and German football data to investigate correlations between temporal patterns of formation changes across matches and match results. We found that individual teams and managers both showed win-stay lose-shift behavior, a type of reinforcement learning. In other words, they tended to stick to the current formation after a win and switch to a different formation after a loss. In addition, formation changes did not statistically improve the results of succeeding matches. The results indicate that a swift implementation of a new formation in the win-stay lose-shift manner may not be a successful managerial rule of thumb.


Introduction
Exploring the rules that govern decision making has fascinated various fields of research, and its implications range from our daily lives to corporate and governmental settings. In economic contexts in the widest sense, individuals often modify their behavior based on their past experiences, attempting to enhance the benefit received in the future.
Such decision making strategies are generally called reinforcement learning. In reinforcement learning, behavior that has led to a large reward will be selected with a larger frequency, or the behavior will be incrementally modified toward the rewarded one. Reinforcement learning is common in humans [1,2] and non-humans [3], is implemented with various algorithms [4], has theoretical underpinnings [1,4], and has neural substrates [5,6].
A simple version of reinforcement learning is the so-called win-stay lose-shift (WSLS) strategy [7,8]. An agent adopting this strategy sticks to the current behavior if the agent is satisfied. The agent changes its behavior if unsatisfied. Experimental studies employing human participants have provided a line of evidence in favor of WSLS in situations such as repeated Prisoner's Dilemma [9,10], gambling tasks [11,12], and tasks in which participants construct virtual stone tools [13][14][15]. It has also been suggested in nonscientific contexts that decisions by athletes and gamblers are often consistent with WSLS patterns even if the outcome of games seems to be independent of the decision [16].
Association football (also known as soccer; hereafter referred to as football) is one of the most popular sports in the world and provides huge business opportunities. The television rights of the English Premier League yield over two billion euros per year [17]. Transfer fees of top players can be tens of millions of euros [18]. Various aspects of football, not only watching but also betting [19] and the history of tactics [20], enjoy popularity. Football and other team sports also provide data for leadership studies because a large amount of sports data is available and the performance of teams and players can be unambiguously measured by match results [21][22][23].
In the present study, using data obtained from football matches, we examine the possibility that managers of teams use the WSLS strategy. Managers can affect the performance of teams through selections of players, training of players, and implementation of tactics including formations [18]. In particular, a formation is a part of tactics that determines how players participate in offense and defense [24] and is considered to affect match results [24,25]. Managerial decision making in substituting players during a match may affect the probability of winning [25]. We hypothesize that a manager continues to use the same formation if he has won the previous match, whereas he experiments with another formation following a loss in the previous match.
The WSLS and more general reinforcement learning posit that unsuccessful individuals modify their behavior to increase the probability of winning. Therefore, we are interested in whether a formation change improves the performance of a team. To clarify this point, we also investigate effects of formation changes on the results of succeeding matches.

Data set
We collected data on football matches from two websites, J League Data Site [26] on J League, and Kicker-online [27] on Bundesliga. J League and Bundesliga are the most prestigious professional football leagues in Japan and Germany, respectively. We refer to the two data sets as the J-League and the Bundesliga data sets. The two data sets contain, for each team and match, the season, date, manager's name, result (i.e., win, draw, or loss), and starting formation. The J-League data set contains 5,944 matches played in 33 seasons from 1993 to 2014. The Bundesliga data set contains 15,548 matches played in 51 seasons from 1963 to 2014.
Between 1993 and 2004, except for 1996, each season of J League was divided into two half seasons. After the two half seasons had been completed, two champion teams, each representing a half season, played play-off matches. We regarded each half season as a season because intervals between two half seasons ranged from ten days to two months and were therefore longer than one week, which was a typical interval between two matches within a season. We also carried out the same analysis when we regarded one year, not one half season, as a season and verified that the main results were unaltered (Appendix A).
We also collected data on Bundesliga from another website, Fussballdaten [28]. We focused on the Kicker-online data rather than the Fussballdaten data because the definition of the position was coarser for the Fussballdaten data (i.e., a player was not assumed to change his position during a season) than for the Kicker-online data. Nevertheless, to verify the robustness of the following results, we also analyzed the Fussballdaten data (Appendix B).

Definition of formation
The definition of formation was different between the two data sets. In the J-League data, each of the ten field players was assigned to either defender (DF), midfielder (MF), or forward (FW) in each match. We defined formation as a triplet of the numbers of DF, MF, and FW players, which sum up to ten. For example, a formation 4-4-2 implies four DFs, four MFs, and two FWs.
In the Bundesliga data, the starting positions of the players were given on a two-dimensional map of the pitch (Fig. 1). For this data set, we defined formation as follows.
First, we acquired the distance of each player from the goal line by referring to the HTML source code of Kicker-online. Second, we grouped players whose distances from the goal line were the same. Third, we ordered the groups of players in terms of the distance, resulting in an ordered set of the numbers of players at each distance value. The set of numbers defined a formation. For example, when the distances of the ten field players are equal to 113, 113, 113, 113, 236, 236, 359, 359, 359, and 441, the formation of the team is defined to be 4-2-3-1 (Fig. 1). The minimal nonzero difference in distance between any two players was equal to 31. Therefore, we did not have to worry about players with almost the same distance values being classified into distinct positions.
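The grouping procedure described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the distances have already been extracted as plain numbers; the function name `formation_from_distances` is hypothetical and not part of the original analysis.

```python
from itertools import groupby


def formation_from_distances(distances):
    """Group field players by distance from the goal line and count each group.

    Players with identical distance values form one line of the formation;
    lines are ordered from the smallest to the largest distance.
    """
    ordered = sorted(distances)
    counts = [len(list(group)) for _, group in groupby(ordered)]
    return "-".join(str(count) for count in counts)


# Distances of the ten field players from the example in the text.
example = [113, 113, 113, 113, 236, 236, 359, 359, 359, 441]
print(formation_from_distances(example))  # 4-2-3-1
```

Exact equality is used for grouping, which is reasonable here only because distinct distance values in the data were reported to be well separated.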

Burstiness and memory coefficient of interevent time series
To capture temporal properties of formation changes, we calculated burstiness, $B$, and memory coefficient, $M$ [29], on the basis of the interevent time series $\{\tau_i\}$ defined as follows. For a given team, we denoted by $t_0, t_1, \ldots, t_N$ ($2 \le t_0 < t_1 < \cdots < t_N$) the times when the team changed the formation. The number of formation changes summed over all seasons is equal to $N + 1$. We counted the time in terms of the number of matches rather than the day. If $t_2 = 5$, for example, the team changed the formation to play the fifth match, and it was the third change for the team since the first match in the data set. We counted the time in this way to exclude effects of variable intervals between consecutive matches in terms of the real time. The interevent time for formation changes was defined by $\tau_i = t_i - t_{i-1}$ ($1 \le i \le N$). As we will explain in the following sections, we have [...]
The burstiness is defined by
$$B = \frac{\sigma - m}{\sigma + m}, \tag{1}$$
where $m = \sum_{i=1}^{N} \tau_i / N$ and $\sigma = \sqrt{\sum_{i=1}^{N} (\tau_i - m)^2 / N}$ are the mean and standard deviation of interevent time, respectively. $B$ ranges between $-1$ and $1$. A large value of $B$ indicates that a sequence of formation change events is bursty in the sense that it obeys a long-tailed distribution of interevent times. The Poisson process has the exponential distribution of the interevent time and hence yields $B = 0$.
The memory coefficient is defined by
$$M = \frac{1}{N-1} \sum_{i=1}^{N-1} \frac{(\tau_i - m_1)(\tau_{i+1} - m_2)}{\sigma_1 \sigma_2}, \tag{2}$$
where $m_1$ and $\sigma_1$ ($m_2$ and $\sigma_2$) are the mean and standard deviation of $\tau_1, \ldots, \tau_{N-1}$ ($\tau_2, \ldots, \tau_N$), respectively. An uncorrelated sequence of interevent times yields $M = 0$.
To examine the statistical significance of B values obtained from the collected data, we generated 1,000 sequences of interevent times from the exponential distribution whose mean was equal to that of the original data. Each synthesized sequence had the same length (i.e., N) as that of the original data. We calculated B for each synthesized sequence, which resulted in a distribution of B for the Poisson process. We regarded the value of B for the original data as significant if it was not included in the 95% confidence interval (CI) calculated from the distribution generated by the Poisson process. We calculated the CI for M in the same manner except that we generated synthesized sequences by randomizing the original sequence instead of sampling sequences from the exponential distribution.
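As a rough illustration of this section, the following Python sketch computes the burstiness and the memory coefficient of an interevent time sequence and builds the two null intervals by simulation. It assumes the standard definitions from [29], namely B = (σ − m)/(σ + m) and M equal to the correlation between consecutive interevent times. All function names, the example sequence, and the percentile convention in `central_95` are assumptions of this sketch rather than details taken from the original analysis.

```python
import math
import random


def mean_sd(values):
    """Mean and (population) standard deviation of a sequence."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def burstiness(taus):
    """B = (sigma - m) / (sigma + m)."""
    m, sigma = mean_sd(taus)
    return (sigma - m) / (sigma + m)


def memory(taus):
    """Correlation between consecutive interevent times."""
    x, y = taus[:-1], taus[1:]
    (m1, s1), (m2, s2) = mean_sd(x), mean_sd(y)
    if s1 == 0 or s2 == 0:
        return 0.0  # undefined for constant sequences; set to 0 in this sketch
    return sum((a - m1) * (b - m2) for a, b in zip(x, y)) / (len(x) * s1 * s2)


def central_95(values):
    """Central 95% range of simulated values (a simple percentile convention)."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[(25 * n) // 1000], ordered[(975 * n) // 1000 - 1]


def null_intervals(taus, n_samples=1000, seed=0):
    """Null ranges: exponential sampling for B, shuffling for M."""
    rng = random.Random(seed)
    m, _ = mean_sd(taus)
    b_null = [burstiness([rng.expovariate(1.0 / m) for _ in taus])
              for _ in range(n_samples)]
    m_null = []
    shuffled = list(taus)
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        m_null.append(memory(shuffled))
    return central_95(b_null), central_95(m_null)


# Hypothetical interevent times, counted in matches.
taus = [1, 1, 2, 1, 8, 1, 1, 12, 2, 1]
b_range, m_range = null_intervals(taus)
print(burstiness(taus), b_range)
print(memory(taus), m_range)
```

An observed value falling outside the corresponding range would be regarded as significant in the sense described above.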

GLMM
We examined effects of previous matches and other factors on the likelihood of formation changes in each team. To this end, we used a generalized linear mixed model (GLMM) with binomial errors and a logit-link function. The dependent variable was the occurrence or lack thereof of formation changes, which was binary. As independent variables, we included the binary variable representing whether or not the stadium was the home of the team (i.e., home or away) and the ternary result of the previous match (i.e., win, draw, or loss). We designated the draw as the reference category for the match result. Because the likelihood of formation changes may be affected by a streak of wins or losses, we also included the result of the second last match as an independent variable. The difference between the focal team's strength and the opponent's strength was also an independent variable. The strength of a team was defined by the probability of winning in the season.
We estimated the strength of a team separately for each season because it can vary across seasons. The name of the manager was included as a random effect (random intercept).
In this and the following analysis, we excluded the first match in each season for each team because we considered that the result of the last match in the preceding season would not directly affect the first match of a new season. We further excluded the second match in each season for each team from the GLMM analysis when we employed the result of the second last match as an independent variable. Because the J-League data set did not have the information on managers between 1993 and 1998, we only used data between 1999 and 2014 in the GLMM analysis. We performed the statistical analysis using R 3.1.2 [30] with lme4 package [31].
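The exclusion rules and the construction of the dependent and lagged independent variables can be illustrated with a short Python sketch. It only prepares the rows for one team in one season; fitting the model itself (done with lme4 in R in this study) and the strength difference are left out. The function name `glmm_rows` and the dictionary keys are hypothetical.

```python
def glmm_rows(matches, use_second_last=False):
    """Build one row per analysed match for a single team within a single season.

    `matches` is a chronologically ordered list of dicts with the hypothetical
    keys "formation", "result" ("win", "draw" or "loss"), "home" and "manager".
    The first match of the season is always skipped; the second match is also
    skipped when the result of the second last match is used.
    """
    first = 2 if use_second_last else 1  # index of the first match kept
    rows = []
    for t in range(first, len(matches)):
        current, previous = matches[t], matches[t - 1]
        row = {
            "changed": int(current["formation"] != previous["formation"]),
            "home": int(current["home"]),
            "prev_result": previous["result"],
            "manager": current["manager"],
        }
        if use_second_last:
            row["prev2_result"] = matches[t - 2]["result"]
        rows.append(row)
    return rows


# Hypothetical four-match season.
season = [
    {"formation": "4-4-2", "result": "loss", "home": True, "manager": "A"},
    {"formation": "3-5-2", "result": "win", "home": False, "manager": "A"},
    {"formation": "3-5-2", "result": "draw", "home": True, "manager": "A"},
    {"formation": "4-4-2", "result": "loss", "home": False, "manager": "A"},
]
for row in glmm_rows(season, use_second_last=True):
    print(row)
```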

Measurement of win-stay lose-shift for individual managers
Different managers may change the formation in different manners. Therefore, we quantified the tendency of WSLS for each manager. We calculated P(no change|win) − P(no change|draw), where P(X|Y) is the conditional probability of event X ∈ {no change, change}, i.e., whether or not the manager changes the formation, given event Y ∈ {win, draw, loss}, i.e., the result of the last match. A positive value of P(no change|win) − P(no change|draw) indicates the tendency to win-stay. Likewise, the tendency to lose-shift was measured by P(change|loss) − P(change|draw).
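A minimal Python sketch of these per-manager measures follows. The win-stay measure is taken from the text; the lose-shift measure is assumed here, by analogy, to be P(change|loss) − P(change|draw). The function name and the input layout are hypothetical.

```python
from collections import Counter


def wsls_tendencies(records):
    """Return (win-stay, lose-shift) tendencies for one manager.

    `records` is an iterable of (previous_result, changed) pairs, where
    previous_result is "win", "draw" or "loss" and changed is a bool.
    Every result category must occur at least once.
    """
    totals, changes = Counter(), Counter()
    for previous_result, changed in records:
        totals[previous_result] += 1
        if changed:
            changes[previous_result] += 1
    p_change = {r: changes[r] / totals[r] for r in ("win", "draw", "loss")}
    # P(no change|win) - P(no change|draw)
    win_stay = (1 - p_change["win"]) - (1 - p_change["draw"])
    # Assumed by analogy: P(change|loss) - P(change|draw)
    lose_shift = p_change["loss"] - p_change["draw"]
    return win_stay, lose_shift


# Hypothetical manager: 4 wins (1 change), 2 draws (1 change), 4 losses (3 changes).
records = ([("win", False)] * 3 + [("win", True)]
           + [("draw", False), ("draw", True)]
           + [("loss", True)] * 3 + [("loss", False)])
print(wsls_tendencies(records))  # (0.25, 0.25)
```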

Ordered probit model
We also investigated the effects of formation changes on match results. We used the ordered probit model because a match result was ternary. Because the strength was considered to heavily depend on teams, we controlled for the strength of teams. The same model was used for fitting match results in football in the Netherlands [32] and the UK [18].
The dependent variable of the model was a match result. We assumed that the occurrence of formation change (change or no change), the stadium (home or away), the strength of teams, and the result of the previous match (win, draw, or loss) can affect a match result. As a linear combination of these factors, we defined the unobserved potential variable for team $i$, denoted by $\alpha_i$, by
$$\alpha_i = \beta_f f_i + \beta_h h_i + \beta_r r_i + \beta_w w_i + \beta_\ell \ell_i, \tag{3}$$
where $f_i = 1$ if team $i$ changed the formation, and $f_i = 0$ otherwise; $h_i = 1$ if the stadium was the home of team $i$, and $h_i = 0$ otherwise; $w_i = 1$ if team $i$ won the previous match, and $w_i = 0$ otherwise; $\ell_i = 1$ if team $i$ lost the previous match, and $\ell_i = 0$ otherwise; the strength of team $i$, denoted by $r_i$, was defined as the fraction of matches that team $i$ won in the given season. In Appendix D, we conducted the analysis by assuming that $r_i$ was a latent variable obeying the normal distribution and then using the hierarchical Bayesian model [33].
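Consider a match between home team $i$ and away team $j$. We assumed that the match result, denoted by $k_{ij}$, was determined by the difference between the potential values of the two teams, i.e.,
$$y_{ij} = \alpha_i - \alpha_j + \epsilon_{ij}. \tag{4}$$
Variables $y_{ij}$ and $k_{ij}$ are related by
$$k_{ij} = \begin{cases} \text{win (for team } i\text{)} & \text{if } c_1 < y_{ij}, \\ \text{draw} & \text{if } c_0 < y_{ij} \le c_1, \\ \text{loss (for team } i\text{)} & \text{if } y_{ij} \le c_0, \end{cases} \tag{5}$$
where $c_0$ and $c_1$ are threshold parameters, and $\epsilon_{ij}$ is an error term that obeys the normal distribution with mean 0 and standard deviation 1. Because $h_i - h_j = 1$, $\beta_h$ appears as a constant term on the right-hand side of Eq. (4). In fact, it is impossible to estimate $\beta_h$ because $\beta_h$ effectively shifts $c_0$ and $c_1$ by the same amount such that there are only two degrees of freedom in the parameter space spanned by $c_0$, $c_1$, and $\beta_h$. Therefore, we assumed $c_0 = -c_1$ and estimated $c_0$ and $\beta_h$. This assumption did not alter the estimates of the other parameters. Equation (5) results in
$$\begin{aligned} P(k_{ij} = \text{loss}) &= \Phi(c_0 - \alpha_i + \alpha_j), \\ P(k_{ij} = \text{draw}) &= \Phi(c_1 - \alpha_i + \alpha_j) - \Phi(c_0 - \alpha_i + \alpha_j), \\ P(k_{ij} = \text{win}) &= 1 - \Phi(c_1 - \alpha_i + \alpha_j), \end{aligned}$$
where $P$ denotes the probability, and $\Phi(\cdot)$ is the cumulative standard normal distribution function.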
We excluded the matches that were the first game in a season at least for either team.
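To make the structure of the ordered probit model concrete, the following Python sketch evaluates the win, draw, and loss probabilities of a home team given the potentials of the two teams and symmetric thresholds c_0 = −c_1. It assumes that the potential is a linear combination of the five factors listed above with coefficients named β_f, β_h, β_r, β_w, and β_ℓ; the name β_r for the strength coefficient, the function names, and all numerical values are illustrative assumptions, not estimates from the data.

```python
import math


def phi(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def potential(f, h, r, w, l, beta):
    """Potential of one team; `beta` holds hypothetical coefficient values."""
    return (beta["f"] * f + beta["h"] * h + beta["r"] * r
            + beta["w"] * w + beta["l"] * l)


def outcome_probabilities(alpha_home, alpha_away, c1):
    """Outcome probabilities for the home team with thresholds c0 = -c1 (c1 >= 0)."""
    c0 = -c1
    d = alpha_home - alpha_away
    p_loss = phi(c0 - d)
    p_win = 1.0 - phi(c1 - d)
    return {"win": p_win, "draw": 1.0 - p_win - p_loss, "loss": p_loss}


# Illustrative coefficients only.
beta = {"f": -0.05, "h": 0.3, "r": 2.0, "w": -0.1, "l": 0.1}
alpha_home = potential(f=1, h=1, r=0.5, w=0, l=1, beta=beta)
alpha_away = potential(f=0, h=0, r=0.4, w=1, l=0, beta=beta)
print(outcome_probabilities(alpha_home, alpha_away, c1=0.4))
```

In an actual fit the coefficients and the threshold would be estimated by maximizing the likelihood built from these probabilities over all matches.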

Influence of individual manager's behavior on match results
Different managers may show WSLS behavior to different extents, which may in turn affect match results. Therefore, we analyzed data separately for individual managers. For each manager i, we calculated the probability of winning under each of the following four conditions: (i) i's team won the previous match, and i changed the formation, (ii) i's team won the previous match, and i did not change the formation, (iii) i's team lost the previous match, and i changed the formation, and (iv) i's team lost the previous match, and i did not change the formation. We then compared the probability of winning between cases (i) and (ii), and between cases (iii) and (iv) using the paired t-test. In the t-test, we included the managers who directed at least ten matches in each of the two cases being compared.
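The comparison between two cases can be sketched as follows in Python. The sketch filters managers with at least ten matches in both cases and computes the paired t statistic and its degrees of freedom; the p value would then be obtained from the t distribution (for example with `scipy.stats.ttest_rel`, which performs the whole paired test). The function names, the input layout, and the numbers are hypothetical.

```python
import math
import statistics


def win_probabilities(managers, min_matches=10):
    """Paired win probabilities for managers with enough matches in both cases.

    `managers` maps a manager name to
    {"change": (wins, matches), "no_change": (wins, matches)}.
    """
    pairs = []
    for counts in managers.values():
        (w1, n1), (w0, n0) = counts["change"], counts["no_change"]
        if n1 >= min_matches and n0 >= min_matches:
            pairs.append((w1 / n1, w0 / n0))
    return pairs


def paired_t_statistic(pairs):
    """Return the paired t statistic and the degrees of freedom."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical counts; manager "D" is excluded (only 5 matches with a change).
managers = {
    "A": {"change": (5, 10), "no_change": (4, 10)},
    "B": {"change": (6, 10), "no_change": (4, 10)},
    "C": {"change": (7, 10), "no_change": (4, 10)},
    "D": {"change": (3, 5), "no_change": (10, 20)},
}
pairs = win_probabilities(managers)
t_value, dof = paired_t_statistic(pairs)
print(t_value, dof)
```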
To further examine possible relationships between managers' behavior and match results, we defined the tendency of the WSLS behavior for each manager (degree of WSLS for short) by [...], where P_WSLS(change|win) (= 0) is the conditional probability that a perfect WSLS manager changes the formation after winning, and likewise for P_WSLS(change|loss) (= 1). The degree of WSLS ranges from 0 to 2.

Burstiness and memory
The

Win-stay lose-shift behavior in formation changes
We examined the extent to which managers changed the formation of the team after losing a match and persisted with the current formation after a win. The results of the GLMM analysis with the results of the previous matches being the only independent variables are shown in Table 1. For both data sets, winning in a match significantly decreased the probability of formation change in the next match, and losing in a match increased the probability of formation change. The results did not essentially change when we used the full set of independent variables (Table 2). Formation changes are therefore consistent with WSLS patterns.
For the J-League data, the effects of all the additional independent variables were insignificant. We analyzed the J-League data by regarding a pair of half seasons (i.e., a yearly season) as a season to confirm that the results remained qualitatively the same except that winning in the second last match also significantly decreased the probability of formation change (Appendix A). We also confirmed that matches played in the further past affected the probability of formation change to progressively smaller extents (Appendix C).
For the Bundesliga data, winning in the second last match also significantly decreased the probability of formation change in the extended GLMM model (Table 2). In addition, a streak of losses significantly increased the probability of formation change. These results are consistent with WSLS behavior. We also found for the Bundesliga data that stronger teams changed the formation less frequently and that teams were less likely to change the formation for home games. We also investigated the Fussballdaten data for Bundesliga, in which the definition of formation was different, and confirmed that managers tended to use the WSLS strategy (Appendix B).
The data were aggregated across the managers in the GLMM analysis. Therefore, we also analyzed the data separately for individual managers. The tendency of the WSLS behavior for individual managers is shown in Figure 2. A circle in the figure represents a manager. For both data sets, a majority of managers use the WSLS strategy, consistent with the results obtained from the GLMM analysis.

Determinants of match results
The results obtained from the ordered probit model are shown in Table 3. For both data sets, formation changes did not significantly affect a match result. The result remained qualitatively the same when each pair of half seasons was considered as a season in the J-League data (Appendix A), when the Fussballdaten data were used (Appendix B), and when the strength of a team was assumed to be a latent variable in the ordered probit model (Appendix D).
Table 3 also tells us the following. Trivially, stronger teams were more likely to win in both data sets. The home advantage was significant in both data sets, consistent with previous literature [18,35]. Both data sets also showed negative persistence effects, i.e., the results of the current and previous matches tended to be opposite, consistent with previous literature [18]. However, the specifics of the negative persistence effect depended on the data set. In J League, a loss tended to yield a better result in the next match. In Bundesliga, a win tended to yield a poor result in the next match.
Figure 3(a) shows the probability of winning after individual managers changed or did not change the formation after a win in the J-League data. A large circle in Figure 3 represents a manager who presented both types of actions (i.e., formation change after winning and no formation change after winning) at least ten times. A small circle represents a manager who presented either type of action less than ten times. The formation change does not appear to affect the probability of winning. This is also apparently the case for the actions after a loss (Figure 3(b)) and the Bundesliga data (Figures 3(c) and 3(d)). To be quantitative, we conducted the paired t-test on the managers who exhibited the two types of actions at least ten times in each case (managers shown by the large circles in Figure 3). For the J-League data, there was no significant effect of formation change on the probability of winning both after winning (p = 0.8514, n = 4; corresponding to Figure 3(a)) and losing (p = 0.2878, n = 7; Figure 3(b)). For the Bundesliga data, formation changes after winning significantly decreased the probability of winning in the next match (p = 0.0043, n = 93; Figure 3(c)), whereas there was no significant effect after losing (p = 0.1862, n = 93; Figure 3(d)). These results suggest that formation changes at least did not increase the probability of winning.
The analysis with the ordered probit model aggregated the data from all managers. Therefore, we examined the relationship between the degree of WSLS and the probability of winning for individual managers. The results are shown in Figure 4. A circle in Figure 4 represents a manager. We did not find a significant relationship between the usage of the WSLS and the probability of winning for either the J-League (Pearson's r = 0.1304, p = 0.5169, n = 26) or the Bundesliga (r = −0.1058, p = 0.29, n = 101) data.

Discussion
We have provided evidence that football managers tend to stick to the current formation until the team loses, consistent with the WSLS strategy previously shown in laboratory experiments with social dilemma games [9,10] and gambling tasks [11,12]. Formation changes did not significantly affect a match result in most cases. This result seems to be odd because managers change formation to lead the team to a success. Generally speaking, when the environment in which an agent is located is fixed or exogenously changing, reinforcement learning usually improves the performance of the agent [4]. However, computational studies have suggested that this is not always the case when agents employing reinforcement learning are competing with each other, because the competing agents try to supersede each other [36][37][38][39]. The present finding that managers' WSLS behavior does not improve the team's performance is consistent with these computational results. Empirical studies also suggest that humans following reinforcement learning do not necessarily improve their performance in complex environments. For example, players in the National Basketball Association were more likely to attempt 3pt shots after successful 3pt shots although their probability of success decreased for additional shots [40]. Also in nonscientific accounts, it has been suggested that humans engaged in sports and gambling often use the WSLS strategy even if the outcome of games is determined merely at random [16]. We have provided quantitative evidence underlying these statements.
Many sports fans possess the hot hand belief in match results, i.e., the belief that a win or good performance persists [41]. However, empirical evidence indicates that streaks of wins and those of losses are less likely to occur than under the independence assumption [41].
By analyzing patterns of matches in the top division of football in England, Dobson and Goddard (2001) suggested the existence of negative persistence effects, i.e., a team with consecutive wins tended to perform poorly in the next match and vice versa [18]. Their results are consistent with the present results; we observed the negative persistence effects, i.e., anticorrelation between the results of the previous and present matches.
An important limitation of the present study is that we have oversimplified the concept of formation. Effective formations dynamically change during a match owing to movements of players. Because of the availability of data and our interest in the manager's long-term behavior rather than formation changes during a match [25], we used the formation data released at the beginning of the matches. Owing to recent technological developments, formations can be extracted from tracking data on movement patterns of players [42,43]. Investigating managers' decision making using such technologies warrants future research.
Appendix A: Analysis of the J-League data on the basis of yearly seasons

Appendix B: Fussballdaten
We analyzed data on Bundesliga from another website, Fussballdaten [28]. In the Fussballdaten data, each field player was assigned to one of the three positions (i.e., DF, MF, or FW) registered for an entire season. We defined the formation by counting the number of each type of field player in the same manner as that for the J-League data set.
First, to examine possible existence of WSLS behavior by managers, we applied the GLMM analysis to the Fussballdaten data. The results shown in Tables 7 and 8 are largely consistent with those for the Kicker online data (Tables 1 and 2). In particular, winning and losing in the previous match significantly decreased and increased the probability of formation change in the next match, respectively, consistent with WSLS behavior.
Second, we also investigated the effect of formation change and other factors on the match result using the ordered probit model. Table 9 indicates that formation changes have not affected match results. This result is consistent with those for the two data sets shown in the main text. In addition, winning in the previous match decreased the probability of winning in the next match, indicating the presence of the negative persistence effect. This result is consistent with that for the Kicker online data (Table 3).

Appendix C: Cross-correlation analysis
To further investigate possible relationships between formation changes and match results, we measured the cross-correlation between the two. In this analysis, we did not exclude the first match in each season. We used the teams that played at least 100 matches. We [...], N_team is the number of teams, and τ is the lag. We measured the cross-correlation between formation changes and wins by ρ(w, f, −τ) if τ < 0.
Replacing w by ℓ in Eqs. (11) and (12) defines the cross-correlation between formation changes and losses.
To examine the statistical significance of the cross-correlation obtained from the original data, we generated 1,000 randomized sequences of formation changes as follows. For given team $i$ and positive lag $\tau$, we randomly shuffled the original sequence of formation changes, $\{f_{i,2+\tau}, \ldots, f_{i,T_i}\}$, by assigning 1 (i.e., formation change) to each match with equal probability such that the number of 1s in the synthesized sequence was equal to that in the original sequence. We generated a randomized sequence for each team.
Then, we measured the cross-correlation between the randomized sequences of formation changes and $\{w_{i,2}, \ldots, w_{i,T_i-\tau}\}$ or $\{\ell_{i,2}, \ldots, \ell_{i,T_i-\tau}\}$ using Eq. (11). We repeated this procedure 1,000 times to obtain 1,000 cross-correlation values. The cross-correlation for the original data was considered to be significant for a given $\tau$ if it was not included within the 95% CI calculated on the basis of the 1,000 correlation coefficient values for the randomized sequences. The cross-correlation measured for various lags is shown in Figure 5.
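The shuffling test can be illustrated for a single team with the Python sketch below. Because the exact form of the cross-correlation in Eq. (11) is not reproduced here, the sketch substitutes the Pearson correlation between the win indicator at match t and the formation-change indicator at match t + lag; this substitution, the function names, and the toy sequences are assumptions of the sketch. Shuffling preserves the number of formation changes, as described above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def lagged_correlation(w, f, lag):
    """Correlation between w[t] and f[t + lag] for lag >= 0 (stand-in for Eq. (11))."""
    return pearson(w[:len(w) - lag], f[lag:])


def shuffle_interval(w, f, lag, n_samples=1000, seed=0):
    """Central 95% range of the lagged correlation under shuffled formation changes."""
    rng = random.Random(seed)
    x, y = w[:len(w) - lag], list(f[lag:])
    values = []
    for _ in range(n_samples):
        rng.shuffle(y)  # keeps the number of 1s (formation changes) fixed
        values.append(pearson(x, y))
    values.sort()
    return values[(25 * n_samples) // 1000], values[(975 * n_samples) // 1000 - 1]


# Toy sequences: f changes exactly one match after each win.
w = [1, 0, 0, 1, 0, 1, 0, 0]
f = [0, 1, 0, 0, 1, 0, 1, 0]
observed = lagged_correlation(w, f, lag=1)
low, high = shuffle_interval(w, f, lag=1)
print(observed, (low, high))
```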

Appendix D: Hierarchical Bayesian model
In the main text, we used the fraction of matches that a team won in a season to define the strength of the team. In this section, we analyze a model in which the strength of a team is assumed to be a latent variable. We used the hierarchical Bayesian ordered probit model combined with the Markov Chain Monte Carlo (MCMC) method [33]. The model is the same as that used in the main text except for the derivation of the team strength.
We assumed that the prior of the strength of team $i$ in a season, denoted by $r_i$, obeyed the normal distribution with mean 0 and variance $\sigma^2$. The priors of $\beta_f$, $\beta_h$, $\beta_w$, and $\beta_\ell$ obeyed the normal distribution with mean 0 and variance $10^2$. The prior of $\sigma^2$ obeyed the uniform distribution on $[0, 10^4]$. We conducted MCMC simulations for four independent chains starting from the same prior distributions. The total number of iterations per chain was set to 25,000, and the first 5,000 iterations were discarded as transient. The thinning interval was set to 20 iterations. A coefficient was regarded as significant if its 95% credible interval did not bracket zero. We excluded the matches that were the first game in a season for at least one of the two teams. We performed the analysis using R 3.1.2 [30] and the RStan package [44].