Methods for quantifying effects of social unrest using credit card transaction data

Abstract

Societal unrest and similar events are important for societies, but it is often difficult to quantify their effects on individuals, hindering timely and effective policy-making in emergencies and, in particular, in localized social shocks such as protests. Traditionally, effects are assessed through economic indicators or surveys with relatively low temporal and spatial resolutions. In this work, we compute two behavioral indexes, based on the use of credit card transaction data, for measuring the economic effects of a series of protests on consumer actions and personal consumption. Using data from a metropolitan area in an OECD country, we show that protests affect consumers’ shopping frequency and spending, but in noticeably different ways. The effects show strong temporal and spatial patterns, vary between neighborhoods and customers of different socio-demographical characteristics as well as between merchants of different categories, and suggest interesting subtleties in purchase behavior such as displaced or delayed shopping activities. Our method can generally serve for the real-time monitoring of the effects of major social shocks or events on urban economy and consumer sentiment, providing high-resolution and cost-effective measurement tools to complement traditional economic indicators.

1 Introduction

The routine of daily life is punctuated by extraordinary public events, ranging from major sports and cultural events to terror attacks or disasters. Some events are local, and some are national or even wider; some affect only part of the population, while others affect the entire population. Although they differ from natural events and disasters, certain social events, such as a protest, a riot, a large police action, or a bomb explosion, can still have characteristics of natural emergencies, with fatalities and damage to property. However, when such a localized social shock occurs, we have mostly qualitative or at best retrospective survey data about its effect on the surrounding community. Consequently, it is impossible to know how well authorities managed the side-effects of the event, and thus difficult to develop effective event policies for use by police and municipal authorities.

In particular, understanding the effects of major social shocks or events on the economy and consumer behavior can have important implications [1–7]. Existing measures of economic behavior and attitudes towards economic situations use surveys to assess consumer confidence or consumer sentiment, asking about the intentions to purchase goods or to make large investments. These include objective measures published by national agencies, such as the measure of personal consumption expenditures in the monthly personal income and outlays report, provided by The Bureau of Economic Analysis (BEA) [8]. Other indicators are based on consumers’ subjective reports, such as the monthly consumer confidence index (CCI) issued by The Conference Board [9], and the monthly index of consumer sentiment (ICS) published by the University of Michigan [10]. These data and the actual economic behavior, as expressed in consumption, can be strongly correlated [11], and they can therefore be used to evaluate the effects of events. However, their temporal resolution is low (consumer confidence and sentiment surveys are usually conducted on a monthly basis). Their spatial resolution is low, too, since the measures aggregate information over large areas and usually entire economies. Finally, the information usually cannot be used to look at the differential effect of events on different segments of a society.

The limitations of traditional approaches may be overcome by using digital tools to study human and social behavior, analyzing large-scale quantitative data in emerging fields such as computational social science [12, 13]. The analysis of large amounts of data accumulated in social networks, cellular phone records, or credit card transactions may allow us to gain a new understanding of social processes. Such data have been widely utilized to study situation awareness and response to emergencies [14–18], mostly due to natural events and disasters, as well as the dynamics of communication and information propagation that immediately follow [19–22]. However, relatively little work has so far aimed to understand the impact of social events, and especially localized social shocks, using quantitative behavioral data.

Einav et al. recently highlighted the new possibilities of conducting economic research in the age of “Big Data” [23]. Within this scope, in this work, we show how a particular type of data, namely credit card transaction records, can be used to quantify and understand the repercussions localized social events have on individuals’ economic behavior, quantifying when, where and how an event has an effect. In particular, credit card data help us overcome the limitations of traditional indicators by allowing us to measure purchase behavior directly, and, together with information about the types of merchants at which purchases are made, to infer what kinds of products are purchased. Notably, purchases at physical shops (rather than those made online) are clearly linked to specific spatial locations and time stamps. Furthermore, credit card issuers have information about the demographics and economic status of the credit card holder, which enables us to measure the effects of events on different demographic groups in the population. This would allow the authorities to develop targeted and timely policies in case of emergencies.

2 Data and methods

To demonstrate the use of this method, we analyzed more than ten million credit card transaction records provided by a major financial institution in an OECD country, covering more than 100 thousand individuals and more than 100 thousand merchants over a period of three months. Each record consists of the date, time and amount (in local currency) for one credit card transaction, along with anonymized customer and store IDs. Additional information about the customers and stores is also available, such as customers’ gender and the neighborhoods in which they live, as well as the category of the stores and the neighborhoods in which they are located.

We focus our analysis on credit card transactions made at stores located in the largest metropolitan area of the country. We omitted foreign and online transactions as well as transactions without timestamp, stores with fewer than ten distinct customers over the three-month period, and customers who visited fewer than ten distinct stores (see Note 1). As we are mainly interested in people selectively changing their purchase behavior in physical locations due to social events, we further selected five broad merchant categories that mostly correspond to on-site and discretionary purchases, i.e., “amusement and entertainment”, “clothing stores”, “retail stores” (including subcategory “grocery stores, supermarkets”), “personal service providers”, and “miscellaneous stores” (including subcategory “eating places, restaurants”). The resulting data set covers 1.8 million records from 100 thousand customers at 18 thousand geo-localized stores in the metropolitan area. Figure 1 visualizes the hourly evolution of the number of transactions made by the customers, from which we can already observe strong weekly patterns that separate the purchasing behaviors on weekdays and weekends, and daily patterns that distinguish different periods of a day such as the morning, afternoon and evening.
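The two activity filters described above (stores with fewer than ten distinct customers, customers with fewer than ten distinct stores) can be illustrated with a minimal sketch. The record layout (dicts with keys `customer_id` and `store_id`) and the function name are hypothetical, and the sketch assumes both counts are taken once on the unfiltered data; the text does not say whether the filtering was applied iteratively.

```python
from collections import defaultdict


def filter_transactions(records, min_customers=10, min_stores=10):
    """Keep records whose store and customer are both sufficiently active.

    records: list of dicts with hypothetical keys 'customer_id', 'store_id'.
    Counts are computed in a single pass over the unfiltered data
    (an assumption, not something stated in the text).
    """
    customers_per_store = defaultdict(set)
    stores_per_customer = defaultdict(set)
    for r in records:
        customers_per_store[r["store_id"]].add(r["customer_id"])
        stores_per_customer[r["customer_id"]].add(r["store_id"])

    return [
        r for r in records
        if len(customers_per_store[r["store_id"]]) >= min_customers
        and len(stores_per_customer[r["customer_id"]]) >= min_stores
    ]
```

Exposing the two thresholds as parameters makes it straightforward to repeat the filtering with the alternative values mentioned in Note 1.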

Figure 1

Visualization of the hourly number of transactions in the metropolitan area during the period the data set covers. The horizontal axis corresponds to the index of the days in the data set, and the vertical axis corresponds to the index of the specific hours of each day

We quantify the impact of localized social events on purchase behavior at the level of administrative neighborhoods in the metropolitan area. For each neighborhood, we look at the transactions at the stores inside the neighborhood. We then compute, on a daily basis, two behavioral indexes, using the credit card transaction data:

  1. consumer action: the number of unique customers who made transactions at stores inside the neighborhood;

  2. personal consumption: the median spending amount in local currency for all transactions inside the neighborhood.

The first index is a proxy of the number of customers shopping in the area. There often exist many alternatives for places to shop, so the number of people choosing a particular area is a proxy for their preference toward this area versus other areas. This therefore serves as a proxy for how comfortable people feel when visiting the area. The second index is a proxy for the individual’s consumption expenditure, which measures the consumers’ mood and the level of economic security they feel, e.g., whether they feel that they should delay certain purchases or save money due to social conditions.
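As a minimal sketch of how the two indexes could be computed, the following groups transactions by neighborhood and day, then counts unique customers and takes the median transaction amount. The tuple layout `(neighborhood, day, customer_id, amount)` and the function name are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def daily_indexes(transactions):
    """Return {(neighborhood, day): (unique_customers, median_amount)}.

    transactions: iterable of (neighborhood, day, customer_id, amount)
    tuples; this layout is hypothetical.
    """
    customers = defaultdict(set)   # consumer action: distinct customers
    amounts = defaultdict(list)    # personal consumption: all amounts
    for neighborhood, day, customer_id, amount in transactions:
        key = (neighborhood, day)
        customers[key].add(customer_id)
        amounts[key].append(amount)
    return {key: (len(customers[key]), median(amounts[key]))
            for key in customers}
```

Note that a customer with several purchases in the same neighborhood on the same day counts once toward the first index, while each purchase contributes to the median in the second.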

For each day in the week and for each neighborhood in the metropolitan area, we construct a time series with values for the two indexes. The data cover a period of 13 weeks. We therefore obtain two time series, each with 13 data points for each day in the week, for the number of customers and the median spending amount in a neighborhood on a particular weekday or weekend day. For example, Fig. 2(a) and Fig. 2(b) show the time series of the number of customers shopping, and the median spending amount, respectively, in a given neighborhood on Mondays and Saturdays. To normalize differences in shopping behaviors between different days of the week (a difference that is particularly noticeable between weekdays and weekends), we compute the relative deviation from the mean (see Note 2), which is defined as:

$$ c = (x-\bar{X})/\bar{X}, \tag{1} $$

where \(x\) is the value of an index on a given day, and \(\bar{X}\) is the mean value of the index for the same day of the week during the first six weeks of the data. We take the first six weeks as the reference period, since there were no particularly noteworthy events during this period (see Note 3). We thus obtain two scalar values for each neighborhood for each day, which indicate the temporal evolution of the two behavioral indexes for each weekday or weekend day.
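The normalization in Eq. (1) can be sketched as follows for one index in one neighborhood. The sketch assumes days are indexed consecutively from 1, so that `(day - 1) % 7` identifies the day of the week, and it exposes a hypothetical `excluded_days` argument for reference days (such as holidays) that should not enter the baseline.

```python
from statistics import mean


def relative_deviation(series, reference_days=42, excluded_days=()):
    """Compute c = (x - Xbar) / Xbar for each day of one index series.

    series: dict mapping 1-based day index -> index value.
    Xbar is the mean over reference-period days that fall on the same
    day of the week (assumes consecutive day numbering).
    """
    baseline = {}
    for weekday in range(7):
        ref = [value for day, value in series.items()
               if day <= reference_days
               and day not in excluded_days
               and (day - 1) % 7 == weekday]
        if ref:
            baseline[weekday] = mean(ref)

    result = {}
    for day, value in series.items():
        xbar = baseline.get((day - 1) % 7)
        if xbar:  # skip days with a missing or zero baseline
            result[day] = (value - xbar) / xbar
    return result
```

For instance, a neighborhood that normally sees 100 customers on a given day of the week and sees 40 on an event day falling on that day of the week gets a score of \(c = -0.6\), i.e. a 60% decrease.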

Figure 2

Time series of the two behavioral indexes in a given neighborhood on Mondays and Saturdays: (a) number of customers; (b) median spending

3 Results

We use the score of relative deviation from the mean for the two behavioral indexes, computed in each neighborhood for each day in the data set, to measure the impact of a series of protests that took place in a central site of the metropolitan area during a period of one month. In particular, we focus on the temporal, spatial, heterogeneous, and integral economic effect of these protest events.

3.1 Temporal variation of effects

In Fig. 3(a), (b), for more than 300 neighborhoods that have an average number of customers above ten in our data set, we show the median (weighted by number of stores in the neighborhoods) change in the two indexes, both temporally and spatially. More specifically, daily fluctuations of the changes in behavioral indexes are illustrated with respect to a reference period of six weeks, namely, from Day 1 to Day 42 in our data set, prior to the beginning of the societal unrest. Two major protest events occurred on Day 62 (highlighted in green) and Day 77 (highlighted in cyan). The number of customers on these days decreased sharply in the vicinity (within 2 km) of the protest site, but less drastically in the neighborhoods within 2 to 4 km distance from the protest site and even less at larger distances. After the initial decrease, median spending returned more quickly to normal, and it was not as strongly affected as the number of customers. Here, too, the effect was generally stronger near the protest site.
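The store-weighted median across neighborhoods can be sketched as below. Several conventions for a weighted median exist; this sketch uses the lower weighted median (the smallest value at which the cumulative weight reaches half of the total), which is an assumption, since the text does not specify the convention used.

```python
def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight
    reaches at least half of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value
    raise ValueError("weighted_median requires at least one value")
```

Here `values` would be the relative deviations of the neighborhoods in one distance band on one day, and `weights` the corresponding numbers of stores, so that neighborhoods with many stores influence the summary more than those with few.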

Figure 3

The impact of a series of protest events on the two behavioral indexes, in neighborhoods within 2 km (blue), from 2 to 4 km (red), and above 4 km (orange) from the protest site: (a) number of customers; (b) median spending. The dash-dot purple line indicates zero change, and the dashed green and cyan lines indicate Day 62 and Day 77, respectively

To confirm our findings, we test the statistical significance of the results shown in Fig. 3(a), (b) in the Appendix (see Table A3(a)). On the days of the two major events, we see a significant drop in both behavioral indexes for all three distance ranges in Fig. 3(a), (b). In addition, there also exist statistically significant differences between distance ranges. On days following the major protests (those after Day 62 and Day 77), the number of customers decreased significantly in neighborhoods within 4 km from the protest site, but not beyond. On the other hand, there was no significant decrease in the median spending at any distance, suggesting that the people who went out shopping did not tend to spend less money than usual.

The combined effect of the events on the two independent behavioral indexes can be seen in 2-D histograms, such as the ones in Fig. 4(a), (b), (c). The horizontal and vertical axes represent the score of relative deviation from the mean for the two behavioral indexes, respectively, and the color code indicates the average number of stores per neighborhood for neighborhoods that have a certain combination of the two scores. In such a histogram, positions towards the upper right represent increased numbers of customers and larger median spending, while positions towards the bottom left represent the opposite. If no change occurs, all neighborhoods stay at the center of the diagram. It is clear from Fig. 4(a) that on Day 62 stores in most neighborhoods were negatively affected by the protests, with fewer customers (mostly 30–90% less) and smaller spending amounts (mostly 10–70% less) within 2 km of the protest site (event center). People went out shopping less and were reluctant to spend as much as they usually do, possibly due to the unstable situation in the area. These trends were less pronounced when moving away from the event center, as shown in Fig. 4(b), (c), which is consistent with the results shown in Fig. 3(a), (b). The same 2-D histograms for Day 77 are shown in Fig. A1 in the Appendix.
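The binning behind such a 2-D histogram can be sketched as follows: each neighborhood is assigned to a cell according to its two relative-deviation scores, and each cell stores the number of neighborhoods and their average number of stores (the quantity used for the color code). The bin width of 0.2 and the input layout are assumptions chosen for illustration; the text does not state the bin width.

```python
import math
from collections import defaultdict


def histogram_2d(neighborhoods, bin_width=0.2):
    """Bin neighborhoods by their two relative-deviation scores.

    neighborhoods: iterable of (c_customers, c_spending, n_stores).
    Returns {(i, j): (count, mean_stores)}, where cell (i, j) covers
    [i * bin_width, (i + 1) * bin_width) on the customer axis and
    likewise for j on the spending axis. bin_width is hypothetical.
    """
    cells = defaultdict(list)
    for c_customers, c_spending, n_stores in neighborhoods:
        key = (math.floor(c_customers / bin_width),
               math.floor(c_spending / bin_width))
        cells[key].append(n_stores)
    return {key: (len(stores), sum(stores) / len(stores))
            for key, stores in cells.items()}
```

Cells with negative indices on both axes correspond to the bottom-left region of the diagram (fewer customers and lower median spending).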

Figure 4

The 2-D histograms of the number of neighborhoods in terms of the two behavioral indexes for Day 62 in the data set: (a) within 2 km; (b) from 2 to 4 km; (c) above 4 km. The horizontal and vertical dashed lines indicate changes of a magnitude less than 0.1

3.2 Spatial decay of effects

To quantify the spatial propagation of the event effects, we study the effect as a function of the geographical distance from the event center. In Fig. 5, the blue dots indicate the median (weighted by number of stores in the neighborhoods) magnitude of decrease in (i) number of customers (Fig. 5(a), (b)) and (ii) median spending (Fig. 5(c), (d)) for neighborhoods that are located at different distances from the event center (i.e., the dot at 2 km represents all neighborhoods that are further than 1 km but less than 2 km away from the event center) (see Note 4). Fitting the blue dots with exponential decay functions for both cases (shown as red curves), we see reasonably good fits, which indicates that the negative effect of events decayed approximately exponentially with respect to distance from the event center. We then compute the exponential decay constant, and its inverse, which is the so-called mean lifetime of the decay (mean distance in our context). The mean decay distances for Day 62 and Day 77, based on Fig. 5(a), (b), were \(1/0.20=5~\mbox{km}\) and \(1/0.35=2.86~\mbox{km}\), respectively, and, based on Fig. 5(c), (d), \(1/0.63=1.59~\mbox{km}\) and \(1/0.46=2.17~\mbox{km}\), respectively.
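To make the relation between the decay constant and the mean distance concrete, the sketch below fits \(y = A e^{-\lambda d}\) by ordinary least squares on \(\ln y\) and reports \(1/\lambda\). The log-linear fit is a swapped-in technique chosen for simplicity; the text does not state which fitting procedure was used, and the data points in the usage example are synthetic.

```python
import math


def fit_exponential_decay(distances, magnitudes):
    """Fit y = A * exp(-lam * d) via least squares on ln(y).

    Returns (A, lam). Requires strictly positive magnitudes.
    This log-linear approach is an illustrative assumption.
    """
    xs = list(distances)
    ys = [math.log(m) for m in magnitudes]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic points generated from A = 1.25 and lam = 0.35 (illustration only).
distances_km = [1, 2, 3, 4, 5, 6, 7, 8]
decrease = [1.25 * math.exp(-0.35 * d) for d in distances_km]
A, lam = fit_exponential_decay(distances_km, decrease)
mean_distance_km = 1 / lam  # mean decay distance, about 2.86 km here
```

A larger decay constant therefore means a shorter mean distance: the effect is concentrated closer to the event center.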

Figure 5

The exponential fit to the median magnitude of decrease in the two behavioral indexes for neighborhoods that are located at different distances from the event center: (a) number of customers on Day 62; (b) number of customers on Day 77; (c) median spending on Day 62; (d) median spending on Day 77

These results indicate that, first, the amount of money people spent when making a purchase was less clearly affected by the protests than the number of customers, and the negative effect on spending also did not spread as far from the event center. Second, it seems that the event on Day 77 had a stronger impact on consumer actions within the close vicinity of the event center, which is indicated by a higher initial quantity of 1.25 in the exponential fit and reflected by the magnitude of the first two points in Fig. 5(a), (b). This effect, however, decayed spatially faster than on Day 62, which is suggested by a smaller mean distance of 2.86 km and reflected by the generally smaller magnitude from the third point (3 km) onwards in Fig. 5(a), (b). On the other hand, on Day 62 personal consumption was more affected close to the event center, while the negative effect on this index spread further away on Day 77. Such differences might be due to people’s responses to events changing over time. Finally, we can also study the event effect as a function of the estimated travel time (by car) from the event center, thus taking into account geographical constraints and transportation accessibility. The results are similar (in terms of the exponentially decaying patterns) and presented in Fig. A2 in the Appendix. We further show in Fig. A3 and Fig. A4 in the Appendix the event effect on the individual neighborhoods. Overall, these results demonstrate the possibility of utilizing transaction data to quantify the spatial decay of event effects.

3.3 Heterogeneous effects

Social events may impact neighborhoods of different characteristics and the purchase behavior of people from different demographic groups in different ways. It may also have a differential effect on purchases from different types of merchants. We therefore study the heterogeneous effects of the social events in the following scenarios, by focusing on neighborhoods within 4 km from the event center.

3.3.1 Socio-economic status

We first show in Fig. 6 the time series of median (weighted by number of stores in the neighborhoods) change in the two behavioral indexes, computed over neighborhoods of higher socio-economic status, compared to that for those of lower socio-economic status. The socio-economic status is a composite measure between 0 and 100 that quantifies the relative prosperity of the neighborhood based on a number of indicators such as income and education level, which is obtained from the results of a recent census provided by the National Statistical Institute of the country. The higher the index, the more prosperous the neighborhood is. We see that the number of customers decreased more sharply in neighborhoods of lower socio-economic status for days following the protests. On the two days of major events, however, such a decrease was slightly higher in wealthier neighborhoods, possibly due to the fact that a larger portion of the demonstrators was from these neighborhoods. On the other hand, median spending dropped more significantly in less wealthy neighborhoods on these two days, but not afterwards. Tests for statistical significance of the differences are presented in Table A3(b) in the Appendix. Overall, our results suggest that the events had longer lasting negative effects on less wealthy neighborhoods, but mainly in terms of consumer action rather than personal consumption. Because political preference is shown to be correlated with wealth in the same country [24], we next examine the effects on neighborhoods distinguished by political conservatism.

Figure 6

The impact of a series of protest events in neighborhoods of higher socio-economic status (blue) and lower socio-economic status (red): (a) number of customers; (b) median spending. The dash-dot purple line indicates zero change, and the dashed green and cyan lines indicate Day 62 and Day 77, respectively

3.3.2 Political preference

Figure 7 shows the same time series as in Fig. 6, computed over neighborhoods of higher political conservatism, compared to that for those of lower political conservatism. The index of political conservatism of each neighborhood is a measure between 0 and 100 that is computed from the percentages of votes obtained by parties labeled as “liberal” or “conservative” in a recent general election of the country. The higher the index, the more conservative the neighborhood tends to be. It can be seen that, after the major protests, more conservative neighborhoods saw a larger decrease in both the number of customers and the median spending, probably due to the fact that people avoided going out for shopping and preferred to save in unstable circumstances. On the days of major events, however, less conservative neighborhoods saw a larger decrease in the number of customers, possibly for the same reason as in the previous analysis, namely that more people from these neighborhoods participated in the protests.

Figure 7

The impact of a series of protest events in neighborhoods of higher political conservatism (blue) and lower political conservatism (red): (a) number of customers; (b) median spending. The dash-dot purple line indicates zero change, and the dashed green and cyan lines indicate Day 62 and Day 77, respectively

We note that the neighborhoods we analyze here are centrally located, hence it is possible that both the socio-economic status and political preference of the neighborhoods mainly reflect the typical profile of the residents but not that of the visiting customers. The results presented here are therefore mainly behavioral changes observed in these neighborhoods. However, according to our data, 54% (46%) of transactions in the wealthier (less wealthy) neighborhoods being studied come from their residents or residents of other wealthy (less wealthy) neighborhoods in the city, while 61% (48%) of transactions in the conservative (liberal) neighborhoods come from their residents or residents of other conservative (liberal) neighborhoods. Therefore we believe that shopping activities observed in these neighborhoods reflect to a certain extent the behavioral change of people of similar characteristics in terms of wealth and political preference. Tests for statistical significance of the differences between the two groups in Fig. 7 are presented in Table A3(c) in the Appendix.

3.3.3 Gender difference

There may also exist a gender difference in the effect of social events. Figure 8 shows the time series of median change computed over all neighborhoods, for male and female customers (see Note 5). We see that, after the two major protests, both male and female customers tended to shop less, although they did not necessarily spend less money. However, demonstrations affected female customers more strongly than their male counterparts on the days of major protests, both in terms of the number of customers going out for shopping (for both days) and the median spending (for Day 77). Tests for statistical significance of the differences are presented in Table A3(d) in the Appendix. Figure A5 in the Appendix further illustrates the differences between four groups of different combinations of gender and political conservatism.

Figure 8

The impact of a series of protest events averaged over all neighborhoods for male (blue) and female (red) customers: (a) number of customers; (b) median spending. The dash-dot purple line indicates zero change, and the dashed green and cyan lines indicate Day 62 and Day 77, respectively

3.3.4 Store category

Finally, Fig. 9 depicts the time series of median change computed over all neighborhoods, for three categories of stores, namely, grocery stores, family clothing stores, and restaurants. Unlike in Fig. 6, Fig. 7 and Fig. 8, the median is weighted by number of stores in each category in the neighborhoods in this case. We see that on days following the major protests, both behavioral indexes decreased for all three categories of stores, except for the number of people shopping at grocery stores on mid-week days and median spending at clothing stores. In addition, although there is no clear difference between the three types of stores in terms of the median spending, customers visited family clothing stores less often than restaurants and groceries. Interestingly, on the days of major events, grocery stores showed the largest decrease in terms of customer visits. Tests for statistical significance of the differences are presented in Table A3(e) in the Appendix.

Figure 9

The impact of a series of protest events averaged over all neighborhoods for grocery stores (blue), family clothing stores (red) and restaurants (orange): (a) number of customers; (b) median spending. The dash-dot purple line indicates zero change, and the dashed green and cyan lines indicate Day 62 and Day 77, respectively

3.4 Integral economic effects

Although societal unrest has led to major changes in people’s purchase behavior, particularly on the days of major events, it is interesting to investigate whether there exists an integral economic effect due to displaced or delayed shopping activities.

3.4.1 Displaced shopping

The possibility of displaced shopping may be suggested by Fig. 4(c) where it can be seen that certain neighborhoods further away from the event center, in particular those corresponding to the four squares on the top right, actually saw increased activities on Day 62. In this case, each of the squares represents a single neighborhood (with color indicating the number of stores in the neighborhood), which we analyze in detail as follows.

Increased shopping activities in these four neighborhoods were mainly due to (i) new merchants, and (ii) new customers that did not appear in the reference period. First, the neighborhood corresponding to the red square (100% increase in customers and 60% in spending) saw many payments at a fitness club and, since Day 62 happened to be the first day of the month, these may correspond to people paying subscription fees at the club. As the reference period does not contain first days of the month, this fitness club did not appear as a merchant in the reference period, and therefore the corresponding purchase behavior was not observed before.

Second, the two neighborhoods corresponding to the two light blue squares (220%/120% increase in customers and 20%/100% in spending) saw increased activities due to their close proximity to a new theme park and shopping complex, where those purchases were made on a Saturday (Day 62) at stores that were newly opened in the shopping complex.

Finally, instead of increased activities driven by new merchants, those in the neighborhood corresponding to the last square (60% increase in customers and 100% in spending) were mainly driven by new customers. Indeed, 13 out of the 19 customers who made purchases in that neighborhood on Day 62 never visited the neighborhood in the reference period, and at the same time they did not visit any of the neighborhoods they used to go to on the same day of the week. Therefore, this suggests a pattern of displaced shopping where these customers were shifting their shopping locations, which might be due to the influence of the protest events.
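The check described above can be sketched as follows: a customer is flagged as displaced if they shopped in the target neighborhood on the event day, never shopped there during the reference period, and on the event day visited none of the neighborhoods they used on the same day of the week during the reference period. The input layout `(customer_id, neighborhood, day)` and the function name are hypothetical, and consecutive day numbering is assumed.

```python
from collections import defaultdict


def displaced_customers(visits, neighborhood, event_day, reference_days=42):
    """Return the set of customers whose shopping appears displaced.

    visits: iterable of (customer_id, neighborhood, day) tuples
    (hypothetical layout; days numbered consecutively from 1).
    """
    ref_any = defaultdict(set)           # all reference-period neighborhoods
    ref_same_weekday = defaultdict(set)  # those visited on the same weekday
    on_event_day = defaultdict(set)      # neighborhoods visited on event day
    for customer, place, day in visits:
        if day <= reference_days:
            ref_any[customer].add(place)
            if (day - event_day) % 7 == 0:
                ref_same_weekday[customer].add(place)
        elif day == event_day:
            on_event_day[customer].add(place)

    return {
        customer for customer, places in on_event_day.items()
        if neighborhood in places
        and neighborhood not in ref_any.get(customer, set())
        and not (places & ref_same_weekday.get(customer, set()))
    }
```

Dividing the size of the returned set by the number of customers seen in the neighborhood on the event day gives a share comparable to the 13 out of 19 reported above.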

In summary, displaced shopping may have indeed taken place, but being further away from the event center, there could be other factors that influence shopping behavior as well, such as those that came with a special day like Day 62 (being the first day of the month as well as a Saturday that came shortly after the opening of a shopping complex).

3.4.2 Delayed shopping

We see from Fig. 3 that, in terms of both number of customers and median spending, shopping activities generally increased on mid-week days after the weekends when a majority of the protests took place. For the number of customers, this increase is most obvious in Fig. 9(a) for purchases at grocery stores. Compared to clothing stores and restaurants, grocery stores provide products needed for daily life, and it is therefore possible that people waited during the weekends and got the necessary shopping done shortly after the major protests. On the other hand, increases in median spending on mid-week days were mainly present in neighborhoods within 2 km from the event center, and are most clearly synchronized with the increased spending in neighborhoods of lower socio-economic status shown in Fig. 6(b). Given that median spending dropped significantly in these neighborhoods on weekends of major protests, this suggests that people may have decided to save in uncertain situations and regained some confidence for make-up purchases after the protest events.

Despite the discernible patterns of displaced or delayed shopping activities, the integral economic effect of the protest events remains largely negative, as is evident in Fig. A6, where the change in total sales is illustrated for neighborhoods at different distances from the event center.

4 Discussion

Our study demonstrates the potential of using pervasive and passively collected behavioral data to quantify the impact of localized social shocks and events on an urban population. We found that major events, such as societal unrest due to protests in our case, can alter people’s purchase behavior, both in terms of consumers’ tendency to visit shops in certain areas and in terms of their willingness to spend money. From a temporal perspective, personal consumption levels recover relatively quickly from the negative influence of the events. Spatially, on days of major events, consumption levels also seem more resilient, although both measures show clear spatial exponential decay. Event effects also differ between groups in the society, defined in terms of demographics, socio-economic status, and political conservatism, and between categories of merchants. Finally, the existence of discernible patterns such as displaced or delayed shopping activities suggests that the effect of social events was not always distributed in the way one would imagine, e.g., it is not purely a function of geographical distance. These patterns may also help explain the rather unintuitive resilience of (or lack of change in) personal consumption as a result of inelastic spending driven by needs.

Our analysis has certain limitations. The data set of credit card transaction records used in this study is based on a sample (about 10%) of all the individual customers of one financial institution and does not include people without credit cards. Therefore, sampling bias could exist and could potentially influence the results. Credit card transaction data may also represent only part of people’s daily spending, as people may choose to pay with cash in certain scenarios. Furthermore, the data set only covers a period of three months, and the observations might be affected by seasonality. Finally, there might exist unobserved external events that bias our results, even though the strong spatiotemporal patterns observed on Days 62 and 77 are probably mainly caused by the major events on those days.

The real-time monitoring of economic behavior can have important social, economic and political implications. As a research tool, it makes it possible to assign quantitative values to the inherently elusive effects of social events. One advantage of the method we present here is the ability to analyze events with relatively high temporal and spatial resolutions. These allow a fine-grained understanding of events, beyond what is often possible with traditional measures. Also, the measures are less affected by demand characteristics or other biases, inherent in methods such as surveys. Such understanding and measures can potentially be used to develop timely and effective policies targeted at specific communities and demographic groups, and can often be critical in emergency situations.

With our method we can also evaluate statements made, for instance, by politicians or the media, such as that people are greatly upset by certain events. We can compare them to the observed changes in behavior, as expressed in the number of customers making purchases and the average amount spent per purchase. If people go about their activities as usual, the events probably have less impact than when measures of behavior differ strongly from the usual values. At a practical level, these analyses may allow authorities to prepare for timely interventions to minimize possible negative consequences, possibly through a prediction mechanism that could estimate the recovery time of individual economic activities at an early stage of the event.

The framework we present here complements, rather than replaces, other methods such as traditional economic indicators or surveys. Combinations of it with other tools help create a comprehensive picture of the dynamic response of a population to social events.

Notes

  1. The two latter steps aim to remove inactive customers and stores to prevent them from biasing subsequent analyses. The specific thresholds of ten customers and ten stores we used here have little effect on the amount of filtered data: with alternative thresholds of five customers/stores or three, 2.0 or 2.1 million transaction records are left after the filtering process.

  2. An alternative would be to compute a z-score that also captures the variability of the statistic. However, computing the sample standard deviation over a relatively small number of points may lead to large variation in the z-score and hence bias the subsequent analysis. We therefore choose the relative deviation from the mean, which is a more stable measure and is also adopted in [20, 22].

  3. Except that we remove three holidays in these six weeks when shopping activities were clearly boosted and would introduce obvious bias in the analysis.

  4. Notice that the behavioral indexes are computed as relative changes with respect to a reference period; therefore, the fact that activities may decrease further away from the city center because fewer shops are available has already been taken into account.

  5. In our data set, we have 34% female customers who account for 38% of the transactions, which is moderately imbalanced. However, since the behavioral indexes are computed at the neighborhood level as relative changes with respect to a reference period, and gender split is assumed to stay reasonably stable across the reference and analysis periods, such imbalance would not cause an issue in the analysis and statistical tests presented in the paper.
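As a rough illustration of the trade-off described in Note 2, the sketch below computes both candidate measures for one neighborhood. It assumes the behavioral index takes the form (event-day value minus reference mean) divided by the reference mean; the function names and the customer counts are hypothetical and are not taken from the data set.

```python
from statistics import mean, stdev

def relative_deviation(event_value, reference_values):
    """Relative change of the event-day value with respect to the reference mean."""
    ref_mean = mean(reference_values)
    return (event_value - ref_mean) / ref_mean

def z_score(event_value, reference_values):
    """Deviation from the reference mean in units of the sample standard deviation."""
    ref_mean = mean(reference_values)
    return (event_value - ref_mean) / stdev(reference_values)

# Hypothetical numbers of customers on five reference days and on the event day.
reference_customers = [120, 130, 125, 118, 127]
event_customers = 93

index = relative_deviation(event_customers, reference_customers)
z = z_score(event_customers, reference_customers)
print(f"relative deviation: {index:.2f}, z-score: {z:.2f}")
```

With only a handful of reference days, the sample standard deviation in the denominator of the z-score can change considerably when a single reference value changes, whereas the relative deviation depends only on the reference mean.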

References

  1. Long PT, Perdue RR (1990) The economic impact of rural festivals and special events: assessing the spatial distribution of expenditures. J Travel Res 28(4):10–14

  2. Burgan B, Mules T (1992) Economic impact of sporting events. Ann Tour Res 19(4):700–710

  3. Crompton JL, Mckay SL (1994) Measuring the economic impact of festivals and events: some myths, misapplications and ethical dilemmas. Festiv Manage Event Tour 2:33–43

  4. Mondello MJ, Rishe P (2004) Comparative economic impact analyses: differences across cities, events, and demographics. Econ Dev Q 18(4):331–342

  5. Getz D (2005) Event management and event tourism, 2nd edn. Cognizant Communication Corporation, New York

  6. Herrero LC, Sanz JA, Devesa M, Bedate A, del Barrio MJ (2006) The economic impact of cultural events: a case-study of Salamanca 2002, European capital of culture. Eur Urban Reg Stud 13(1):41–57

  7. Jong-A-Pin R (2009) On the measurement of political instability and its impact on economic growth. Eur J Polit Econ 25(1):15–29

  8. Personal income and outlays, U.S. Department of Commerce, Bureau of Economic Analysis (BEA). http://www.bea.gov/newsreleases/national/pi/pinewsrelease.htm

  9. Consumer confidence index, The Conference Board. https://www.conference-board.org/data/consumerconfidence.cfm

  10. Surveys of consumers, University of Michigan. http://www.sca.isr.umich.edu

  11. Carroll CD, Fuhrer JC, Wilcox DW (1994) Does consumer sentiment forecast household spending? If so, why? Am Econ Rev 84:1397–1408

  12. Lazer D, Pentland A, Adamic L, Aral S, Barabási A-L, Brewer D, Christakis N, Contractor N, Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Alstyne MV (2009) Computational social science. Science 323(5915):721–723

  13. Pentland A (2014) Social physics: how good ideas spread. Penguin, Baltimore

  14. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the international conference on World Wide Web, pp 851–860

  15. Yates D, Paquette S (2011) Emergency knowledge management and social media technologies: a case study of the 2010 Haitian earthquake. Int J Inf Manag 31(1):6–13

  16. Yin J, Lampert A, Cameron M, Robinson B, Power R (2012) Using social media to enhance emergency situation awareness. IEEE Intell Syst 27(6):52–59

  17. Imran M, Castillo C, Diaz F, Vieweg S (2015) Processing social media messages in mass emergency: a survey. ACM Comput Surv 47(4):67:1–67:38

  18. Martínez EA, Rubio MH, Martinez RM, Arias JM, Patane D, Zerbe A, Kirkpatrick R, Luengo-Oroz M (2016) Measuring economic resilience to natural disasters with big economic transaction data. In: Proceedings of the data for good exchange

  19. Kapoor A, Eagle N, Horvitz E (2010) People, quakes, and communications: inferences from call dynamics about a seismic event and its influences on a population. In: Proceedings of AAAI symposium on artificial intelligence for development, pp 51–56

  20. Bagrow JP, Wang D, Barabási A-L (2011) Collective response of human populations to large-scale emergencies. PLoS ONE 6(3):17680

  21. Altshuler Y, Fire M, Shmueli E, Elovici Y, Bruckstein A, Pentland A, Lazer D (2013) The social amplifier—reaction of human communities to emergencies. J Stat Phys 152(3):399–418

  22. Gao L, Song C, Gao Z, Barabási A-L, Bagrow JP, Wang D (2014) Quantifying information flow during emergencies. Sci Rep 4:3997

  23. Einav L, Levin J (2014) Economics in the age of big data. Science 346(6210):715

  24. Dong X, Jahani E, Morales AJ, Bozkaya B, Lepri B, Pentland A (2016) Purchase patterns, socioeconomic status, and political inclination. In: International conference on computational social science

  25. Hochberg Y, Tamhane AC (1987) Multiple comparison procedures. Wiley, New York

Acknowledgements

X. Dong was supported by a Swiss National Science Foundation Mobility fellowship while completing this work. The authors are grateful to the financial institution that provided the credit card transaction data for this research.

Availability of data and materials

Data sufficient to reproduce all results in this paper will be made available upon request.

Author information

Contributions

XD and JM conceived and designed the study. XD, JM, ES and BB conducted the analyses. XD, JM and ES wrote the manuscript. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaowen Dong.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

13688_2018_136_MOESM1_ESM.mov

Dynamic change in the number of customers and median spending from Day 43 to Day 91. The four rings correspond to the neighborhood that contains the event center as well as the neighborhoods that are within 2 km, between 2 and 4 km away, and more than 4 km away from the center. (MOV 3.4 MB)

Appendix

We present in Table A1 and Table A2 the following statistics comparing purchase activity on the reference days and on the event days (Day 62 and Day 77, respectively), for each neighborhood within 4 km from the event center:

  1. the minimum/maximum number of customers on the reference days for the event day;

  2. the decrease in number of customers on the event day compared to the average of its reference days;

  3. the minimum/maximum median spending amount on the reference days for the event day;

  4. the decrease in median spending amount on the event day compared to the average of its reference days;

  5. the minimum/maximum number of transactions on the reference days for the event day;

  6. the decrease in number of transactions on the event day compared to the average of its reference days;

  7. the minimum/maximum total sales amount on the reference days for the event day;

  8. the decrease in total sales amount on the event day compared to the average of its reference days.

Table A1 Statistics about purchase activities on both reference days and Day 62, for each neighborhood within 4 km from the event center
Table A2 Statistics about purchase activities on both reference days and Day 77, for each neighborhood within 4 km from the event center

We test the statistical significance of differences between the different curves shown in Fig. 3(a), (b), Fig. 6(a), (b), Fig. 7(a), (b), Fig. 8(a), (b) and Fig. 9(a), (b). Our tests focus on the following different periods:

  1. The overall period from Day 43 to Day 91;

  2. The days before the first major protest began, namely, from Day 43 to Day 61;

  3. The days following the two major protests, namely, from Day 62 to Day 91, but excluding Days 62 and 77;

  4. On Day 62;

  5. On Day 77.

We use non-parametric statistical tests, which do not require the data to follow an assumed probability distribution such as a normal distribution. Specifically, for (1) to (3), we use a Wilcoxon signed-rank test to test the statistical significance of the difference between pairs of time series, as well as the difference between the values of the time series of each group and zero. For (4) and (5), we construct a sample for each group that consists of the index of each neighborhood in the group, repeated as many times as there are stores in that neighborhood. We then use a Wilcoxon signed-rank test to test whether each group has a median significantly different from zero. For pairwise comparison between the groups, we use a Wilcoxon rank-sum test for 2-category comparisons (socio-economic status, political conservatism, and gender), and a Kruskal–Wallis test followed by post hoc comparisons using the Dunn–Sidak method [25] for 3-category comparisons (distance and store category). In Table A3 we report the p-values of the statistical tests, where bold font indicates that a null hypothesis of no difference can be rejected at the 5% significance level. Notice that for 3-category comparisons we report the statistical significance directly because there is no p-value for the pairwise comparisons.
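The single-day procedure for periods (4) and (5) can be sketched as follows. This is a minimal standard-library illustration and not the authors' implementation: it builds the store-weighted sample described above and applies a two-sided Wilcoxon signed-rank test against zero using the normal approximation with a tie correction (an assumption here, since the exact variant of the test is not stated). All names and numbers are hypothetical.

```python
import math
from collections import Counter

def store_weighted_sample(index_by_neighborhood, stores_by_neighborhood):
    """Repeat each neighborhood's index once per store in that neighborhood."""
    sample = []
    for name, index in index_by_neighborhood.items():
        sample.extend([index] * stores_by_neighborhood[name])
    return sample

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def signed_rank_test(sample):
    """Two-sided Wilcoxon signed-rank test against zero (normal approximation)."""
    nonzero = [x for x in sample if x != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("sample contains no nonzero values")
    magnitudes = [abs(x) for x in nonzero]
    ranks = average_ranks(magnitudes)
    w_plus = sum(r for r, x in zip(ranks, nonzero) if x > 0)
    expected = n * (n + 1) / 4
    tie_sizes = Counter(magnitudes).values()
    variance = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    z = (w_plus - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value

# Hypothetical behavioral index and store count for three neighborhoods in one group.
index_by_neighborhood = {"A": -0.30, "B": -0.10, "C": 0.05}
stores_by_neighborhood = {"A": 4, "B": 3, "C": 2}

sample = store_weighted_sample(index_by_neighborhood, stores_by_neighborhood)
w_plus, p_value = signed_rank_test(sample)
print(f"W+ = {w_plus}, p = {p_value:.4f}")
```

Because every neighborhood contributes as many identical values as it has stores, the sample contains many ties, which is why the variance of the statistic includes a tie correction in this sketch.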

Table A3 The p-values of the statistical tests for the significance of differences between different curves, as well as differences between their values and zero, for the different factors being studied. Bold font indicates that a null hypothesis of no difference can be rejected at the 5% significance level. Notice that for 3-category comparisons we report the statistical significance directly because there is no p-value for the pairwise comparisons
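For the 3-category comparisons, one way to read a direct report of significance is through the Šidák-adjusted per-comparison level. The sketch below assumes that the Dunn–Sidak procedure controls the family-wise error rate via the standard Šidák formula; how the authors' software applies it is not stated in the text, and the function name is hypothetical.

```python
from itertools import combinations

def sidak_level(alpha, n_comparisons):
    """Per-comparison significance level that keeps the family-wise level at alpha."""
    return 1 - (1 - alpha) ** (1 / n_comparisons)

# The three distance groups used in the paper give three pairwise comparisons.
groups = ["within 2 km", "2 to 4 km", "above 4 km"]
pairs = list(combinations(groups, 2))

per_pair_alpha = sidak_level(0.05, len(pairs))
print(f"{len(pairs)} pairwise comparisons, per-comparison level {per_pair_alpha:.5f}")
```

Under this reading, a pairwise difference would be declared significant at the 5% family-wise level only if its unadjusted test passes the stricter per-comparison level.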

Figure A1 shows the 2-D histograms that illustrate the combined effect of the events on the two independent behavioral indexes for Day 77. It can be seen that a vast majority of the neighborhoods within 4 km of the event center saw fewer customers than usual, even though some of them actually attracted more spending. On the other hand, such an effect was not obvious for neighborhoods that are more than 4 km away.

Figure A1

The 2-D histograms of the number of neighborhoods in terms of the two behavioral indexes for Day 77 in the data set: (a) within 2 km; (b) from 2 to 4 km; (c) above 4 km. The horizontal and vertical dashed lines indicate changes of a magnitude less than 0.1

Figure A2 shows the event effect as a function of the estimated travel time (by car) from the event center, thus taking into account geographical constraints and transportation accessibility. As we can see, the exponentially decaying patterns are similar to those in Fig. 5.

Figure A2

The exponential fit to the median magnitude of decrease in the two behavioral indexes for neighborhoods that are located at different travel times from the event center: (a) number of customers on Day 62; (b) number of customers on Day 77; (c) median spending on Day 62; (d) median spending on Day 77
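An exponential decay of this kind can be fitted in several ways, and the procedure used for the figures is not described in this section. The sketch below shows one common option, ordinary least squares on the logarithm of the decrease, under the assumed model y ≈ a·exp(−b·t); the function name and the synthetic data are hypothetical.

```python
import math

def fit_exponential_decay(x, y):
    """Fit y ~ a * exp(-b * x) by least squares on log(y); requires all y > 0."""
    logs = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_log = sum(logs) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_log) for xi, li in zip(x, logs))
    slope = sxy / sxx
    a = math.exp(mean_log - slope * mean_x)
    return a, -slope  # amplitude a and decay rate b

# Synthetic example: median decrease generated from a known decay curve.
travel_minutes = [5, 10, 15, 20, 25]
median_decrease = [0.5 * math.exp(-0.08 * t) for t in travel_minutes]

a, b = fit_exponential_decay(travel_minutes, median_decrease)
print(f"a = {a:.3f}, b = {b:.3f} per minute")
```

Fitting in log space weights relative rather than absolute errors, so a nonlinear least-squares fit on the original scale could give somewhat different parameters on noisy data.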

In addition to the average results in Fig. 5 and Fig. A2, Fig. A3 and Fig. A4 show the event effect on each individual neighborhood, where both the size and color of the circles indicate the number of stores in each neighborhood. We see that neighborhoods with a large number of stores generally saw a reduced amount of shopping activity in terms of both the number of customers and the spending amount, while such reduction was more significant for the smaller neighborhoods that are closer to the event center. This is consistent with the exponentially decaying patterns we have seen in Fig. 5 and Fig. A2.

Figure A3

The decrease in the two behavioral indexes for neighborhoods that are located at different distances from the event center. (a) number of customers on Day 62; (b) number of customers on Day 77; (c) median spending on Day 62; (d) median spending on Day 77. Both the size and color of the circles indicate the number of stores in each neighborhood

Figure A4

The decrease in the two behavioral indexes for neighborhoods that are located at different travel times from the event center. (a) number of customers on Day 62; (b) number of customers on Day 77; (c) median spending on Day 62; (d) median spending on Day 77. Both the size and color of the circles indicate the number of stores in each neighborhood

Figure A5 illustrates the differences between four groups based on combinations of gender and political conservatism, namely, male customers in conservative neighborhoods (blue), female customers in conservative neighborhoods (red), male customers in liberal neighborhoods (orange), and female customers in liberal neighborhoods (magenta). We see that, on both days of major protests, liberal females reduced their shopping activities most, while conservative females tended to save money more than other groups (for Day 77). The latter was also generally observed for days following the major protests.

Figure A5

The impact of a series of protest events for male customers in conservative neighborhoods (blue), female customers in conservative neighborhoods (red), male customers in liberal neighborhoods (orange), and female customers in liberal neighborhoods (magenta): (a) number of customers; (b) median spending. The dash-dot purple line indicates zero change, and the dashed green and cyan lines indicate Day 62 and Day 77, respectively

Figure A6 shows the effect of the events on total sales, in neighborhoods within 2 km (blue), from 2 to 4 km (red), and above 4 km (orange) from the protest site. We see that, although displaced or delayed shopping activities may have taken place, the integral economic effect of the protest events remains largely negative.

Figure A6

The impact of a series of protest events on total sales, in neighborhoods within 2 km (blue), from 2 to 4 km (red), and above 4 km (orange) from the protest site. The dash-dot purple line indicates zero change, and the dashed green and cyan lines indicate Day 62 and Day 77, respectively

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Dong, X., Meyer, J., Shmueli, E. et al. Methods for quantifying effects of social unrest using credit card transaction data. EPJ Data Sci. 7, 8 (2018). https://doi.org/10.1140/epjds/s13688-018-0136-x

  • DOI: https://doi.org/10.1140/epjds/s13688-018-0136-x

Keywords