Probing crowd density through smartphones in city-scale mass gatherings
© Wirz et al.; licensee Springer. 2013
Received: 6 September 2012
Accepted: 6 June 2013
Published: 14 June 2013
City-scale mass gatherings attract hundreds of thousands of pedestrians. These pedestrians need to be monitored constantly to detect critical crowd situations at an early stage and to mitigate the risk that situations evolve towards dangerous incidents. Hereby, the crowd density is an important characteristic to assess the criticality of crowd situations.
In this work, we consider location-aware smartphones for monitoring crowds during mass gatherings as an alternative to established video-based solutions. We follow a participatory sensing approach in which pedestrians share their locations on a voluntary basis. As participation is voluntary, we can assume that only a fraction of all pedestrians share location information. This makes it challenging to draw conclusions about the crowd density. We present a methodology to infer the crowd density even if only a limited set of pedestrians share their locations. Our methodology is based on the assumption that the walking speed of pedestrians depends on the crowd density. By modeling this behavior, we can infer a crowd density estimate.
We evaluate our methodology with a real-world data set collected during the Lord Mayor’s Show 2011 in London. This festival attracts around half a million spectators and we obtained the locations of 828 pedestrians. With this data set, we first verify that the walking speed of pedestrians depends on the crowd density. In particular, we identify a crowd density-dependent upper limit speed with which pedestrians move through urban spaces. We then evaluate the accuracy of our methodology by comparing our crowd density estimates to ground truth information obtained from video cameras used by the authorities. We report the average calibration error and confirm the appropriateness of our model. With a discussion of the limitations of our methodology, we identify the area of application and conclude that smartphones are a promising tool for crowd monitoring.
City-scale mass gatherings attract hundreds of thousands of attendees. On 29 April 2011, an estimated 1.2 million spectators congregated in London for the wedding of Prince William and Catherine Middleton. Around 2 million people gathered on 25 May 2010 in Buenos Aires to attend several concerts and street art parades celebrating the Bicentennial of the May Revolution. Up to 2 million people got together in Madrid, Spain for a parade celebrating the success of the Spanish national football team winning the 2010 FIFA World Cup. Such events, with many visitors in a restricted area and complex architectural configurations like narrowings and intersections, bear the risk of dangerous crowd incidents [4, 5]. It is therefore a top priority for organizers of such events to maintain a high level of safety and to minimize the risk of crowd incidents. Guidelines on planning help minimize the risk by deploying adequate safety measures [6, 7]. The rise of pedestrian simulation tools has enabled the identification of critical locations where dangerous crowd behaviors may emerge [8, 9]. Simulation tools help to design and proactively deploy crowd control mechanisms before mass gatherings to mitigate the risk of dangerous crowd incidents. However, despite proper preparation, the behavior of the crowd during an event remains highly unpredictable [10, 11]. Hence, emerging critical crowd situations need to be detected at an early stage in order to mitigate the risk of a situation evolving towards a dangerous incident. Crowd density, i.e. the number of people per unit area, has been identified as one important measure to assess the criticality of a situation [12, 13] and there is a need to obtain this information during an event.
In our ongoing research effort, we want to turn pedestrians’ smartphones into a reliable sensing tool for measuring the crowd density during city-wide mass gatherings. In a previous study, we introduced a participatory sensing system for crowd monitoring that tracks the location of attendees of mass gatherings via their smartphones. Attendees of such a mass gathering can download a smartphone App that records the user’s location at regular intervals. This information is collected from all App users and used to infer the users’ current spatial distribution. To motivate as many attendees as possible to download the App and share their locations, the App offers a set of features as an incentive, including an interactive festival program and maps of the venue. Nevertheless, by following a participatory sensing approach, we expect only a fraction of all attendees to participate and hence, the location of only a limited set of pedestrians is known. Therefore, the explanatory power of the obtained distribution is limited as these numbers do not provide direct evidence of the actual crowd density.
In this work, we address this challenge and present a methodology to infer the crowd density by tracking the locations of a subset of all event attendees. Our methodology relies on a calibration approach that provides a relation between the distribution of App users and the crowd density. Hereby, we make use of the characteristic that pedestrians exhibit a distinct behavior which depends on the crowd density in the vicinity. By assessing the behavior of the App users and applying our model, we obtain a crowd density estimation. Evaluation of our approach is performed with a real-world data set collected during the Lord Mayor’s Show 2011 in London, a festival attracting around half a million spectators. We use this data set to confirm the suitability of our methodology and evaluate the accuracy of our crowd density estimation by comparing our results to results from video footage obtained from CCTV cameras. We conclude our work by addressing the limitations of our methodology and identifying next steps.
This section discusses related work. Section 2.1 introduces crowd characteristics relevant to assess the criticality of a situation during mass gatherings. Section 2.2 compares technologies and methods to measure such crowd characteristics with a focus on crowd density.
Various empirical studies have analyzed crowd behaviors during mass gatherings and identified critical, potentially dangerous situations. A focus in the literature has been the investigation of human stampedes [15–19]. Stampedes often occur if people start to rush towards a common target. Congestion, or clogging, at narrowings and counter flow of pedestrians have been identified as critical situations in which stampedes may occur [20, 21]. Irregular pedestrian flow is an additional risk which may cause turbulent motions in a crowd. Johansson et al. identified the transition from smooth pedestrian flow to stop-and-go waves as a warning sign of a critical situation.
Table 1. Chart of crowd density
Behavior and risk:
Critical crowd density for static crowds
Stream of pedestrians can maintain normal walking speed and avoid one another
Walking speed is reduced
Involuntary contact is experienced between people
Potentially dangerous crowd forces begin to develop
The local crowd density alone does not allow for a complete assessment of the criticality of a situation. In addition to crowd density, the intention or behavior of a crowd is required for a correct situational understanding. As an example, a high crowd density in a static crowd is less critical than a high crowd density exhibiting counter flow. This distinction is also evident in Table 1: a moving crowd reaches a critical crowd density at a lower value than a static crowd, which can exceed the moving-crowd threshold before its own critical density is reached. Helbing et al. introduce a measure that incorporates this aspect. They call this measure crowd pressure, which is given as the local velocity variance multiplied by the local crowd density. In their work, they found that crowd pressure can be seen as an early warning sign for critical crowd situations: the crowd pressure value increased right before dangerous crowd turbulence emerged.
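To make the definition of crowd pressure concrete, the following is a minimal sketch, assuming that the velocity vectors of the pedestrians near a point of interest are already available. The function name and the sample values are illustrative assumptions and not part of the original work.

```python
import numpy as np

def crowd_pressure(velocities, local_density):
    """Crowd pressure = local density * local velocity variance.

    velocities: array-like of shape (n, 2), velocity vectors (m/s) of the
        pedestrians considered local to the point of interest.
    local_density: local crowd density (persons per square meter).
    """
    v = np.asarray(velocities, dtype=float)
    mean_v = v.mean(axis=0)
    # Variance of the velocity vectors around their local mean velocity.
    variance = np.mean(np.sum((v - mean_v) ** 2, axis=1))
    return local_density * variance

# Uniform flow: everyone moves alike, so variance and pressure are zero.
smooth = crowd_pressure([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], 4.0)
# Irregular motion at the same density yields a positive pressure.
turbulent = crowd_pressure([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]], 4.0)
```

The example illustrates why density alone is insufficient: both calls use the same density, yet only the irregular motion produces a non-zero pressure.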
Nowadays, video-based crowd monitoring tools are widely deployed. Gong et al. review the state of the art of vision-based systems for crowd monitoring. They conclude that currently deployed systems scale poorly to crowded public spaces because of their deployment complexity and because the criticality of a situation has to be judged manually from the footage. Further, monitoring multiple video streams simultaneously requires extensive training. To overcome these limitations, police forces use helicopters to gain an instantaneous overview and officers in the field to obtain detailed information.
Recent developments such as multi-camera networks to fuse information from multiple cameras and computer vision algorithms to automatically monitor crowds can mitigate these issues. Jacques et al. review state-of-the-art techniques. The authors distinguish between object-based approaches and holistic approaches. In object-based approaches, single individuals are detected and tracked individually. Relevant information is fused to analyze group behaviors. As an example, Mehran et al. use the social-force model introduced by Helbing et al. to infer crowd patterns from pedestrian tracks. Object-based approaches have been used by Johansson et al. to investigate crowd behaviors during the Hajj in Makkah. Steffen et al. presented approaches for inferring crowd densities and other crowd behaviors based on pedestrian trajectories.
Holistic approaches do not rely on tracking individuals but follow a top-down methodology in which the crowd is considered as a single entity. These approaches obtain coarser-level information such as crowd density, the flow of the crowd and crowd turbulence but no local, individual-specific information. As an example, Krausz et al.  developed an optical flow-based method for an automatic detection of dangerous motion behaviors including congestions during mass gatherings. They used their method to study video-footage recorded during the Love Parade disaster of 2010 in Duisburg, Germany where 21 visitors died in a stampede. By comparing the two approaches, the authors of  write that while object-level analysis tends to produce more accurate results, the identification of individuals is challenging in high density crowds due to clutter and occlusion which makes it difficult to obtain an accurate estimation of the crowd density.
Despite the recent advances in computer vision and pattern recognition techniques, it remains challenging to obtain automated global situation awareness during mass gatherings from video footage. Alternative technologies for observing crowds have recently attracted interest in the research community. Thanks to their proliferation, mobile devices like smartphones have increasingly been considered as a viable tool for monitoring the behavior of a crowd. These sensor-rich devices offer various ways to obtain information about the whereabouts of their users and hence allow for monitoring their physical behavior. By combining information from many people, the behavior of a collective can be monitored.
To infer crowd conditions like those mentioned in Section 2.1, the location of attendees of a mass gathering is required. There are different approaches to determine a smartphone’s location, which can broadly be divided into two classes: in-network localization and on-device localization. In-network localization methods utilize the fact that at any given time, a smartphone is connected to a cell tower in a network. Information about which device is connected to which cell tower is stored centrally in a database and updated constantly. Since the location of each cell tower is known, a position estimate of the mobile devices can be obtained. For on-device localization methods, on the other hand, the location is derived directly on the users’ smartphones by means of GPS positioning, WiFi fingerprinting or other comparable approaches. The in-network localization approaches have the advantage that the locations of all subscribed devices are routinely logged by the network operators. Thus, location information from a large number of devices can be obtained without any user interaction (or permission). Popular methods for obtaining in-network location estimates include recording network bandwidth usage, i.e. detecting how much communication is going on in a particular location. Calabrese et al. used this measure to investigate crowd dynamics in the city of Rome. The obtained measure is an aggregated number which is highly dependent on communication behavior and is not necessarily correlated with the actual number of individuals in that location. Another method to capture in-network location information is to use Call Data Records (CDRs) [32, 33]. A single CDR tuple is generated for every voice call and Short Message Service (SMS) transaction and consists of the sender and receiver numbers together with a timestamp and the ID of the cell the sender is situated in.
This data is routinely collected by every network operator for operational and billing purposes. While useful for many studies, CDR-based location data faces several limitations. Firstly, CDRs are sparse in time because they are generated only when a transaction occurs and not at fixed periodic intervals. Hence, as long as no communication takes place, a smartphone’s location is not revealed. Secondly, they are coarse in space as they record locations at the granularity of a cell tower sector, resulting in a location uncertainty of around 300 meters.
Overview of technologies and methods for crowd density assessment (entries include the social force model and network bandwidth usage)
We conclude that determining the location of a person on a mobile device using GPS or any other localization approach can provide a much more accurate location estimate than in-network approaches. On-device localization methods also have advantages over vision-based approaches: limitations such as occlusion or poor performance in low-light conditions do not apply, and the whole venue space can easily be covered. However, on-device localization approaches face a big challenge: in contrast to in-network methods, the location is determined on a user’s smartphone. To collect this information, a user has to deliberately share it. This requires a dedicated piece of software running on the device.
In the next section, we present methods to infer crowd characteristics from location information as provided by smartphones. Afterwards, in Section 3, we address the implications that on-device localization approaches face by requiring people to run a piece of software on their smartphones. We then present our method to mitigate their influence.
The density and speed of a crowd are important local characteristics to assess the criticality of a crowd situation. In this section, we present methods to derive these measures from position information of pedestrians and discuss their relation.
where R is the kernel radius, which defines the amount of smoothing around the location at which the density is evaluated.
where the averaged quantity is the speed of pedestrian i at their location at time t. Again, R is the kernel radius.
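As an illustration of kernel-based local measures of this kind, the sketch below computes a local density and a local speed with a Gaussian weighting of radius R. The exact kernel is an assumption: it follows a Gaussian form commonly used in the pedestrian dynamics literature, normalized so that each pedestrian contributes a total weight of one. The function names are hypothetical.

```python
import numpy as np

def gaussian_weights(positions, r, R):
    """Gaussian weight of each pedestrian for the evaluation point r.

    positions: array-like of shape (n, 2), pedestrian locations in meters.
    r: evaluation location (x, y) in meters.
    R: kernel radius in meters (controls the smoothing).
    """
    diff = np.asarray(positions, dtype=float) - np.asarray(r, dtype=float)
    d2 = np.sum(diff ** 2, axis=1)
    # Normalized so that the kernel integrates to 1 over the plane.
    return np.exp(-d2 / R**2) / (np.pi * R**2)

def local_density(positions, r, R):
    """Kernel-smoothed number of tracked pedestrians per square meter at r."""
    return float(gaussian_weights(positions, r, R).sum())

def local_speed(positions, speeds, r, R):
    """Kernel-weighted average speed (m/s) of the pedestrians around r."""
    w = gaussian_weights(positions, r, R)
    return float(np.dot(w, np.asarray(speeds, dtype=float)) / w.sum())
```

With a small R, only pedestrians very close to the evaluation point contribute noticeably; a larger R averages over a wider neighborhood at the cost of local detail. When applied to App users only, the density is a user density and not yet a crowd density.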
Section 2.2 discusses the advantages of on-device localization methods for tracking pedestrians and identifies a major challenge: in contrast to in-network approaches, people have to deliberately share their position information. This requires a dedicated piece of software running on a user’s smartphone. At first sight, such an approach may appear undesirable, as it can be assumed that the majority of people are not willing to install such an application and constantly send their current position to a remote server for various reasons, including privacy concerns and energy considerations. In the case of a mass gathering, this may imply that only a fraction of all attendees would run such an application and many would opt for not having their location tracked. However, in a preceding study, we verified that people are willing to share privacy-sensitive location information if they receive some benefits or if they realize that sharing such information is for their own good and safety. Thus, we believe such an approach is still viable and promising when following a participatory sensing scheme in which users are motivated to deliberately share their location information by providing them with incentives and by making it transparent what the data is used for. In earlier work, we introduced the concept of a smartphone App that tracks pedestrians’ movements and offers attendees of a mass gathering a set of features which users regard as useful, e.g. an interactive program guide, a map showing the location of points of interest, or background information about the mass gathering. During the event, users of the App can receive location-dependent messages from the police. Through the users’ smartphones, the police can provide users situated in a particular area with targeted information on how to behave in case of an emergency.
Unknown ratio of App users: The ratio of event attendees using the App at any given moment is unknown. While the absolute number of App users is known, it is usually not possible to obtain the exact number of event attendees at a certain point in time.
Spatial distribution of App users corresponds to the distribution of event attendees: Throughout the whole event, we assume a spatial distribution of App users that corresponds to the spatial distribution of event attendees. This means that the App users are evenly distributed among the event attendees. This is important, as it helps us to discover trends. While it does not allow us to directly infer how many people reside at one location, we can identify that a certain percentage of users, and hence of event attendees, is situated in a given area.
Natural behaviors and interaction patterns: App users behave naturally and interact with the environment and other persons in a similar way as non-App-users. Hence, the averaged behavior of the App users at one specific location corresponds to the averaged behavior of the event attendees in this area. By accepting this assumption, we can infer certain crowd characteristics at a given location even if not every person is being tracked: we infer the behavior of the crowd from the behavior of the App users. This is possible because pedestrians in crowds are likely to mimic the behavior of the neighboring pedestrians, e.g. by adjusting their walking speed and direction [62, 63]. For a single individual, this assumption may not hold, as a person may always decide independently on their behavior, e.g. stand still or walk in another direction. However, by averaging over the App users, we assume that the averaged App user behavior corresponds to that of the crowd at a given location.
The more pedestrians participate and share their location, the more reliably we can draw conclusions about occurring crowd characteristics. However, the obtained App user distribution does not reflect the actual crowd density. In the following section, we briefly cover the data collection platform and present the data set used for evaluation. Afterwards, we verify the assumptions introduced in this section and focus on the density-speed relation in our data set. Based on the obtained findings, in Section 5.6 we present our methodology to automatically infer a crowd density estimate from the collected position data and evaluate it against ground truth information obtained from video footage.
To collect location updates from pedestrians, we developed a generic App for mobile devices which can be tailored to a specific mass gathering and provides the users with event-related information and features. These features are designed to be attractive and useful during the event to reach a large user base. While a user’s smartphone is running the App, the current location of the device is sampled at a high rate using the integrated GPS sensor. This high sampling rate was chosen to capture as much of the motion dynamics as possible. Besides the user’s current location, the recorded GPS information also reveals the current velocity and heading direction of a user. This information is logged too. The recorded data is periodically sent to a server running the CoenoSense framework. CoenoSense is a backend infrastructure that collects and stores arbitrary context information received from potentially thousands of mobile devices simultaneously. It allows for real-time processing of the collected data.
To ensure a user’s privacy, data is sent anonymously and our App offers users full control over data sharing and data recording. It can be disabled by the user at any time.
We deployed the App and the CoenoSense platform during the Lord Mayor’s Show 2011, which took place in London on 12 November between 11 am and 6 pm. The Lord Mayor’s Show is a street parade in the City of London, the historic core of London and its present financial centre. The App offers a festival program, a map indicating points of interest and additional background information about the event. In collaboration with the event organizers, we made it the event’s official iPhone App and distributed it for free. It was advertised on the Lord Mayor’s Show website and available through Apple’s iTunes App store.
GPS location updates were collected between 00:01 and 23:59 on 12 November, and only while a user was within a specific geographical area around the venue where the event took place.
In collaboration with the event organizers and police forces, we obtained access to the CCTV video footage recorded during the Lord Mayor’s Show. These are the same video recordings as used by the police to monitor the event. We consider this footage as ground truth information and use it in the following sections to verify our assumptions and evaluate our methods. We used video footage from four cameras placed at different locations. These locations have been identified by the police as being critical with respect to occurring crowd behaviors. For each camera, we defined a fixed area within which the crowd density is extracted.
In this section, we report on various spatio-temporal behavior properties that can be discovered in our data set. We start by investigating general statistics and put a special focus on aspects which help to support the assumptions stated in Section 3.2. Afterwards, we focus on the density-velocity relation.
To fulfill Assumption 1, we assume a linear relation between the user density and the crowd density. With a linear regression analysis, we can assess the quality of the linear relation. The linear regression is depicted in Figure 7(a). The user density depends on the kernel radius R of Equation 1. To understand its influence, we vary the kernel radius R over a range of values. Figure 7(b) depicts the influence of the kernel radius on the correlation between the crowd density and the user density. We obtain a low correlation coefficient for small values of R. The correlation coefficient increases to a maximum at an intermediate radius, followed by a decline for larger values of R. The observed behavior can be explained in the following way: for small values of R, only a few App users fall within the kernel, so the user density varies strongly with the number of available sample points. This variation is smoothed out for larger values of R as the area used to determine the density increases. Hence, small variations in the number of available sample points affect the density estimate less, resulting in lower variations. Beyond some value of R, the considered area is so large that the estimated density no longer captures the local variation. Local variations are smoothed out and large deviations between the user density and the crowd density can be observed. This causes a drop in the correlation coefficient.
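The regression analysis described above can be sketched as follows, assuming paired samples of user density (computed for one kernel radius) and crowd density (from video) are available. The function name and the sample values are made up for illustration and do not stem from the data set.

```python
import numpy as np

def linear_fit_and_correlation(user_density, crowd_density):
    """Least-squares line crowd = slope * user + intercept, plus Pearson r."""
    x = np.asarray(user_density, dtype=float)
    y = np.asarray(crowd_density, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 polynomial fit
    r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient
    return slope, intercept, r

# Illustrative values only: a perfectly linear relation.
user = [0.001, 0.002, 0.003, 0.004]   # tracked users per square meter
crowd = [0.5, 1.0, 1.5, 2.0]          # persons per square meter from video
slope, intercept, r = linear_fit_and_correlation(user, crowd)
```

Repeating this computation for user densities obtained with different kernel radii would give a correlation-versus-radius curve of the kind discussed above.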
A further error might be introduced by localization errors due to sub-optimal GPS fixes in urban spaces, where often only a limited number of GPS satellites are visible at street level. Prior work has quantified this error for samples recorded in urban spaces, reporting both an upper bound that holds for 95% of all samples and the median error.
We want to investigate to what extent the density-velocity relation found in our data set corresponds to existing fundamental diagram models. Figure 9(a) and Figure 9(b) show a histogram of the density-velocity relation for two different kernel radii. To obtain these plots, we divided time into intervals of one second and calculated, for each interval t and for each user that was active in this interval, the local density using Equation 1 and the crowd velocity using Equation 2. The plots depict a two-dimensional histogram of all obtained density-velocity tuples (logarithmic scale). The color values indicate the occurrence frequency of a tuple. The two plots reveal some general aspects of the density-velocity relation found in our data set:
both plots exhibit a clear trend that with higher densities, the velocity range decreases;
for low densities, the whole walking velocity range is observed;
low velocity values can be observed for all densities.
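The construction of such a two-dimensional histogram, and the extraction of an upper speed per density bin, can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the bin counts and the use of a percentile as the upper limit are choices made here, not details taken from the original analysis.

```python
import numpy as np

def density_velocity_histogram(density, velocity, bins=(20, 20)):
    """2D histogram of (density, velocity) tuples with log-scaled counts."""
    counts, d_edges, v_edges = np.histogram2d(density, velocity, bins=bins)
    log_counts = np.log10(counts + 1)  # +1 keeps empty bins finite
    return counts, log_counts, d_edges, v_edges

def upper_speed_per_bin(density, velocity, d_edges, q=95):
    """q-th percentile of the velocity within each density bin."""
    density = np.asarray(density, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    idx = np.digitize(density, d_edges) - 1   # bin index of each sample
    upper = np.full(len(d_edges) - 1, np.nan)
    for b in range(len(d_edges) - 1):
        v = velocity[idx == b]
        if v.size:
            upper[b] = np.percentile(v, q)
    return upper

# Tiny synthetic example: fast and slow walkers at low density, slow at high.
dens = [0.1, 0.2, 1.1, 1.2]
vel = [1.4, 0.2, 0.5, 0.1]
counts, log_counts, d_edges, v_edges = density_velocity_histogram(dens, vel, bins=(2, 2))
upper = upper_speed_per_bin(dens, vel, [0.0, 1.0, 2.0], q=100)
```

Because low velocities occur at all densities, the upper envelope of the histogram, rather than the mean, is the quantity to compare against a fundamental diagram.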
Based on the findings deduced in the previous section, we introduce and evaluate a methodology to estimate a crowd density from the spatial distribution of App users. Our method relies on Assumption 2. Section 5.3 shows the existence of a linear relation between the crowd density and the user density. By knowing the parameters of the linear regression, a crowd density can be estimated from the user density. The regression parameters, however, are unknown. Thus, a calibration method is required to obtain these parameters.
where m, q and k are unknown regression parameters and depend on the ratio of App users to event attendees.
The speed of the crowd is obtained using Equation 2. Hence, we can obtain a crowd density estimate by combining Equation 2 and Equation 7. Two of the model parameters are culture-dependent and can be taken from literature (e.g. [48, 53, 55]). The fitting parameter γ, however, remains unknown.
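To illustrate how a speed measurement can be turned into a density estimate through a fundamental diagram, the sketch below uses the Weidmann relation in its standard textbook form, v(ρ) = v0 (1 − exp(−γ (1/ρ − 1/ρ_max))), and its analytical inverse. The numeric defaults (v0 = 1.34 m/s, γ = 1.913 per square meter, ρ_max = 5.4 persons per square meter) are values commonly cited for Weidmann's model and are assumptions here; in the methodology described above, γ is treated as an unknown fitting parameter that has to be calibrated. The function names are hypothetical.

```python
import math

V0 = 1.34        # free-flow walking speed in m/s (assumed literature value)
GAMMA = 1.913    # fitting parameter in 1/m^2 (assumed; calibrated in practice)
RHO_MAX = 5.4    # density at which movement stops, persons/m^2 (assumed)

def weidmann_speed(rho, v0=V0, gamma=GAMMA, rho_max=RHO_MAX):
    """Walking speed (m/s) predicted for a crowd density rho (persons/m^2)."""
    if rho <= 0.0:
        return v0
    if rho >= rho_max:
        return 0.0
    return v0 * (1.0 - math.exp(-gamma * (1.0 / rho - 1.0 / rho_max)))

def density_from_speed(v, v0=V0, gamma=GAMMA, rho_max=RHO_MAX):
    """Invert the relation: crowd density implied by a crowd speed v."""
    if v <= 0.0:
        return rho_max
    if v >= v0:
        return 0.0
    return 1.0 / (1.0 / rho_max - math.log(1.0 - v / v0) / gamma)
```

Since pedestrians may also walk slowly or stand still at low densities, such an inversion is only meaningful for the upper limit speed observed at a location, not for an individual's momentary speed.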
Overview of calibration parameters, with correlation coefficient and calibration error σ (RMSE)
Given all these findings, we conclude:
the residual analysis reveals that the error is normally distributed, which suggests that the chosen model fits the data well and that the error is not introduced by the model but is inherently present in the data,
the correlation coefficients achieved for the two kernel radii considered imply that there is some predictive power for obtaining a crowd density estimate, and
the calibration error was determined for each of the two kernel radii.
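The calibration error σ, expressed as a root-mean-square error (RMSE), together with the residuals used for a residual analysis, can be computed as in the following sketch. The function name and the sample values are illustrative assumptions and not results from the data set.

```python
import numpy as np

def calibration_error(estimated, ground_truth):
    """Return the RMSE and the residuals of density estimates vs. ground truth."""
    residuals = np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return rmse, residuals

# Illustrative values only (persons per square meter).
rmse, residuals = calibration_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A histogram of the residuals can then be inspected to judge whether the error looks normally distributed.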
A participatory sensing approach for crowd monitoring faces a major limitation: participation is voluntary. Regardless of the incentivization strategy, we expect that only a small fraction of all attendees of a mass gathering is being tracked. This makes it challenging to draw conclusions about the crowd density. This work addressed this limitation. We presented a methodology which allows inferring a crowd density even if only a small number of crowd members is being tracked. The principle behind our methodology is that the walking speed of pedestrians depends on the crowd density. By measuring the location and speed, we can calibrate the distribution of tracked pedestrians to the distribution of all attendees of a mass gathering using the fundamental diagram. With this, we can infer crowd density estimates.
We used a data set recorded during a city-scale mass gathering to evaluate our methodology. We compared crowd density estimates to ground truth information obtained from video footage to determine the average calibration error. Further, a correlation coefficient of 0.83 indicates that a linear relation between the crowd density and the user density can be assumed. The residual analysis revealed that the model fits the data well.
Besides these results, the work presents another finding: We could verify that the walking speed of pedestrians depends on the crowd density. Hereby, we found a similar relation between the speed of a crowd and the density as related work suggests. In particular, we identified a crowd density dependent upper limit speed with which pedestrians move through urban spaces. These upper speed limit values follow existing fundamental diagram models closely.
There are several factors to consider:
The reason for not reaching a higher maximum correlation coefficient might stem from the unequal spatial distribution of App users and event attendees at certain time steps. However, there are also other factors: it was sometimes difficult to count the correct number of attendees in the predefined area from the video footage as some pedestrians were occluded by others. Therefore, the crowd density extracted from the video is also error-prone.
We obtained the highest correlation coefficient and lowest calibration error for a kernel radius that is large for inferring local characteristics. We believe this is due to the sparsity of our data set: we were tracking fewer than 1% of all attendees. A smaller kernel radius could provide more accurate local crowd information but would require a much larger user base. Providing more attractive incentives, making the App available on different mobile platforms and having a good advertisement campaign in place could stimulate a higher participation.
The radius that gave the best results may seem like a big area to cover for monitoring a crowd. However, as we use a Gaussian weighting scheme to calculate our measures, the influence of the users decays rapidly the further away they are from the center of the circle. Further, we believe that this radius can be smaller with a larger ratio of App users.
The high location sampling rate was chosen to capture as much of the pedestrian dynamics as possible. However, such a high sampling rate consumes a lot of energy. Besides privacy considerations, the heavy battery consumption of such an App might also have a detrimental effect on participation. Therefore, it is important to incorporate an efficient energy-conserving sampling strategy. This can be achieved by lowering the sampling frequency but also by only reading location updates from GPS if needed. Low-power acceleration sensors can help to determine whether a user is stationary and to switch on the GPS only if motion is detected.
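A motion-gated sampling strategy of this kind could look like the following sketch. It is not part of the deployed App: the function names, the window of accelerometer samples and the threshold value are hypothetical and would have to be tuned on real sensor data.

```python
import math
from statistics import pstdev

def is_moving(accel_samples, threshold=0.5):
    """Decide from a short window of accelerometer samples whether a user moves.

    accel_samples: list of (x, y, z) accelerations in m/s^2.
    threshold: minimum standard deviation of the acceleration magnitude
        (m/s^2) that counts as motion; a hypothetical value.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel_samples]
    return pstdev(magnitudes) > threshold

def gps_should_be_on(accel_samples, threshold=0.5):
    """Request GPS fixes only while motion is detected."""
    return is_moving(accel_samples, threshold)

# A resting phone measures only gravity; walking adds periodic variation.
resting = [(0.0, 0.0, 9.81)] * 10
walking = [(0.0, 0.0, 8.0), (0.0, 0.0, 12.0)] * 5
```

While the user is stationary, the last known location can be reused, so no GPS fix is needed until the accelerometer indicates motion again.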
Another important issue that has not been addressed in this work is how to obtain a confidence measure indicating the reliability of the inferred crowd density. If the percentage of users is small compared to the total number of attendees, the inferred crowd density may even become null. A plausibility check, e.g. comparing the number of active users to a rough estimate of the number of attendees provided by the security personnel, could give confidence in the inferred crowd density.
This work is one of the first to address the challenges arising from crowd sensing through a participatory sensing approach with smartphones. We believe the results are promising and will stimulate further contributions. In particular, we see the following next steps to investigate some of the aspects not addressed in this work:
We evaluated our approach on data from only one mass gathering. To generalize the findings, our method has to be applied to data collected during different mass gatherings and the results have to be compared. The type of the gathering and cultural aspects may have an influence.
A sensitivity analysis investigating the relation between the ratio of App users and the accuracy of the crowd density estimation would help to understand how many pedestrians need to be tracked to obtain a significant estimation accuracy.
An evaluation of the online performance of our method would reveal the amount of data required to estimate a crowd density. The required amount of data is closely connected to the required number of pedestrians. These two aspects should be investigated jointly.
We used the analytical model of Weidmann to represent the fundamental diagram. As noted in Section 2.3.2, other models exist which consider additional information. The suitability of alternative models for our calibration method remains to be investigated.
A possible demographic bias in our App usage was not taken into consideration. However, such factors influence the behavior of pedestrians. Considering the age or gender distribution or the cultural background could further tune the model parameters.
We did not include spatial characteristics in our model. As the behavior of pedestrians depends on the architectural configuration, such information could be considered to increase the estimation accuracy.
This work shows, using crowd density as an example, that a participatory sensing approach can give insight into crowd characteristics and provide information relevant to assessing the criticality of a situation during city-scale mass gatherings. Given our results and the many advantages of on-device localization (localization accuracy, user control over privacy, multitude of sensor modalities, low deployment cost, etc.), we suggest that smartphones are a viable tool for crowd monitoring.
This work is supported under the FP7 ICT Future Enabling Technologies programme of the European Commission under grant agreement No. 231288 (SOCIONICAL).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.