A Data-Driven Approach for Assessing Biking Safety in Cities

With the emphasis that cities have put on sustainable transportation during the past few years, biking has become one of the foci for local governments around the world. Cities invest in bike infrastructure, including bike lanes, bike parking racks, and shared (dockless) bike systems. However, one of the critical factors in converting city-dwellers to (regular) bike users/commuters is safety. In this work, we utilize bike accident data from different cities to model biking safety based on street-level (geographical and infrastructural) features. Our evaluations indicate that our model provides well-calibrated probabilities that accurately capture the risk of a biking accident. We further perform cross-city comparisons in order to explore whether there are universal features that relate to cycling safety. Finally, we discuss and showcase how our model can be utilized to explore "what-if" scenarios and facilitate policy decision making.


Introduction
Transport engineers and urban planners have come to realize during the past few years that if we are to achieve the vision of smart cities, that is, cities that are livable, sustainable and resilient, we need to move away from car-centric mobility and turn to more sustainable modes of transportation. As a result, cities have indeed turned to alternative modes of transportation, while promoting multi-modal mobility. Bicycles offer a promising transportation alternative to private vehicles, especially in areas with congestion, poor air quality, and high fuel prices [1]. Therefore, major emphasis has been placed on urban biking as an alternative mode of transportation [2]. As a result, during the past years the relevant infrastructure (e.g., bike lanes, bike parking racks, etc.) and services (e.g., shared bike systems) have enjoyed significant growth [3]. This investment has resulted in increased (perceived and/or actual) safety, either directly or through network effects [4], which consequently has a positive feedback on the propensity of people to ride a bicycle for their commute [5][6][7][8][9].
However, despite this increase in ridership, the fraction of commuters that regularly use bikes is still relatively small compared to other modes of transportation [10]. It is thus important to understand what factors are related to bike safety and how we can make riding safer in a city. While there exists literature that aims at understanding through surveys how people perceive biking safety, in this study we take a data-driven approach, using bike accident data from the cities of London, Boston and Pittsburgh. We further collect street-level features from OpenStreetMap that capture both the geography and the bike infrastructure on the streets of these cities. Using these data we build a model that quantifies biking safety as the relative probability of a severe biking accident in a given location. Our results indicate that the probability output of our models is well-calibrated both when evaluated (i) on a test set from the same city that the model was trained on, and (ii) on a dataset from a different city. The latter, i.e., a cross-city evaluation, is a piece that is missing in many studies in the area of urban computing and analytics despite its importance. If we are to really understand various urban phenomena and their universal patterns (if any) across cities, such cross-city analysis is necessary.
In brief, our results indicate that the presence of (protected) bike lanes is associated with a significant improvement in biking safety, as one might have expected. Furthermore, the speed limit, street topology (straight/curved), and the distance from an intersection are good predictors for the severity of a biking accident as well. While these results might not be surprising, and in fact they might even be expected, our work quantifies these relationships. This further enables their use in "what-if" analysis that can facilitate local government decision making, as we elaborate on later. In summary, the main contributions of our work are:
• We develop a bike safety model using public bike accident and map datasets from 3 diverse cities.
• We perform a cross-city evaluation in order to identify possibly universal patterns related to biking safety.
• We also showcase how these models can be used by policy makers to evaluate the current infrastructure and decide on updates that maximize bike safety (possibly under other constraints).
The rest of the paper is structured as follows: Sect. 2 discusses the literature related to our study and further differentiates our work. Section 3 describes the datasets used in our analysis as well as the biking safety model developed. We further provide our model evaluations in the same section. Section 4 shows how our model can be used for facilitating decisions on infrastructure updates that maximize biking safety under various constraints. Finally, Sect. 5 discusses the limitations of our study, while Sect. 6 concludes our work.

Related studies
In this section we discuss research relevant to our study and position our study within this literature.
For the past few decades, transportation planners have been trying to identify how an urban environment can attract more bike commuters. For example, Clarke [11] explores the key factors in having a bike-friendly city. While Clarke focuses on the institutionalization of bicycle programs within agencies, safety is identified as an important factor that agencies will have to consider and overcome in order for biking to be fully integrated into the rest of the urban transportation network. As a result, there are several studies in the literature that aim at assessing the safety (or risk) of biking in the city and identifying characteristics of the street network that could help improve safety. A main differentiating point between these studies is the way the concept of safety is quantified. There are three ways that have been used to examine the concept of safety:
• Actual safety: In this case the risk of biking is directly assessed through data that capture both accidents/fatalities as well as exposure of bicyclists to these risks (e.g., bike trip counts).
• Perceived safety: In this case the risk of biking is assessed through user surveys, and hence provides a subjective view of the road users.
• Inferred safety: When there is no direct measure for quantifying the risk of biking, the latter can be inferred through indirect measures, such as the relative position or speed of motor vehicles to bicycles.
In order to calculate the actual safety, studies have relied on exposure estimates from national surveys (e.g., [12]) or on video recordings of traffic (e.g., [13,14]). An interesting approach for dealing with the problem of estimating the exposure volume (e.g., the number of trips a bicyclist has taken, the number of bike trips over a specific segment etc.) has been presented in [15,16].
In these studies, the authors use a case-crossover approach, where data from bike accident locations are matched with a random point on the trajectory the bicyclist followed on the trip that led to the accident. The dataset created this way makes it possible to examine the influence of infrastructure on injury risk, while ensuring strict control for external covariates (i.e., exposure to risk, cyclist traffic volume) and for personal and trip characteristics (e.g., propensity for risk-taking, time of day). However, while this case-crossover approach does not require knowledge of the exposure volume, it does require the full trajectories of bike trips, which can also be difficult to access.
Given the possible lack of detailed data on bicycle accidents or risk exposure, especially during the beginning of biking expansion in cities, researchers have also been interested in assessing and quantifying the perceived safety of biking. Even though actual and perceived safety can differ, understanding how people perceive biking safety is particularly important; it can provide simple ways with which transportation and city authorities can nudge more people to use a bicycle for their commute. To that end, several studies have performed user surveys to understand what street features commuters associate with safety and/or which street aspects cyclists pay attention to (e.g., [18][19][20][21][22]).
This research has led to the development of various indices that aim at quantifying the risk/safety associated with biking over a specific street segment. In the late 1980s and early 1990s, the efforts were mainly focused on utilizing a combination of perceived and inferred safety. Davis [23] was the first to develop a score (Bicycle Safety Evaluation Index) with the objective of estimating the probability of an accident over a specific segment. The index combines information on the number of lanes, pavement, speed limit and similar variables. However, the weighting of each of these variables is subjective and it does not incorporate bicycle trip volume as an exposure variable. Despite its shortcomings, the Davis work was the basis of the roadway condition index (RCI) and the Florida's Bicycle Coordinator segment condition index (SCI) [24]. Landis [25] built on these theoretical models to develop the interaction hazard score (IHS), whose results are evaluated through surveys with bicyclists, essentially attempting to recreate the perceived safety. Many of these very early attempts to develop models for bike safety in a city had to deal with the absence of the detailed data on bike accidents (and street-level mapping) that we have access to today. Closest to our study is the work by Allen-Munley et al. [26], who used crash data from Jersey City to model the severity of a bicycle crash. In particular, the authors, similar to our study, build a logistic regression model for the severity of an accident using as independent variables various street features (e.g., speed limit, number of lanes etc.). The authors focus on a descriptive model, and hence they do not evaluate the predictive power of their models. The latter is the main focus of our study, since it can drive educated recommendations for alterations in the infrastructure that can potentially improve biking safety.
Furthermore, we also cross-evaluate the models in different cities, in order to examine the presence of any universal patterns in terms of biking safety.

Data and methodology
In this section we will describe the datasets we used for our analysis as well as the biking safety model we developed.
Bike Accident Data: To perform our study we obtained data on traffic accidents that involved bicycles from three metropolitan areas across Europe and North America. While these data are obtained from different sources (i.e., the Open Data portal of each city) they all provide the information we need in order to build the safety model. In particular, we collected data from:
• London: London's bike accident dataset is obtained from CycleStreet and covers the period between 2005 and 2017.
• Boston: Boston's dataset is obtained through Cambridge Open Data and covers the period between 2010 and 2013.
• Pittsburgh: Pittsburgh's dataset is obtained through the Western Pennsylvania Regional Data Center and covers the period between 2004 and 2017.
Table 1 provides some additional basic information from the data. While these datasets are compiled, curated and distributed by different entities, the subset of the information that we need for building a biking safety model is included in all three datasets. In particular, the tuple of interest is: <Lat, Lon, Accident Severity>. While latitude and longitude are unambiguous, the severity of an accident can be defined differently across the different cities. For example, in the London dataset the severity levels are specified as "slight", "serious" and "fatal", while in the Pittsburgh dataset they are categorized as "not injuries", "minor injury", "moderate injury", "major injury", and "killed". Finally, in the Boston dataset the severity is represented by a Likert-like scale from 0 (no injury) to 4 (fatal). In order to be able to build models that are comparable between the cities we have used two labels, namely, (i) severe, and (ii) slight, and Table 2 shows the mapping between the labels in each city. We will further discuss the potential limitations from this in Sect. 5.

Street Features Data:
Having the latitude and longitude of a traffic accident allows us to obtain various information about the location that could be correlated with the severity of an accident involving a bicycle. Using the OpenStreetMap's API we extract street-level features for each of the accident locations. These features include speed limit, the existence of a bikelane or not, the width and the length of the street segment, the hilliness, the topology of the street segment (i.e., whether it is straight or curved), as well as, the distance of the accident location from an intersection.
Street Network: An important factor that we would like to examine as part of our model is the traffic volume on each street segment. However, we do not have access to this information, and hence, we take an indirect approach, using appropriate network features as a proxy. In particular, we extract from OpenStreetMap the actual street network of each city, $N = \{V, E\}$. The set of nodes $V$ represents street intersections, while the set of edges $E$ represents street segments connecting these intersections, i.e., edge $e_{i,j} \in E$ exists if there is a street segment between intersection $i \in V$ and intersection $j \in V$. Using this network we calculate the edge betweenness centrality for every edge $e \in E$. The edge betweenness $\beta_e$ of edge $e$ is the sum, over all possible pairs of nodes $(i, j)$, of the fraction of shortest paths between $i$ and $j$ that pass through edge $e$:
$$\beta_e = \sum_{i, j \in V} \frac{\sigma_{ij}(e)}{\sigma_{ij}}$$
where $\sigma_{ij}$ is the number of all shortest paths between nodes $i$ and $j$, while $\sigma_{ij}(e)$ is the number of those that pass through $e$. The betweenness centrality of an edge $e$ is essentially proportional to the probability that $e$ will be part of one of the shortest paths between a randomly selected pair of nodes $i$ and $j$. Hence, it is plausible to hypothesize that the betweenness centrality of a street segment in our network $N$ is a good proxy for the overall traffic over the corresponding segment.
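The edge betweenness definition above can be illustrated with a small, self-contained sketch. It assumes an unweighted, undirected network (shortest paths counted in hops) and sums over unordered node pairs; the function names and the toy four-intersection street are hypothetical and are not taken from the study, which may have used segment lengths as weights and a library implementation.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Hop distance and number of shortest paths from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(edges):
    """Unnormalized edge betweenness: sum over node pairs of sigma_ij(e) / sigma_ij."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    info = {node: bfs_counts(adj, node) for node in adj}
    beta = {frozenset(e): 0.0 for e in edges}
    for i, j in combinations(adj, 2):
        dist_i, sigma_i = info[i]
        dist_j, sigma_j = info[j]
        if j not in dist_i:  # i and j are disconnected
            continue
        for u, v in edges:
            # The edge lies on a shortest i-j path in at most one orientation.
            for a, b in ((u, v), (v, u)):
                if a in dist_i and b in dist_j and dist_i[a] + 1 + dist_j[b] == dist_i[j]:
                    beta[frozenset((u, v))] += sigma_i[a] * sigma_j[b] / sigma_i[j]
    return beta


# Toy street: four intersections in a row (a - b - c - d).
beta = edge_betweenness([("a", "b"), ("b", "c"), ("c", "d")])
for edge, value in beta.items():
    print(sorted(edge), value)
```

On this toy street the middle segment carries more shortest paths than the two end segments, which is the intuition behind using betweenness as a traffic proxy.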

Bike safety models
To reiterate, our goal is to build a model that will capture the safety s (or risk ρ depending on how it is viewed) of biking in an urban environment. Ideally, given data from bike accidents on a street segment a risk score would be assigned to the segment based on the number of these accidents. However, this number should be normalized with the total number of bike trips that go through this street segment. If we only know that there were 10 bike accidents over a street segment, this does not inform us about the biking risk ρ associated with the segment. It can be anywhere from extremely low (e.g., if there have been 1 million bike trips over the segment), to completely risky (e.g., if there have been only 10 bike trips in total over the segment). We could then use this normalized accident count as our dependent variable for a safety model (e.g., through a beta regression since the dependent variable would be bounded by 0 and 1). Nevertheless, traffic data are not easily available/accessible, and this problem is particularly pronounced for bike traffic. Furthermore, in this setting the choice of spatial aggregation could become challenging. While street segment might seem an appropriate aggregation, different street segments can have vastly different lengths, which can impact both the estimation of the independent variables themselves, as well as, the dependent variable. The latter could be dealt with, by normalizing with the length of the street, but then interpretation and model choice (since the dependent variable is not a probability anymore and beta regression would not be appropriate) would be challenging. Therefore, we rely on a slightly different modeling setting and definition of risk ρ that has also been used in existing literature [26].
In particular, we focus on a specific location $l$ (that is, an actual latitude/longitude pair, rather than a street segment or other spatial aggregation), and define the risk associated with this location, $\rho_l$, as the probability of an accident that happens at $l$ being severe. We can think of this definition as a second-order safety score. Simply put, $\rho_l$ is the conditional probability of an accident at $l$ being severe ($S_l$) given that an accident at $l$ happened ($A_l$), i.e., $\rho_l = \Pr[S_l \mid A_l, \mathbf{x}]$. This probability is also conditioned on the vector $\mathbf{x}$ of independent variables of our model, described in what follows. We model this probability through a logistic regression model, i.e.:
$$\Pr[S_l \mid A_l, \mathbf{x}] = \frac{1}{1 + e^{-(b_0 + \mathbf{b}^{T}\mathbf{x})}}$$
where $b_0$ is the intercept and $\mathbf{b}$ is the vector of regression coefficients. In this setting, every recorded bike accident is a data point associated with a binary variable (1: severe, 0: non-severe accident). Then for every bike accident location we obtain the following features as our independent variables:
1. Speed-limit (v): This variable captures the allowed maximum speed in the corresponding accident location.
Table 3 presents the results obtained from each city. We train these models using the last 4 years that each dataset covers. The reason for not building the model with more (but older) data is twofold: (i) the street features 15 years ago were most probably (very) different from what they are now, and (ii) as people get more used to biking around the city and use the improved biking equipment that exists today, the associated risks can also change. Therefore, we want to restrict our model to only (fairly) recent data. As we can see, all of the features are strongly correlated with the probability of a severe accident and the direction of these correlations is consistent across cities. For example, an accident that happens on a central street (i.e., large betweenness) is expected to have a higher risk of being severe, while the presence of bike lanes reduces this risk.
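To make the logistic form of the risk model concrete, the following minimal sketch evaluates the conditional probability of a severe accident for one location. The feature names, coefficient values and intercept are hypothetical placeholders chosen only for illustration; they are not the fitted values reported in Table 3. Only the signs for bike lanes (negative) and betweenness (positive) follow the directions described in the text, while the sign used for the speed limit is an assumption.

```python
import math


def severe_probability(features, coefficients, intercept):
    """Logistic model: Pr[severe | accident, x] = 1 / (1 + exp(-(b0 + b . x)))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only (NOT the values from Table 3).
coefficients = {"speed_limit": 0.03, "bike_lane": -0.8, "betweenness": 1.5}
intercept = -1.0

# One hypothetical accident location, with and without a bike lane.
location = {"speed_limit": 30.0, "bike_lane": 0.0, "betweenness": 0.2}
rho_without_lane = severe_probability(location, coefficients, intercept)
rho_with_lane = severe_probability({**location, "bike_lane": 1.0}, coefficients, intercept)

print(f"risk without bike lane: {rho_without_lane:.3f}")
print(f"risk with bike lane:    {rho_with_lane:.3f}")
```

Because the bike-lane coefficient is negative in this sketch, toggling the feature lowers the estimated risk, which mirrors the qualitative finding reported for all three cities.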
However, apart from the direction of the correlations, we are also interested in how different the various coefficients of the models are. Hence, in Fig. 1 we present the 95% confidence interval of the coefficients for each of the models. As we can see, even though the actual coefficients are different, their confidence intervals overlap, which suggests that there is no statistically significant difference between the model coefficients. In other words, biking safety appears to be correlated with street features in similar ways across the different cities we examined.
Finally, we selected 1000 random locations from a different city (San Francisco, California) and applied each one of the learned risk models to estimate the conditional probability of a severe bike accident at each location. Our goal with this experiment is to compare the differences in the predictions provided by the various models for the same points. Figure 2 presents the scatter plots of these predictions, where the two axes present the risk probability based on the models from two different cities. As we can see, the points closely follow the y = x line, which means that all models give similar risk probabilities for the same locations.

Evaluating the models' predictive power
Apart from the descriptive nature of the biking risk models explored above, we are also interested in their predictive power. To evaluate the predictive power of each model we split each dataset into a training and a testing set. In particular, for each city we use the first two years in our dataset for training and the last two as our out-of-sample testing set. Table 5 presents the out-of-sample accuracy for each model, as well as the Brier score. The latter is a measure of probability calibration. Specifically, for a probabilistic model its classification accuracy paints only part of the picture. For example, two models $M_1$ and $M_2$ that both predict that an accident will be severe, i.e., $\Pr[S_l \mid A_l, \mathbf{x}] > 0.5$ with $S_l$ denoting a severe accident at location $l$ and $A_l$ an accident at $l$, will have the same accuracy. However, if $\Pr_{M_1}[S_l \mid A_l, \mathbf{x}] = 0.9$ and $\Pr_{M_2}[S_l \mid A_l, \mathbf{x}] = 0.55$, the two models have different probability calibration. The Brier score [27] quantifies this calibration for each model. In particular, for the case of binary probabilistic prediction, the Brier score is calculated as:
$$BS = \frac{1}{N} \sum_{i=1}^{N} (\pi_i - y_i)^2$$
where $N$ is the number of observations, $\pi_i$ is the probability assigned to instance $i$ of being equal to 1 and $y_i$ is the actual (binary) value of instance $i$. The Brier score takes values between 0 and 1 and, as alluded to above, evaluates the calibration of these probabilities, that is, the level of confidence they provide. The lower the value of $BS$, the better calibrated the output probabilities are. Continuing the example above, a 0.9 probability is better calibrated than a 0.55 probability (when the ground truth is label 1) and hence, even though $M_1$ and $M_2$ have the same accuracy, $M_1$ is better calibrated (lower Brier score: 0.01 compared to 0.2025). As we can see, all models exhibit out-of-sample accuracy higher than 82%, while their calibration is very good as captured by the Brier score. In particular, the Brier score for each model is much lower than the corresponding score for a reference model.
The reference score, $BS_{ref}$, is a typical baseline Brier score used for evaluating the quality of a model, and is obtained through a baseline model that assigns to each data point the base probability of the positive class (in our case a "severe" accident). The Brier skill score compares a model against this reference:
$$BSS = 1 - \frac{BS}{BS_{ref}}$$
$BSS$ will be equal to 1 for a model with perfect calibration, i.e., $BS = 0$. A model with no skill over the reference model will have a value of 0, since $BS = BS_{ref}$. If $BSS < 0$, then the model exhibits less skill than even the reference model. Finally, we evaluate the accuracy of the probability output of each model by deriving the probability calibration curves for the test set in each case. In order to compute the accuracy of the predicted probabilities we would ideally want to have several bike accidents (e.g., 100) happen at the same location $l$. If the model assigned a 75% probability of an accident at location $l$ being severe, then we would expect about 75 of the accidents observed at $l$ to be severe. However, this is clearly not realistic (at least for the vast majority of the locations) and hence, in order to evaluate the accuracy of the probabilities we use all the accidents in our dataset. In particular, if the predicted probabilities were accurate, then when considering all the accidents $A$ for which a severe incident was predicted with a probability of x%, a severe accident should have been observed in (approximately) x% of $A$.
Given the continuous nature of the probabilities we quantize them into groups that cover a 10% probability range. Figure 3 presents the predicted probability of a severe accident on the x-axis, while the y-axis presents how many of these accidents were indeed severe. Furthermore, we present in the inset figure for each calibration curve the distribution of the predicted probabilities, which as we observe cover fairly uniformly the whole range of probabilities. As we can see the calibration curve is very close to the y = x line, which practically means that the predicted probabilities capture fairly well the actual biking risk probabilities.
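The evaluation quantities discussed in this section (Brier score, Brier skill score against a base-rate reference, and a calibration curve with 10% bins) can be sketched in a few lines. This is an illustrative implementation under the definitions given in the text, not the code used in the study; the function names and the small synthetic prediction set are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and binary outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def brier_skill_score(probabilities, outcomes):
    """BSS = 1 - BS / BS_ref, where the reference predicts the base rate everywhere."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("reference score is zero (all outcomes are identical)")
    return 1.0 - brier_score(probabilities, outcomes) / bs_ref


def calibration_curve(probabilities, outcomes, n_bins=10):
    """Per non-empty bin: (mean predicted probability, observed severe fraction, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[index].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            curve.append((mean_p, observed, len(members)))
    return curve


# The M1 / M2 example from the text: a single severe accident (label 1).
bs_m1 = brier_score([0.9], [1])
bs_m2 = brier_score([0.55], [1])

# Synthetic predictions, for illustration only.
predicted = [0.15, 0.15, 0.15, 0.15, 0.75, 0.75, 0.75, 0.75]
observed = [0, 0, 0, 1, 1, 1, 1, 0]
curve = calibration_curve(predicted, observed)
skill = brier_skill_score(predicted, observed)
```

A well-calibrated model yields curve points whose first two entries are close to each other, which corresponds to points near the y = x line in the calibration plots.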

Cross-city models
In Sect. 3.1, we compared the coefficients for the safety models we built for the different cities. Here we want to evaluate the ability of a model trained on a specific city to predict the severity of accidents in a different city. These cross-city evaluations and comparisons are largely absent from the current literature in urban informatics, but are very important. The ability to predict the severity of accidents in city A based on a model that was trained with data from city B is a good indicator that there might be universal patterns in biking safety. Simply put, knowledge obtained from a specific city might be transferable (and generalizable) to another city. In particular, we use the model trained with data from each city in our study to predict the severity of accidents in the other two cities in our study (6 pairs in total). Table 6 presents the accuracy and the Brier score for these experiments, while Fig. 4 shows the reliability curves for these predictions. As we can see, the accuracy, while reduced compared to the out-of-sample performance for the same city, is still high. The same is true for the Brier score (and the corresponding Brier skill scores). In particular, the performance [...] Fig. 4 presents the calibration curves for the different pairs of training-test cities. As we can see again, the calibration of the probabilities is good and the curve is close to the y = x line.
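The cross-city protocol described above (every trained model evaluated on each of the other two cities, giving 6 ordered pairs) can be sketched as follows. The models and datasets here are toy placeholders invented for illustration: each "model" is simply a function returning a probability, and each dataset holds two made-up records. None of these values comes from the study.

```python
from itertools import permutations


def evaluate(predict, records):
    """Return (accuracy, Brier score) of a probability model on (features, label) records."""
    probabilities = [predict(features) for features, _ in records]
    labels = [label for _, label in records]
    correct = sum((p > 0.5) == bool(y) for p, y in zip(probabilities, labels))
    brier = sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)
    return correct / len(labels), brier


def cross_city_evaluation(models, datasets):
    """Evaluate each city's model on every other city's data (ordered pairs)."""
    results = {}
    for train_city, test_city in permutations(models, 2):
        results[(train_city, test_city)] = evaluate(models[train_city], datasets[test_city])
    return results


def make_toy_model(risk_without_lane, risk_with_lane):
    """Placeholder model: the risk depends only on a hypothetical bike_lane flag."""
    return lambda features: risk_with_lane if features["bike_lane"] else risk_without_lane


# Toy placeholders, NOT fitted models or real accident records.
models = {
    "London": make_toy_model(0.70, 0.30),
    "Boston": make_toy_model(0.65, 0.35),
    "Pittsburgh": make_toy_model(0.75, 0.25),
}
datasets = {
    city: [({"bike_lane": 0}, 1), ({"bike_lane": 1}, 0)] for city in models
}

results = cross_city_evaluation(models, datasets)
for (train_city, test_city), (accuracy, brier) in results.items():
    print(f"train={train_city:<10} test={test_city:<10} acc={accuracy:.2f} BS={brier:.4f}")
```

Using ordered pairs matters here: a model trained on London and tested on Boston is a different experiment from the reverse, which is why three cities give six evaluations rather than three.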

Application demonstration: improving bike safety under constraints
Models of this type allow us to evaluate the biking safety within a city and can facilitate policy decisions with respect to infrastructure updates. For example, Fig. 5 presents the interactive map of results we obtained from our model for the whole city of Pittsburgh. By clicking on each point, users are able to obtain its safety score. In particular, for every point on the street network we estimate the safety s (which is 1-ρ) associated with this point and visualize it on the map. Safe locations (i.e., low risk of severe accident) are colored green, while risky locations are colored red. Using these results one can start identifying locations that are not as safe as needed and explore what infrastructure changes (e.g., widening of a street, installation of a bike lane etc.) are going to have the maximum impact on biking safety, possibly under budgetary (and policy) constraints.
For example, Fig. 6(a) presents a three-square-mile area located within the city of Pittsburgh. The selected area does not currently include separate bike lanes for cyclists. The average safety score obtained for this area is s = 0.54.
Let us assume that we would like to enhance biking safety for this area and we have budget for adding bike lanes. Furthermore, we have the constraint that we cannot install bike lanes on local streets (which typically have low traffic anyway). Figure 6(b) presents the safety scores for the same area but now assuming that bike lanes have been installed in all of the non-local streets. Now the average safety score has increased to s = 0.68. This corresponds to an approximately 26% increase in the average biking safety within the area by simply adding bike lanes in non-local streets.
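The "what-if" computation above can be sketched as follows: score every location with the risk model, apply a hypothetical infrastructure change under the stated constraint (no bike lanes on local streets), and compare average safety before and after. The coefficients and the four street records are made-up placeholders rather than the fitted Pittsburgh model; only the final relative-change calculation uses the numbers reported in the text.

```python
import math


def safety(location, coefficients, intercept):
    """Safety s = 1 - rho, with rho given by a logistic risk model."""
    z = intercept + sum(coefficients[name] * location[name] for name in coefficients)
    return 1.0 - 1.0 / (1.0 + math.exp(-z))


def average_safety(locations, coefficients, intercept):
    return sum(safety(loc, coefficients, intercept) for loc in locations) / len(locations)


def add_bike_lanes(locations):
    """Scenario: install bike lanes on all non-local streets, leave local streets as is."""
    return [dict(loc) if loc["is_local"] else {**loc, "bike_lane": 1.0} for loc in locations]


def relative_increase(before, after):
    return (after - before) / before


# Hypothetical coefficients and locations, for illustration only.
coefficients = {"speed_limit": 0.03, "bike_lane": -0.8, "betweenness": 1.5}
intercept = -1.0
area = [
    {"speed_limit": 25.0, "bike_lane": 0.0, "betweenness": 0.05, "is_local": True},
    {"speed_limit": 35.0, "bike_lane": 0.0, "betweenness": 0.30, "is_local": False},
    {"speed_limit": 30.0, "bike_lane": 0.0, "betweenness": 0.20, "is_local": False},
    {"speed_limit": 25.0, "bike_lane": 0.0, "betweenness": 0.10, "is_local": True},
]

before = average_safety(area, coefficients, intercept)
after = average_safety(add_bike_lanes(area), coefficients, intercept)
print(f"toy area: {before:.3f} -> {after:.3f}")

# Relative change for the scores reported in the text (0.54 -> 0.68).
reported_change = relative_increase(0.54, 0.68)
print(f"reported scenario: {100 * reported_change:.1f}% increase")
```

The same pattern extends to other scenarios (for example, changing a speed limit), and pairing each scenario with a cost estimate would allow the safety and cost trade-off mentioned in the text to be explored.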
Obviously, with cities being cash-strapped today, they cannot just install bike lanes or widen all the streets. However, models similar to the ones we developed in this study allow policy makers to better understand the correlations between the characteristics of the existing infrastructure and how one could potentially improve the safety for bikers. In fact, one could explore a variety of (viable) options and decide which one is optimal based on the criteria of interest (e.g., trade-off between safety and cost etc.).
Figure 6 The examined area has an average safety score of 0.54. By adding bike lanes to the main streets, the safety score increases to 0.68; a 26% increase in the average safety score

Discussion and limitations
Our study contributes to the literature of urban informatics and more specifically to the area of computational transportation modeling. A large portion of the literature that lies at the intersection of transportation and cycling is focused on studying the commute behavior of cyclists, policies and infrastructure that can nudge people towards cycling, as well as the benefits, both personal and societal, from increased levels of biking in a city (e.g., [28][29][30][31][32][33][34][35][36], with the list not being exhaustive). As mentioned earlier, bike safety is related to increased levels of biking, and even though causality most probably runs in both directions [28,37], understanding the association between the urban infrastructure and cycling safety is crucial if cities are to rely more on biking in the future. While models for biking safety have been developed in the past, they are typically based on a small number of observations and, most importantly, cover a single city/area. In our study we aim at identifying aspects of biking safety that are similar across cities. While the three cities we examined are not representative of all, or even most of, the cities worldwide, they exhibit some very substantial differences in terms of both their biking infrastructure and their geography. For example, we used MapQuest's elevation API and obtained a uniformly random sample of elevation for points on the street network of each city (approximately 15,000 points in each city). Figure 7 presents the distributions of the elevation for the three cities. What is particularly interesting in this case in terms of biking is the variability of the elevation across the street network. As we can see, the variability in Boston is much lower compared to London and Pittsburgh. The standard deviation of the elevation is approximately 57, 43 and 17 meters for Pittsburgh, London and Boston respectively.
While elevation is only one aspect of a city's geography, it is a particularly important one for biking and bike usage [38]. Furthermore, the bike infrastructure in these cities is different. Based on our OpenStreetMap data, the fraction of street length equipped with biking infrastructure (dedicated lanes and protected tracks) varies. In London, this fraction is 9.2%, which is in stark contrast to only 5.6% and 2.8% for Boston and Pittsburgh respectively. Of course, there are also deeper differences (at least between London and the two US cities in our study) that relate to the overall structure and planning of the city, that is, the division between traditional cities and the radiant cities model (with the latter being the model for many American cities). While of course we cannot claim that our models capture globally universal patterns, the differences across the three cities studied provide support for their generalizability potential.
Figure 7 The distribution of elevation for a random sample of points on the street network for the three cities studied
Furthermore, our street-level data were collected from OpenStreetMap. While OpenStreetMap is expected to provide high-quality information in big cities like London, Boston and Pittsburgh, it is still a crowdsourcing platform and hence it is still possible to have incomplete/erroneous information. However, the general trends identified are not expected to be significantly impacted by the crowdsourced nature of OpenStreetMap. One of the possible limitations of our study is that some of the street features we collected from OpenStreetMap are contemporary and might have been different when the actual accident data were recorded. For example, the speed limit or the presence of a bike lane might have changed over the years. We have tried to account for that by analyzing only recent data. Similarly, while we have used the street betweenness β as a proxy for the traffic on a street, this does not account for the temporal nature of the traffic. Simply put, even though betweenness might be a good proxy for the average traffic over a day, it is not able to capture its temporal patterns. Moreover, the original severity labels for an accident have different categories. Although we unified them as demonstrated in Table 2, there is still a subjective element in how the original data were classified.
Finally, other classification models (e.g., tree classifiers, RNNs etc.) could possibly perform better in predicting the severity of a bike accident. However, we wanted a model that is easily interpretable. This is particularly important for policy makers who want to make decisions based on the information provided by the model. At the same time, more independent variables, e.g., features obtained from visual cues (presence of trees, head-in parking, presence of billboards etc.), could provide further information and further improve the quality of the model. We plan to explore similar features in future studies.

Conclusions
In this work our objective is to build an appropriate model that will allow us to understand the relationship between street infrastructure features and biking safety. To achieve our objective we obtained data on traffic accidents involving a bicycle from three metropolitan areas, namely, London, Boston, and Pittsburgh. We further collected street features for the locations of the accidents and built a logistic regression model for each city for the probability of a severe accident, conditional on the occurrence of an accident. Our results indicate that these models are transferable between cities (at least between the ones we examined). In particular, the coefficients are statistically indistinguishable across the three models, while each model has very good predictive power for accidents that happened in a different city. Finally, we show how our models can be useful to policy makers by evaluating the current state of bike safety within a city and analyzing "what-if" scenarios for infrastructure updates.