Cycling into the workshop: e-bike and m-bike mobility patterns for predictive maintenance in Barcelona’s bike-sharing system

Grau-Escolano, Jordi; Bassolas, Aleix; Vicens, Julian

doi:10.1140/epjds/s13688-024-00486-x

Research
Open access
Published: 11 July 2024

EPJ Data Science volume 13, Article number: 48 (2024)

Abstract

Bike-sharing systems have emerged as a significant element of urban mobility, providing an environmentally friendly transportation alternative. With the increasing integration of electric bikes alongside mechanical bikes, it is crucial to illuminate distinct usage patterns and their impact on maintenance. Accordingly, this research aims to develop a comprehensive understanding of mobility dynamics, distinguishing between different mobility modes, and introducing a novel predictive maintenance system tailored for bikes. By utilising a combination of trip information and maintenance data from Barcelona’s bike-sharing system, Bicing, this study conducts an extensive analysis of mobility patterns and their relationship to failures of bike components. To accurately predict maintenance needs for essential bike parts, this research delves into various mobility metrics and applies statistical and machine learning survival models, including deep learning models. Due to their complexity, and with the objective of bolstering confidence in the system’s predictions, interpretability techniques explain the main predictors of maintenance needs. The analysis reveals marked differences in the usage patterns of mechanical bikes and electric bikes, with a growing user preference for the latter despite their extra costs. These differences in mobility were found to have a considerable impact on the maintenance needs within the bike-sharing system. Moreover, the predictive maintenance models proved effective in forecasting these maintenance needs, capable of operating across an entire bike fleet. Despite challenges such as approximated bike usage metrics and data imbalances, the study successfully showcases the feasibility of an accurate predictive maintenance system capable of improving operational costs, bike availability, and security.

1 Introduction

Bike-sharing systems (BSS) are a crucial part of urban mobility solutions, providing an eco-friendly alternative to private vehicles. Users notably reduce their reliance on other transportation modes [1], leading to increased physical activity and travel time savings [2]. Additionally, urban environments benefit from reduced fuel consumption and improved economic development [2]. Moreover, BSS have experienced substantial growth since the 2000s, with 2022’s global landscape encompassing nearly 2000 BSS and an impressive fleet of approximately 9 million bicycles [3]. The scale of these systems and technological advancements have enabled the generation of large amounts of data, providing researchers and decision-makers with valuable insights to improve existing systems.

BSS have been extensively studied from various angles, including station location optimization [4] and fairness [5], bike rebalancing strategies [6], as well as studies on user behavior changes and their impact on health [1, 2]. While substantial research has focused on human mobility within BSS in recent years [7, 8], the advent of fourth-generation BSS has introduced a new layer of complexity through the global adoption of electric bikes (e-bikes) [3, 9]. Existing BSS, such as Barcelona’s Bicing, in Spain, have progressively integrated e-bikes into their mechanical bike (m-bike) fleets, while there are instances of entirely new fully electric systems like Barcelona metropolitan area’s AMBici. As BSS users increasingly embrace e-bikes for their speed, convenience, and reduced physical exertion, significant changes in existing BSS mobility dynamics are expected. However, the evolving mobility dynamics resulting from the coexistence of both mechanical and electric transportation modes remain an area ripe for study.

Another significant aspect to consider in the context of BSS is the maintenance operations (MOs) for bikes. While the importance of maintenance as a critical activity for all companies has been underscored in various studies [10, 11], BSS maintenance practices have predominantly relied on the traditional approach of applying corrective measures (i.e., addressing issues as they arise by replacing worn-out or damaged parts). Therefore, enhancing the BSS user experience and operational efficiency could be achieved by developing a predictive maintenance (PM) strategy capable of predicting bike component failures before they occur, allowing for timely scheduling of maintenance activities. This approach would not only improve the availability and security of the bike fleet but also reduce costs.

To address these topics, this article leverages both trip and maintenance datasets provided by Bicing BSS. Our research revolves around four key questions. Firstly, we delve into whether m-bikes and e-bikes can be categorized as two distinct mobility modes, despite sharing the same infrastructure. This involves an in-depth analysis of their respective usage and mobility patterns, aiming to identify any significant disparities between the two. Secondly, we explore bike component failure patterns, seeking to understand how the bike model factor contributes to the wear and tear of distinct parts. Thirdly, we aim to assess the potential of forecasting bike component failures by evaluating the accuracy of various survival models. Specifically, our PM system focuses on three key components: brake pads, wheel spokes, and chains, chosen for their substantial representation in the data. Lastly, we aim to identify the critical factors that influence predicting the longevity of the bike components.

2 Background

2.1 BSS mobility

Research in BSS mobility typically involves the examination of aspects like temporal usage patterns and trip characteristics, which include factors such as distance, duration, and speed. This kind of analysis serves multiple purposes, such as examining general system dynamics [12–14], designing effective rebalancing strategies [15], predicting BSS demand [16], and estimating trip destinations and durations [17]. While trip characteristics can vary depending on the specific city in which the BSS is implemented, existing publications generally concur on certain approximate values. On average, BSS trips tend to have a mean distance of between 1 and 2 kilometers, a mean duration ranging from 10 to 20 minutes, and an average speed in the range of 10 to 15 kilometers per hour. Regarding trip distances, GPS data has been used to map actual ride routes [7]. However, when GPS data is unavailable, trip distances are estimated. In some instances, the shortest paths between origin and destination stations are calculated [18]. Nevertheless, comparisons between GPS routes and the shortest paths have suggested that the shortest paths may not accurately reflect the actual choices made by walkers, drivers, and cyclists [19–21]. Factors such as a greener environment, the presence of amenities, or roads with greater connectivity often lead to deviations from the shortest paths [21].

An alternative perspective for studying BSS mobility emphasizes the geographical aspects of mobility. While some studies have examined the overall usage of BSS across multiple cities [22–24], much of this research is centered around understanding the patterns within individual BSS. Most previous studies in this domain focus on the usage of individual docking stations, often utilizing concepts such as incoming and outgoing trips [18, 23, 25–28], as well as station occupancy [24, 29]. However, for a more comprehensive understanding of BSS mobility structures, the flows between stations have also been explored [8, 30, 31].

A relatively less studied area is the relationship between urban topology and BSS usage, specifically how the altitude difference between the origin and destination stations of a trip influences bike usage. In this regard, [32] found that elevation has a negative impact on the number of incoming trips and a positive impact on the outgoing, while [33] found that altitude difference, together with stations’ distance and weather features are good predictors of bike usage.

On the other hand, when characterizing BSS mobility, most research has focused on traditional m-bikes, with only a limited number of studies comparing their usage with other transportation modes. For instance, [34] analyzed the temporal variations in BSS and car usage, while [12] compared BSS trip distances and average speeds with those of cars and pedestrians. Moreover, there is an important body of research comparing e-bikes and scooters. Spatio-temporal mobility patterns of BSS and dockless scooter-sharing services were compared in [35], finding that BSS are mainly used for commuting, whereas scooter-sharing serves different purposes. Additionally, [36] explored the average speed of shared e-scooters and BSS e-bikes, finding that e-bikes generally travel faster than e-scooters.

Mobility data from Bicing has served as a valuable resource in various articles, with the majority of studies relying on data from around 2010, a period characterized by fewer stations, bikes, and bike lanes compared to the situation in 2022. Notably, this was also a time before the introduction of e-bikes into the Bicing system. In this context, two early studies obtained from the Bicing website the number of occupied and empty bike docks at stations with the purpose of unraveling the general spatio-temporal patterns of Barcelona dynamics [37] and comparing BSS usage patterns on weekdays and weekends employing hierarchical clustering [38]. Additionally, Bayesian networks were used to predict the number of available bicycles at each station. In the same line, [39] utilized the same data to generate bike availability predictions in the stations using Auto-Regressive Moving Average (ARMA) models. More recently, in 2022, a study delved into predicting the usage of the BSS system and examined the impact of the COVID-19 pandemic on these predictions [40]. This last research used a much more comprehensive and updated dataset, including data from 2020 and 2021, which contained information about the origin and destination stations, as well as the start and end times of individual trips.

2.2 Predictive maintenance

Since the 1990s, the scientific literature has emphasized the importance of maintenance as a critical activity for companies to improve reliability and reduce costs [10, 11]. Over time, knowledge and techniques have evolved, transitioning from a corrective maintenance, which involved addressing failures after they occurred, to a preventive maintenance [41], which implements scheduled maintenance based on time intervals to prevent breakdowns. While this approach helps to mitigate most failures, it comes with the drawback of high prevention costs. With the increased computational power, the advancement of artificial intelligence, and the rise of the IoT, a new strategy called PM has emerged, which can predict failures before they happen, allowing an even lower failure rate and reduced costs. One of its main challenges is fault prognosis, which focuses on forecasting when a failure is likely to occur [42]. By accurately predicting failures, maintenance activities can be scheduled in advance, minimizing downtime, and optimizing resource allocation.

Reliability theory plays a significant role in PM modelling, and it is essentially the same as survival analysis (SA) [43]. SA was originally developed in the field of biomedical sciences to examine life tables [44], but its concept of events can be applied to various domains, such as machine component failures. A key challenge in SA studies is censoring, which refers to missing data when an event is not observed [45, 46]. Censoring occurs not due to technical failures but rather due to the nature of the studied event, for instance when a participant in a clinical trial decides to discontinue their participation before the event of interest occurs. It is precisely the presence of censored data that makes the usual statistical and machine learning approaches impractical for building predictive algorithms.

According to [47], survival methods can be classified into statistical and machine learning methods. Statistical methods focus on characterizing the distribution of event times and the statistical properties of parameter estimation, such as estimating survival curves. These first models can be further divided into: (1) Non-parametric models (Kaplan-Meier, Nelson-Aalen, Life-Table), which make no assumptions about the underlying distribution and estimate the survival curves directly from the data. (2) Semi-parametric models (Cox model, CoxBoost, Time-dependent Cox), which incorporate both non-parametric estimation of the baseline survival function and parametric estimation of the effects of covariates. (3) Parametric models (Penalized regression, Accelerated Failure Time), which assume a specific distribution for the survival time and estimate the parameters of that distribution. Machine learning methods combine traditional SA techniques with machine learning algorithms, such as survival trees, Bayesian methods, neural networks, or support vector machines. Advanced machine learning techniques, including ensemble learning, active learning, transfer learning, and multitask learning, have also been applied in the field of SA.
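To make the non-parametric family concrete, the following is a minimal sketch of the Kaplan-Meier estimator written in plain Python. The function name and the input layout are illustrative assumptions and are not taken from the study; the point is only to show how right-censored subjects remain in the risk set until they leave it, without counting as events.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the event or until censoring.
    observed:  True if the event was seen, False if right-censored.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of at-risk subjects surviving t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

Censored subjects lower the number at risk for later times but never trigger a drop in the curve, which is why standard regression on the observed durations alone would be biased.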

These methodologies have been applied in a wide range of fields, including healthcare [48], reliability [49], crowdfunding [50], student retention [51], customer lifetime [52], and unemployment duration analysis [53]. However, to the best of our knowledge, there are no applications of SA to BSS. Apart from SA, scientific literature on bike’s PM is scarce. [54] proposed using smartphone vibration readings and support vector machine models to predict the health of mountain bike’s components (the rotor, the chain, the wheel bearings, the steering head, and the derailleur cog). However, the scope of this study was not a fleet of bikes but a single bicycle. On the contrary, [55]’s main objective was to study the cyclists’ behavioral patterns in the BSS of Oslo, Norway, and successfully identify the need for bike maintenance. In this case, random forest models were applied to the ride (destination, duration, and date) and the cyclist (gender and year of birth) data. Finally, [56] focuses on predicting brakes’ performance with KNN, LSTM and XGBoost classifiers, using as input physical influences and acceleration/deceleration forces coming from hall and inertial IoT sensors.

3 Data and methods

3.1 Case of study

Launched in 2007, Bicing BSS operates in the city of Barcelona, Spain, with strategically located stations across the city that serve as docking points for bicycles. Over the years, Bicing has experienced multiple expansions, and, as of December 2022, it included nearly 260,000 unique users, approximately 7000 bikes, and 519 permanent stations with between 12 and 54 docking points. One distinctive feature of this BSS is the presence of two types of bikes since 2019: e-bikes and m-bikes, with the presence of batteries that allow motorized assistance up to 25 km/h on e-bikes being the main difference between them. Each station is designed to accommodate both e-bikes and m-bikes, and all dockers allow docking for both types of bikes. The number of e- and m-bikes has evolved over time. In 2019, e-bikes represented only 15% of a fleet that comprised 6700 bikes. Since then, there has been a gradual shift, with 2000 m-bikes being upgraded to e-bikes, and additional e-bikes being introduced. As a result, in December 2022, e-bikes accounted for 47% of the expanded fleet. The usage of these bikes is associated with an annual payment and, additionally, with fees that are typically based on the duration of the rental. Moreover, e-bikes are associated with a small initial cost for each ride.

3.2 Data

To answer the research questions stated above, Bicing made two bike-sharing data sets available for this article: a trips data set and a maintenance data set (Appendix A, see Additional file 1).

The trips data set encompasses individual trips generated from April 2019 to December 2022 (i.e., 3 years and 9 months). Out of 53 million trips, 33 million (62%) were completed by m-bikes, while 20 million (38%) were made with e-bikes. Each trip entry includes the following information: starting and ending dates and times with second-level granularity, starting and ending stations, bike identifier, bike model (m-bike or e-bike), and an anonymized user identifier. Additionally, geographical data for the 519 stations, including latitude and longitude coordinates, are provided for the study.

The maintenance data set comprises a total of 310,000 maintenance orders (MOs), which correspond to various bicycle repair types executed between September 1st, 2020, and January 1st, 2023 (i.e., 2 years and 4 months). Each MO record provides the following information: MO identifier, date, category, subcategory, bike identifier, and bike model (m-bike or e-bike). There exist a total of 12 categories and 87 subcategories, encompassing a wide array of actions such as cleaning, greasing, adjusting, or changing bike parts. Categories range from fewer than 100 interventions to 120,000, with the majority of them falling into the brake and wheel categories (66%). Likewise, subcategories also reveal a wide range of counts, with only 10% of the repair typologies surpassing 10,000 MOs.

For the mobility analysis, only trips data from 2022 was used to minimize the impact of the COVID-19 pandemic and to reflect the increasing use of e-bikes, as detailed in Appendix A. In contrast, the maintenance analysis and the PM modeling utilized the complete datasets available for both mobility and maintenance.

3.3 Trips processing

To study mobility patterns and develop the PM strategy, only trips with durations ranging from 2 to 60 minutes were taken into account, which constituted 99% of all trips. Once filtered, various trip metrics were collected for the mobility analysis. While trip duration could be directly derived from the trips data, other measures required some additional steps. Obtaining the trip routes was not feasible since the data only provided the starting and ending stations for each trip. As a consequence, the distances between stations were obtained using the OpenRouteService API [57], which can suggest recommended routes for bicycles based on OpenStreetMap data [58]. These routes are generated by combining the suitability of streets for the chosen mode of transportation along with the fastest route option. Then, the average speed of each trip was obtained from its duration and distance. Furthermore, the cumulative trip inclinations were simplified by calculating the difference between the altitudes of the origin and destination stations. The specific station altitudes were obtained using the OpenTopoData API [59].
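The trip processing described above can be sketched as follows. This is an illustrative sketch only: the `Trip` class, the `route_m` and `altitude_m` lookup tables, and the field names are hypothetical, and the station-to-station route lengths and altitudes are assumed to have been fetched beforehand (the study used OpenRouteService and OpenTopoData for this).

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Trip:
    start: datetime
    end: datetime
    origin: str
    destination: str

def trip_metrics(trip, route_m, altitude_m, min_minutes=2, max_minutes=60):
    """Return the metrics of one trip, or None if it falls outside the duration filter.

    route_m:    {(origin, destination): recommended route length in metres}  (hypothetical)
    altitude_m: {station: altitude in metres}                                (hypothetical)
    """
    minutes = (trip.end - trip.start).total_seconds() / 60
    if not (min_minutes <= minutes <= max_minutes):
        return None  # only trips lasting 2 to 60 minutes are kept
    distance_m = route_m[(trip.origin, trip.destination)]
    return {
        "duration_min": minutes,
        "distance_m": distance_m,
        # average speed derived from route length and trip duration
        "speed_kmh": (distance_m / 1000) / (minutes / 60),
        # elevation simplified to destination altitude minus origin altitude
        "elevation_m": altitude_m[trip.destination] - altitude_m[trip.origin],
    }
```

Because the distance is a recommended route rather than a GPS trace, the resulting speed is an approximation of the true riding speed.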

To determine whether the trip characteristics of m-bikes and e-bikes come from the same distribution, 5000 samples from each subgroup were randomly selected. Initially, a Shapiro-Wilk test was applied to both samples to assess whether they followed a normal distribution. Subsequently, if both samples were found to be normally distributed, an Independent Sample T-Test was applied for comparison. In cases where one or both samples did not meet the normality assumption, the Kolmogorov-Smirnov test was used. All statistical tests were conducted with a significance level set at 0.01.
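The decision procedure above could be sketched with SciPy as follows. The function name, the seeded sampling, and the returned dictionary are assumptions made for illustration; only the choice of tests, the sample size of 5000, and the 0.01 significance level come from the text.

```python
import numpy as np
from scipy import stats

def compare_distributions(e_values, m_values, sample_size=5000, alpha=0.01, seed=0):
    """Compare one trip characteristic between e-bikes and m-bikes."""
    rng = np.random.default_rng(seed)
    # Draw a random sample from each subgroup (without replacement)
    samples = [
        rng.choice(np.asarray(v, dtype=float), size=min(sample_size, len(v)), replace=False)
        for v in (e_values, m_values)
    ]
    # Shapiro-Wilk on each sample: index 1 of the result is the p-value
    both_normal = all(stats.shapiro(s)[1] > alpha for s in samples)
    if both_normal:
        name = "independent t-test"
        _, p_value = stats.ttest_ind(samples[0], samples[1])
    else:
        name = "kolmogorov-smirnov"
        _, p_value = stats.ks_2samp(samples[0], samples[1])
    return {"test": name, "p_value": float(p_value), "reject": bool(p_value < alpha)}
```

A `reject` value of `True` means the null hypothesis of a common distribution is rejected at the chosen significance level.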

To compare trip numbers of both bike models, several metrics were computed to avoid comparisons with absolute numbers. Initially, for each bike model, at all stations, the percentage of incoming (or outgoing) trips was determined by dividing the total trips for each model by the station’s total trips. Then, two supplementary metrics were derived from these percentages: (1) the differences in the percentage of incoming (or outgoing) trips between e-bikes and m-bikes, and (2) the difference in the percentage of incoming and outgoing trips for each bike model.
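As a rough illustration of these relative metrics, the sketch below computes them from a list of trips. It assumes that the "station's total trips" means the total in the same direction (all incoming, or all outgoing, trips at that station), which the text does not state explicitly; the function name and the tuple layout are also hypothetical.

```python
from collections import defaultdict

def station_shares(trips):
    """trips: iterable of (origin, destination, model) with model in {"e", "m"}."""
    counts = defaultdict(lambda: {"in": {"e": 0, "m": 0}, "out": {"e": 0, "m": 0}})
    for origin, destination, model in trips:
        counts[origin]["out"][model] += 1
        counts[destination]["in"][model] += 1

    metrics = {}
    for station, c in counts.items():
        pct = {}
        for direction in ("in", "out"):
            total = c[direction]["e"] + c[direction]["m"]
            for model in ("e", "m"):
                # share of this model among the station's trips in this direction
                pct[(direction, model)] = 100 * c[direction][model] / total if total else 0.0
        metrics[station] = {
            "pct": pct,
            # (1) e-bike share minus m-bike share, per direction
            "e_minus_m": {d: pct[(d, "e")] - pct[(d, "m")] for d in ("in", "out")},
            # (2) incoming share minus outgoing share, per bike model
            "in_minus_out": {m: pct[("in", m)] - pct[("out", m)] for m in ("e", "m")},
        }
    return metrics
```

Working with shares rather than raw counts keeps busy and quiet stations comparable.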

3.4 MOs processing

First, the target bike parts for this study needed to be selected. To ensure complete objectivity, subjective MO types such as cleaning or greasing were excluded, and the focus was placed on the replacement of bike parts. Also, failures related to wheel tubes were excluded due to their high level of randomness.

MO data records specific dates for bike part repairs; however, this data structure is not optimal for conducting SA, which typically models the time to an event. To facilitate the use of SA, MOs were transformed into MO units. These units represent the time elapsed between two consecutive repairs for the same bike and bike component, and thus, the time period in which the bike part under study is operational. Consequently, when working with MO units, bike part survival information at the beginning and end of the data set could be lost for each bike. To prevent this information loss, the time from the start of the data set to the first MO is considered as one MO unit, and the time from the last repair of the bike to the end of the data set is regarded as another MO unit. It’s worth noting that even though MO units may originate from the same bike, each one has been treated as an independent entity.

One key challenge in predicting the occurrence of an event is the presence of censored data, which refers to incomplete information about survival times [45, 46]. Distinct types of censoring exist, including right-censoring (the event has not occurred by the end of the study), left-censoring (the event occurred before the study started), and interval-censoring (the event occurred within a specific time interval). Bicing maintenance data contains left- and right-censored data, since the first and last MO units of each bike and repair typology are incomplete. In the first unit, it is not possible to know when the bike part started working, and in the last, when it will finally break. Since SA can effectively utilize uncensored and right-censored data to estimate the survival curves and generate predictions, only the left-censored MO units were discarded in the training and prediction phases. Specifically, this involves excluding the initial MO unit of each bike for the target bike part. Additionally, units without trips and those in which m-bikes were upgraded to e-bikes were also discarded.
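The construction of MO units and the censoring rule can be sketched as below for a single bike and a single component. The function names, the dictionary layout, and the use of calendar days as the duration unit are assumptions for illustration; the exclusion of units without trips or spanning an m-bike to e-bike upgrade is not shown.

```python
from datetime import date

def build_mo_units(repair_dates: list[date], data_start: date, data_end: date):
    """Split one bike/component repair history into MO units."""
    bounds = [data_start] + sorted(repair_dates) + [data_end]
    last = len(bounds) - 2  # index of the final unit
    units = []
    for i, (start, end) in enumerate(zip(bounds, bounds[1:])):
        if i == 0:
            censoring = "left"   # the part was installed at an unknown earlier date
        elif i == last:
            censoring = "right"  # the part was still working when the data ended
        else:
            censoring = "none"
        units.append({
            "start": start,
            "end": end,
            "duration_days": (end - start).days,
            "event_observed": i != last,  # the unit ended with a recorded repair
            "censoring": censoring,
        })
    return units

def units_for_survival_analysis(units):
    """Keep uncensored and right-censored units; drop the left-censored first unit."""
    return [u for u in units if u["censoring"] != "left"]
```

A bike with no recorded repair for the component yields a single unit that is censored on both sides; this sketch labels it as left-censored, so it is dropped.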

The data sets used for the survival models follow a format where each row encapsulates all the pertinent information about one subject. Each entry contains the details about the subject’s survival duration, event occurrence, and the aggregated covariate values crucial for the survival function estimation. MO units, with their defined start and end dates, facilitate the calculation of covariates that describe the utilization patterns of a bike part and its surrounding environmental factors. In this way, several covariate domains have been incorporated into the model inputs:

Weather: weather data was analyzed to incorporate the surrounding environmental conditions of the MO unit. This involved considering the daily average temperature (in ºC), total daily precipitation (in mm), average wind direction (in degrees), mean wind speed (in km/h), and average atmospheric pressure (in hPa). After exploring this data and its connection to bike part failures, it was determined that the primary influential factors were the daily average temperature and the mean atmospheric pressure.
Bike usage: various metrics were computed, including daily distance traveled (in meters), daily positive and negative inclinations (in meters), and the mean daily speed (in km/h). These calculations were based on the previously collected trip routes and inclinations between each couple of stations. Next, their cumulative versions were examined, and strong correlations were identified. As a result, only two variables were kept: the cumulative daily distance and the mean speed of the MO unit.
Bike model: due to potential variations in survival curves across the two bicycle models, a binary variable was introduced to distinguish between electric (1) and mechanical (0) bike models.
Count of repairs for other bike parts during the target MO unit: when analyzing a specific bike part, it becomes pertinent to evaluate the frequency of replacements for other bike parts. This approach provides valuable insights; for instance, understanding the number of wheel tubes replaced could aid in predicting potential replacements for components such as tires or wheel rims. Following an examination of the correlation and the variance inflation factor of these repair counts, the following MO subcategories were selected: brake tension adjustment, the replacement of the front and rear wheel tubes, and the replacement of the front wheel cover.
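The covariate domains above end up as one row per MO unit. The sketch below shows one way such a row could be assembled from daily records; the input layouts, the half-open date interval, and the use of a simple mean to aggregate weather over the unit are assumptions, since the text does not specify them. Repair counts for other parts are omitted for brevity.

```python
from datetime import date
from statistics import mean

def unit_covariates(unit_start: date, unit_end: date, daily_usage, daily_weather, is_ebike):
    """Aggregate the daily records inside one MO unit into a single covariate row.

    daily_usage:   {date: (distance_m, mean_speed_kmh)} for the bike   (hypothetical layout)
    daily_weather: {date: (mean_temp_c, mean_pressure_hpa)}            (hypothetical layout)
    """
    in_unit = lambda day: unit_start <= day < unit_end
    usage = [v for day, v in daily_usage.items() if in_unit(day)]
    weather = [v for day, v in daily_weather.items() if in_unit(day)]
    if not usage:
        return None  # units without trips are discarded
    return {
        "cum_distance_m": sum(d for d, _ in usage),
        "mean_speed_kmh": mean(s for _, s in usage),
        "mean_temp_c": mean(t for t, _ in weather) if weather else None,
        "mean_pressure_hpa": mean(p for _, p in weather) if weather else None,
        "is_ebike": int(is_ebike),  # 1 = electric, 0 = mechanical
    }
```

Each returned dictionary would be joined with the unit's duration and event indicator to form one row of the survival data set.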

3.5 Models

To predict the survival time of the bike components, the following statistical and machine learning models have been employed:

1. Cox Proportional Hazard model (CPH) [44] (lifelines’ implementation [60]): a semi-parametric approach that makes it possible to evaluate how covariates influence the hazard rate of an event as time progresses. This model relies on an underlying assumption known as the proportional hazard assumption, which asserts that the relative risk between two distinct groups remains constant over time. Meeting this assumption simplifies the analytical process and enhances the interpretability of outcomes. Although evaluating the assumption holds theoretical importance for attaining a meaningful interpretation of covariates, adherence to this assumption might not be imperative. [61] noted that when dealing with a sufficiently large sample size, even minor deviations from the assumption may show up. Furthermore, when the main goal is survival prediction, there is no need to test the proportional hazard assumption, since the main objective is to maximize a score [62]. Consequently, for the purpose of this study, the analysis prioritizes predictive accuracy over strict adherence to the assumption.
2. Multi-Task Logistic Regression model (MTLR) [63] (pysurvival’s implementation [64]): an alternative to CPH when the assumption of proportional hazards does not hold. The MTLR model relies on a sequence of logistic regression models constructed across distinct time intervals. This allows the estimation of the probability associated with the occurrence of the event of interest within each interval. Consequently, the initial step involves specifying the desired number of intervals, with the present use case opting for the number of days in the maintenance data.
3. Conditional Survival Forest model (CSF) [65] (pysurvival’s implementation [64]): a machine learning model that extends Random Forest ensembles to effectively handle right-censored data. Therefore, this survival model can appropriately model data with non-linear relationships and censoring.
4. CPH Deep Neural network model (DeepSurv) [66] (pysurvival’s implementation [64]): an extension of the CPH model that brings elements of deep learning into its core structure. This enhancement enables the model to better capture complex patterns while still being able to handle censored data effectively.
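For reference, the proportional hazard assumption mentioned for the CPH model can be written out explicitly. The notation below is the standard textbook form and is an assumption of this note, since the article does not introduce symbols: \(h_0(t)\) is the baseline hazard, \(x\) the covariate vector of an MO unit, and \(\beta\) the coefficient vector.

\[
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
\]

For two MO units with covariates \(x_a\) and \(x_b\), the baseline hazard cancels in the ratio, so the relative risk does not depend on time:

\[
\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\!\left(\beta^{\top}(x_a - x_b)\right)
\]

DeepSurv keeps the same structure but replaces the linear term \(\beta^{\top} x\) with the output \(g_{\theta}(x)\) of a neural network, i.e. \(h(t \mid x) = h_0(t)\,\exp\!\left(g_{\theta}(x)\right)\).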

3.6 Hyper-parameters optimization

The previous models possess distinct architectures. Consequently, the hyper-parameter optimization process has varied based on the specific model.

For the purpose of identifying the optimal hyper-parameters, the SA dataset of each bike component was partitioned into three subsets: training (60%), validation (20%), and test (20%). Multiple hyper-parameter configurations were trained on the training set and assessed with the validation set and the RMSE metric to identify the most favorable combination. Once determined, the model was trained using both the training and validation sets, and predictions based on the test set were generated to assess the final accuracy of the model. Furthermore, although model validation was primarily assessed using the RMSE metric, all data subsets were also evaluated for RMSE, \(R^{2}\) and MAPE to assess accuracy comprehensively (Appendix D). RMSE and MAPE are straightforward to interpret: the lower, the better. \(R^{2}\), however, works in the opposite direction, since higher values correspond to better predictions.
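A minimal sketch of the 60/20/20 partition and the three accuracy metrics, using only the standard library, is shown below. The function names and the seeded shuffle are illustrative assumptions; the text does not state how the split was randomized or how censored units enter the metrics, so the functions simply compare predicted and observed survival times.

```python
import math
import random

def split_60_20_20(rows, seed=0):
    """Shuffle and split into training (60%), validation (20%), and test (20%)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 6 // 10
    n_val = len(rows) * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error (in %): lower is better; y_true must be non-zero."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: higher is better."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, with observed durations of 100, 200, and 300 days and predictions of 110, 190, and 300 days, the MAPE is 5% and \(R^{2}\) is 0.99.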

For the CPH model, the sole hyper-parameter subjected to optimization was the baseline estimation method (breslow, spline, or piecewise), while the remaining parameters were set to their default values. In the case of MTLR, CSF, and DeepSurv, the hyper-parameter search process was automated using the Optuna framework [67]. A total of 200 trials were conducted for each model, employing the TPESampler class for hyper-parameter sampling and the MedianPruner class to halt unpromising combinations. For MTLR, the optimized hyper-parameters encompassed the learning rate (ranging between 1e-5 and 1e-3), initialization method (orthogonal or glorot_uniform), and optimizer (adam, adamax, or sgd). The CSF optimization encompassed the number of trees (ranging between 10 and 100 in increments of 10), maximum tree depth (between 2 and 10), and minimum node size (ranging between 10 and 50 in steps of 5). In the case of DeepSurv, the optimization process considered initialization method (orthogonal or glorot_uniform), optimizer (sgd or adam), learning rate (ranging between 1e-5 and 1e-2), number of epochs (ranging between 50 and 500), L2 regularization (ranging between 0 and 1e-2), and the inclusion of batch normalization or dropout with a value of 0.5 (True or False for each one).

4 Results

4.1 Mobility patterns analysis

The analysis of mobility patterns focused on data from 2022. Both mechanical and electric transportation modes exhibited variability across time due to seasonal variations and holidays such as Easter week and summer holidays (Fig. 1A). However, a noticeable differential trend emerged from summer onwards. Electric mobility experienced a rise in trip numbers while the mechanical one suffered a significant decrease. Despite some m-bikes being upgraded to e-bikes in the course of 2022, the increase in the number of e-bike rides could not be entirely attributed to this transformation, considering they comprised only 40% of the bike fleet at their highest point. E-bikes experienced a rise in the mean daily trips per bike, whereas m-bikes saw a significant reduction. Consequently, the variations in the usage of both transportation modes can be attributed to the upgrade of m-bikes into e-bikes and the increasing preference of users for electric mobility.

Trip counts present a strong weekly seasonality characterized by two distinct main daily patterns: weekdays and weekends (Fig. 1B). On weekdays, three important trip count maxima are observed at 8:00, 14:00, and 18:00, with a gradual increase in trip numbers as the week progresses, peaking on Thursday. Fridays slightly deviate from the typical weekday pattern since they present similar trip count peaks at 14:00 and 18:00. In contrast, weekends are characterized by a 30% mobility reduction, the absence of an 8:00 peak, and a shift in the 18:00 peak to 19:00. Notably, electric mobility consistently matches or surpasses the mechanical one across all days and hours despite its additional cost. Both transportation modes also differ in their trip characteristics (Fig. 1C-F). Significant differences were found in the distributions of trip duration, distance, speed, and elevation (Sect. 3.3), implying two distinct behaviors. Specifically, electric mobility is characterized by steeper inclines, longer durations, greater distances from the origin, and higher speeds compared to mechanical mobility.

Additionally, each mobility mode exhibits unique spatial mobility dynamics (Fig. 2). High-altitude stations predominantly receive e-bike trips, whereas low-altitude stations see the majority of incoming trips via m-bikes. This reflects users’ preference for electric mobility when doing physically demanding trips. While this pattern is clear for incoming trips, it becomes less pronounced for outgoing trips, since the proportional utilization of e-bikes is not as dominant, even at high-altitude stations. This is attributed to the fact that the outgoing trips from a station do not indicate their destinations, resulting in a mix of trips to lower and higher stations. Nevertheless, given the overall preference for electric mobility among users, the inclination towards e-bikes still remains evident. Furthermore, when analyzing the differences in the percentages of incoming and outgoing trips involving e-bikes (Appendix B), it becomes evident that the prevalence of incoming e-bike trips is more pronounced at high-altitude stations. In contrast, at lower-altitude stations, the percentages of incoming trips made by e-bikes become less substantial compared to the outgoing trips.

By segregating incoming and outgoing trips based on their elevation gain, this phenomenon becomes even more evident (Fig. 2). At stations that generate outgoing trips with an elevation gain of over 100 meters, electric mobility completely dominates. Consequently, the receiving stations at higher elevations predominantly receive e-bike trips. This prevalence of electric mobility reduces gradually as the elevation gain decreases, until reaching trips with elevation changes between −50 and 50 meters. In this range, mechanical mobility plays a much more prominent role, especially at low-altitude stations. However, even at high-altitude stations, electric mobility remains relevant in both incoming and outgoing trips. Notably, even for trips with elevation losses of more than 50 meters, electric mobility continues to be the preferred transport mode, albeit with reduced percentages. This phenomenon may be attributed to users’ preference for an electric transportation mode on uphill rides, resulting in more e-bikes accumulating at higher altitudes, and to potential elevation irregularities in paths between high-altitude stations.
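The elevation-based breakdown described above can be illustrated with a minimal sketch. The trip records, the bin edges (in particular the intermediate 50 to 100 m bin), and the function names are assumptions made for illustration; they are not the authors' data or code.

```python
from collections import defaultdict

# Hypothetical trip records: (bike_type, elevation_gain_m). Values are invented.
trips = [
    ("e", 120.0), ("e", 105.0), ("e", 60.0), ("m", 10.0),
    ("m", -20.0), ("e", 5.0), ("e", -80.0), ("m", -70.0), ("e", -90.0),
]

def elevation_bin(gain_m):
    """Assign a trip to an illustrative elevation-gain bin (edges are assumptions)."""
    if gain_m > 100:
        return "> 100 m"
    if gain_m > 50:
        return "50 to 100 m"
    if gain_m >= -50:
        return "-50 to 50 m"
    return "< -50 m"

def ebike_share_by_bin(records):
    """Return {bin label: fraction of trips in that bin made by e-bike}."""
    totals = defaultdict(int)
    electric = defaultdict(int)
    for bike_type, gain in records:
        label = elevation_bin(gain)
        totals[label] += 1
        if bike_type == "e":
            electric[label] += 1
    return {label: electric[label] / totals[label] for label in totals}

shares = ebike_share_by_bin(trips)
for label, share in shares.items():
    print(f"{label}: {share:.0%} of trips by e-bike")
```

Applied per station and separately to incoming and outgoing trips, the same grouping would give the kind of per-bin e-bike percentages that the text compares.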

4.2 Maintenance operations analysis

Before generating survival predictions for bike components, an analysis of the failure patterns within the maintenance dataset was conducted. Given the low frequencies of most MO types (Appendix A) and the criteria outlined in Sect. 3.4, only a few subcategories were deemed suitable for developing prediction models. Ultimately, three repair types were identified as the final targets for the predictions: brake pads, wheel spokes, and chains. Selected bike parts data was converted into MO units (Table 1, Appendix C), and their corresponding covariate values were integrated as described in Sect. 3.4.

Table 1 MO units: counts and percentages for the three chosen bike parts


The analysis of failure dynamics for the three bike parts revealed distinct patterns between m-bikes and e-bikes (Fig. 3). Specifically, for brake pads and wheel spokes, e-bikes generally exhibit a significantly higher number of repairs per bike compared to their mechanical counterparts. Interestingly, m-bikes display very few wheel spoke repairs. Also, notable differences in the survival times of the three bike parts have been found. E-bike brake pads have shorter survival periods compared to their mechanical counterparts, and wheel spokes display similar trends. However, the scarcity of MOs for m-bikes makes direct comparisons of survival distributions impractical. Finally, chain repairs present the widest range of survival times for both m-bikes and e-bikes, with m-bike chains having longer survival times when compared to e-bike chains.

When exploring the potential relation with bike usage, cumulative distance emerged as the primary covariate that significantly distinguishes between bike models (Fig. 3). M-bike brake pads demonstrate considerably longer durability compared to their e-bike counterparts when covering equivalent distances, while in the case of chains, the behavior is the opposite. Furthermore, although the few uncensored m-bike MO units for wheel spokes appear to outlast those of the e-bikes, their scarcity prevents a meaningful comparison of their behaviors. Differences have also been observed in the average number of repairs for other MO subcategories during the target MO units (Appendix C). Thus, m-bike and e-bike parts exhibit distinct breakdown dynamics, which can be attributed to variations in how bikes are used based on their specific model.

Before model training, the distributions of MO units according to censoring and bike model were examined (Fig. 4, Table 1). Uncensored data for both brake pads and wheel spokes show similar trends, representing in both cases approximately 84% of the dataset. Of the uncensored units, 90% have survival times of up to 200 days for brake pads and 150 days for wheel spokes, with some units lasting as long as approximately 800 days. However, there are two significant differences between the two components. First, the ratio of right-censored to uncensored units for brake pads remains fairly consistent until the 550-day mark. In contrast, for wheel spokes, the proportion of uncensored units steadily declines, virtually disappearing after 650 days. Second, uncensored units for m-bikes become the majority for brake pads after 200 days, whereas there is an almost complete absence of such units for m-bike wheel spokes. The decreasing number of MO units over time and the reduction in uncensored units suggest that modeling long-lasting MO units for both bike components could pose challenges. Moreover, the specific scarcity of uncensored MO units for m-bike wheel spokes could introduce additional complexities in the modeling.

Chain maintenance exhibits distinct patterns. The number of censored units for chains exceeds that of the uncensored units, indicating that chains generally have longer lifespans compared to the other two bike components. Moreover, the failure rate for chains is notably different; only 30% of chains fail before reaching the 200-day mark. Given the robust nature of chains, the proportion of uncensored units is low at the start of the timeline. Additionally, due to the rarity of instances surviving for extended periods, this proportion remains low past the 500-day mark as well. Considering the lower overall number of MO units for chains, coupled with the relatively smaller number of uncensored units and their reduced proportion at both the beginning and end of the timeline, modeling the longevity of chains could be more challenging compared to other parts.
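To make the role of right-censored units concrete, the sketch below implements the Kaplan-Meier estimator, one standard way to summarise survival times when some units have not yet failed. It is offered as an illustration only: this section does not state which estimator underlies the figures, and the durations and censoring flags are invented.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: survival time of each MO unit, in days
    observed:  True if the unit ended with a repair (uncensored),
               False if it is right-censored
    Returns a list of (time, survival_probability), one entry per repair time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Units still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Repairs recorded exactly at time t.
        failures = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical MO units: three repairs and three right-censored units.
durations = [50, 120, 120, 200, 300, 400]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"day {t}: estimated survival {s:.3f}")
```

Censored units never trigger a drop in the curve, but they stay in the at-risk count until their last observed day. A component with many censored units, as reported for chains, therefore yields a curve that decays slowly and is supported by few observed failures.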

4.3 Predictions accuracy

After confirming that the data distributions in the training and test sets are aligned (Appendix E), survival models were trained and their performance was evaluated by predicting the uncensored MO units from both datasets (Appendix E, Table 2). Right-censored data were excluded to avoid comparing predictions with the durations of MO units that do not conclude with a repair.

Table 2 Predictions accuracy metrics on the test dataset. Forecasts were generated using exclusively the uncensored MO units


As expected, across nearly all models, predictions made on a combination of the training and validation sets exhibit higher accuracy than those generated on the test set. This discrepancy arises because training and hyper-parameter optimization were carried out on the training and validation sets, while the test set was strictly reserved for evaluating the final performance. Moreover, the close similarity between the accuracies on both sets implies that the hyper-parameter optimization was successful enough for the predictions to generalize.

Classical CPH models exhibited the poorest accuracy. This weaker performance is likely due to their reliance on linear equations, which are inadequate for capturing the non-linear failure dynamics of the bike components. To address this issue, MTLR and CSF models were employed, demonstrating in both cases higher accuracies. However, an exception was noted with the CSF model’s performance for chains. This outcome arises from the fact that CSF models were unable to generate predictions within the entire range of the training data, and thus, they were unable to adequately learn from the training set. In contrast, MTLR models clearly outperformed CPH models, indicating that the adapted logistic regressions for survival data are more effective than CPH models.
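For reference, the linearity of the CPH model and the way DeepSurv relaxes it can be written in their standard textbook forms. The symbols used here (baseline hazard \(h_0\), coefficients \(\beta\), network \(g_\theta\), covariates \(x\)) are introduced for illustration and are not taken from this section.

```latex
% Cox proportional hazards: the log-hazard is linear in the covariates x
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% DeepSurv: the linear predictor is replaced by a neural network g_\theta
h(t \mid x) = h_0(t)\,\exp\!\left(g_{\theta}(x)\right)
```

Because \(g_\theta\) can represent non-linear interactions between covariates such as cumulative distance and bike model, the second form can capture failure dynamics that the first cannot.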

DeepSurv models were utilized to improve the modeling of non-linear data, achieving the best results. Compared to MTLR models, DeepSurv models demonstrated a reduction in RMSE values by 25 to 33%. Also, they attained MAPE metrics below 33% and R² values exceeding 0.92. However, it is important to note that the accuracy of predictions for chains remains lower than those for brake pads and wheel spokes. This disparity might stem from two previously mentioned factors: the relatively small number of chain MO units and the limited availability of uncensored data for MO units with survival periods of either less than 200 days or more than 600 days.
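The three accuracy metrics reported in this section can be sketched as follows, with right-censored units filtered out before scoring, as the evaluation protocol above describes. The function name and the sample values are hypothetical, and this is not the authors' evaluation code.

```python
import math

def accuracy_metrics(actual, predicted, observed):
    """Return (RMSE, MAPE in %, R^2) computed on uncensored units only.

    actual / predicted: survival times in days
    observed: True for uncensored units, False for right-censored ones
    """
    pairs = [(a, p) for a, p, e in zip(actual, predicted, observed) if e]
    n = len(pairs)
    mean_actual = sum(a for a, _ in pairs) / n
    ss_res = sum((a - p) ** 2 for a, p in pairs)        # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a, _ in pairs)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mape = 100.0 * sum(abs(a - p) / a for a, p in pairs) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mape, r2

# Hypothetical values; the last unit is right-censored and is ignored.
actual = [100.0, 200.0, 300.0, 150.0]
predicted = [110.0, 190.0, 300.0, 400.0]
observed = [True, True, True, False]
rmse, mape, r2 = accuracy_metrics(actual, predicted, observed)
print(f"RMSE={rmse:.2f} days, MAPE={mape:.1f}%, R2={r2:.2f}")
```

Excluding the censored unit matters here: its recorded duration of 150 days is only a lower bound on the true lifespan, so scoring the 400-day prediction against it would count a possibly correct forecast as a large error.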

4.4 Predictions analysis

In the PM field, predictions must be as close as possible to the failure date and, ideally, these predictions should pertain to dates preceding the failure events, as it facilitates preventive maintenance and avoids the costs associated with late component replacements. In this line, an exploration of the predictions has been performed by considering right-censored and uncensored units separately.

Predictions for uncensored data present a high level of accuracy. This assertion is further substantiated by a comparison of the means and standard deviations of the actual and predicted lifespans (Table 3). However, right-censored MO units exhibit more substantial deviations in these statistics. Also, as previously stated, chain predictions exhibit the highest predictive errors, which can be attributed to the lower number of chain MO units and their significantly higher proportion of right-censored units.

Table 3 Actual and predicted survival times descriptive table. The means and standard deviations (in parentheses) for the actual and predicted survival times (in days) are displayed


Uncensored MO units exhibit strong linear relationships between the predicted and actual survival times, as evidenced by Pearson correlation coefficients of 0.97 for brake pads, 0.97 for wheel spokes, and 0.96 for chains (Fig. 5). Similarly, the predicted-to-actual survival time ratio for the three bike components is notably accurate (Table 4), with a mean centered around 1 and approximately 80% of predictions exhibiting an error of less than ±25% relative to the actual values. In terms of the absolute difference, for brake pads and wheel spokes 80% of the MO units have an error below 20 units, while for chains, it is 50%. Therefore, the predictions can be considered remarkably close to the actual failure dates.
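The two summary statistics used above, the Pearson correlation and the share of predictions within ±25% of the actual value, can be computed as in the following sketch. The helper names and sample values are hypothetical and chosen only to illustrate the calculation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def share_within(actual, predicted, tolerance=0.25):
    """Fraction of units whose predicted/actual ratio lies within 1 +/- tolerance."""
    ratios = [p / a for a, p in zip(actual, predicted)]
    hits = sum(1 for r in ratios if abs(r - 1.0) <= tolerance)
    return hits / len(ratios)

# Hypothetical survival times (days) for five uncensored MO units.
actual = [40.0, 80.0, 120.0, 200.0, 400.0]
predicted = [44.0, 76.0, 130.0, 190.0, 560.0]
r = pearson(actual, predicted)
share = share_within(actual, predicted)
print(f"Pearson r = {r:.3f}, share within 25% = {share:.0%}")
```

The last unit in this toy data shows why both statistics are reported: a single large overestimate barely lowers the correlation but is counted as a miss by the ratio-based share.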

Table 4 Descriptive statistics for predicted-actual value metrics of uncensored MO units


Unlike the uncensored units, which provide data on when bike parts fail, the right-censored units lack information regarding the specific timing of these failures. For this reason, the predicted days are expected to be higher than the actual days, and therefore the predicted/actual ratio should tend towards values higher than one. Indeed, the majority of predictions generated for the right-censored MO units were higher than the corresponding actual values (Fig. 5).

4.5 Models interpretability

To assess how model inputs might affect the chances of models predicting a repair, the game-theoretic approach SHapley Additive exPlanations (SHAP) has been applied to the DeepSurv models of the three bike parts (Fig. 6). Additionally, for a comprehensive view of the average effect of each input feature on the models’ output, partial dependence plots are provided in Appendix F. Overall, the most influential features are the ones related to the bike usage and the bike model.

MO units cumulative distance emerged as the most influential factor in predicting the lifespans of the three bike parts. Larger distances are strongly associated with longer predicted survival times. This connection might seem counterintuitive, as one would typically expect higher distances to lead to faster breakdowns. However, because the models’ aim is to predict survival times, direct survival information is not included as an input. Consequently, due to the strong correlation between survival times and cumulative distances, cumulative distance is used as a baseline for predicting the lifespan of bike parts.

Average mean speed has also emerged as one of the most critical features for all bike parts. Higher values are closely associated with shorter survival times, indicating that increased stress on the bike parts leads to a reduction in their lifespan.

In Figs. 1 and 2, it was noted that the bike model plays an important role in BSS mobility dynamics. As a result, the bike model emerges as the second most impactful feature. Specifically, electric mobility is consistently linked with shorter survival times. However, it is worth noting that the impact of the bike model on wheel spokes is relatively less pronounced, which can be attributed to the smaller number of MO units for m-bikes.

While weather-related variables have a lesser impact, they remain significant. Higher mean daily temperatures and mean daily atmospheric pressure values were associated with reduced survival times for brake pads and wheel spokes. In essence, hotter and drier weather conditions tended to decrease the survival time of these bike components. However, this effect is less evident in the case of chains. This discrepancy can be attributed to their extended survival times, leading to a wider range of weather conditions being encompassed within the average values. Finally, the counts of repairs for other MOs were found to be the variables with the least impact on the predictions.
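The SHAP attributions discussed in this section are based on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a toy model by enumerating all feature subsets; SHAP implementations approximate this quantity for models such as DeepSurv, where exhaustive enumeration is impractical. The toy model, its coefficients, and the baseline are invented for illustration and bear no relation to the trained models.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    """Hypothetical predicted survival time in days (coefficients are invented)."""
    distance_km, speed_kmh, is_electric = features
    return 50.0 + 0.1 * distance_km - 2.0 * speed_kmh - 20.0 * is_electric

x = [1000.0, 15.0, 1.0]        # unit being explained
baseline = [500.0, 12.0, 0.0]  # reference unit
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["distance", "speed", "electric"], phi)))
```

The attributions sum to the difference between the prediction for the explained unit and the prediction for the baseline, which is the additivity property that makes per-feature contributions comparable across features.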

5 Discussion

This study offers a novel perspective in BSS research by analyzing mobility patterns, differentiating between m-bikes and e-bikes as separate modes of transportation. Our findings reveal that the Bicing BSS exhibits mobility dynamics similar to those in other BSS studies in terms of distance travelled, trip duration, and bike speed [12–17]. Notably, electric mobility has been found to be the preferred mode of transportation, characterized by longer trip durations, farther distances covered, and faster speeds when compared to mechanical mobility. Moreover, the trend of lower trip counts on weekends compared to weekdays, as noted in [38, 40], still holds true, along with the occurrence of three daily peaks [38].

Aligned with the findings of [32, 33], our study further confirms that topography plays a significant role in BSS utilization. Electric mobility is primarily chosen for trips involving steep positive inclines, whereas mechanical mobility is preferred for routes with slight elevation changes, especially for journeys originating or ending at stations situated at lower elevations. Consequently, stations at higher altitudes predominantly receive e-bikes, whereas those at lower altitudes are more likely to attract the mechanical ones. Moreover, this pattern is even more pronounced when considering the elevation differences between the origin and ending stations. Interestingly, even in scenarios with significant negative elevations, where physical exertion is low, electric mobility remains the dominant choice. This preference may be attributed to the accumulation of e-bikes at higher altitudes and the irregularities in the paths connecting high-altitude stations.

Factors related to mobility, such as trip distance, not only differentiate electric from mechanical mobility but were also found to significantly influence the wear and tear on bike components. Leveraging these insights, in this paper we present a novel PM system for a BSS, which marks a significant step towards replacing corrective maintenance strategies. The system has been designed to predict maintenance needs for three essential bike components, delivering satisfactory forecasting results through the application of deep learning survival models. The design we introduce enables operation across an entire bike fleet, distinguishing it from previous research in bike PM that primarily focused on individual bikes [54, 56]. Moreover, our system improves upon key aspects of an earlier PM solution developed for Oslo’s BSS [55]. First, our datasets are considerably more comprehensive. Second, we treat different repair typologies independently, acknowledging their distinct breakdown dynamics. Lastly, to prevent potentially discriminatory conclusions that could impact BSS pricing strategies, we have deliberately omitted user information such as gender and age from our modelling. Additionally, through the application of a game-theoretic interpretability approach, we verify that the model’s predictions are consistent with the observed failure dynamics. Notably, the most influential factors in the generation of the predictions are the cumulative distance and whether the bike is mechanical or electric.

Both the mobility and the PM results, along with their potential implications, hold significant promise for impacting society, especially in terms of enhancing the sustainability of urban mobility. Understanding BSS users’ preferences for mechanical and electric mobility is essential for improving the decision-support systems that facilitate fleet rebalancing. Enhancing these systems not only assists BSS managers but also promotes BSS mobility by improving the user’s experience, ensuring the availability of the preferred mode of transport when needed. On the other hand, the PM system we presented, which can predict the breakdowns of three key bike components, marks an initial step towards developing a more comprehensive system that considers more bike components. Implementing such holistic systems could significantly enhance urban mobility sustainability by reducing the environmental footprint and operational costs of BSS through more efficient resource utilization. Beyond improving the scheduling of maintenance activities, these systems also facilitate the application of bike rebalancing strategies to extend the lifetime of bike parts. For instance, by strategically relocating bikes to less demanding routes or stations, we can minimize stress on critical components, thereby prolonging their usability when needed. Furthermore, the PM systems would enhance the user’s experience by ensuring a more reliable and available bike fleet. However, while this study demonstrates the feasibility of successfully deploying a PM system for BSS, employing interpretability tools to build trust in the accuracy of these systems among BSS managers is crucial for their broader adoption.

Despite the high quality of our data, certain limitations may have impacted our findings. The absence of specific bike routes required us to infer trip distances, potentially affecting the analysis of mobility patterns and the PM modeling. Similarly, elevation gains were simplified, potentially overlooking elevation irregularities between stations. Furthermore, maintenance records, which reflect when bike components were replaced rather than when they failed, along with datasets affected by censoring-related imbalances, could have skewed the predictive accuracy of the survival models. Finally, building on the findings of this study, future research could use the mobility insights gathered to enhance bike rebalancing strategies for BSSs that include both m- and e-bikes. This could potentially boost efficiency and user satisfaction by optimizing bike distribution based on detailed usage patterns. Additionally, while this research focused on three critical bike components, future studies could extend to include more components, developing a more comprehensive PM system. Lastly, a crucial step to validate this approach would involve the real-world deployment of these models, providing practical evidence of their impacts.

Availability of data and materials

The data that support the findings of this study are available from Pedalem-Bicing but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.

Abbreviations

BSS: bike-sharing system
e-bike: electric bike
m-bike: mechanical bike
MO: maintenance operations
ARMA: auto-regressive moving average
PM: predictive maintenance
SA: survival analysis
CPH: Cox proportional hazard model
MTLR: multi-task logistic regression model
CSF: conditional survival forest model
DeepSurv: CPH deep neural network model
SHAP: SHapley Additive exPlanations
RMSE: root mean square error
MAPE: mean absolute percentage error
R²: coefficient of determination

References

Ma X, Yuan Y, Van Oort N, Hoogendoorn S (2020) Bike-sharing systems’ impact on modal shift: a case study in delft, the Netherlands. J Clean Prod 259:120846. https://doi.org/10.1016/j.jclepro.2020.120846
Qiu L-Y, He L-Y (2018) Bike sharing and the economy, the environment, and health-related externalities. Sustainability 10(4):1145. https://doi.org/10.3390/su10041145
Map MB-SW (2022) The Meddin bike-sharing world map report. https://bikesharingworldmap.com/reports/bswm_mid2022report.pdf. Accessed 03-10-2023
Liu J, Li Q, Qu M, Chen W, Yang J, Xiong H, Zhong H, Fu Y (2015) Station site optimization in bike sharing systems. In: 2015 IEEE international conference on data mining, pp 883–888. https://doi.org/10.1109/ICDM.2015.99
Duran-Rodas D, Villeneuve D, Pereira FC, Wulfhorst G (2020) How fair is the allocation of bike-sharing infrastructure? Framework for a qualitative and quantitative spatial fairness assessment. Transp Res, Part A, Policy Pract 140:299–319
De Chardon CM, Caruso G, Thomas I (2016) Bike-share rebalancing strategies, patterns, and purpose. J Transp Geogr 55:22–39
Ruben Talavera-Garcia GR, Arias-Molinares D (2021) Examining spatio-temporal mobility patterns of bike-sharing systems: the case of bicimad (Madrid). J Maps 17(1):7–13. https://doi.org/10.1080/17445647.2020.1866697
Kon F, Ferreira ÉC, Souza HA, Duarte F, Santi P, Ratti C (2021) Abstracting mobility flows from bike-sharing systems. Public Transp 14(3):545–581. https://doi.org/10.1007/s12469-020-00259-5
Shaheen S, Guzman S, Zhang H (2010) Bikesharing in Europe, the Americas, and Asia: past, present, and future. In: Institute of transportation studies, UC Davis, institute of transportation studies. Working paper series, vol 2143. https://doi.org/10.3141/2143-20
Cho DI, Parlar M (1991) A survey of maintenance models for multi-unit systems. Eur J Oper Res 51(1):1–23. https://doi.org/10.1016/0377-2217(91)90141-H
Reinertsen R (1996) Residual life of technical systems; diagnosis, prediction and life extension. Reliab Eng Syst Saf 54:23–34
Jensen P, Rouquier J-B, Ovtracht N, Robardet C (2010) Characterizing the speed and paths of shared bicycle use in lyon. Transp Res, Part D, Transp Environ 15(8):522–524. https://doi.org/10.1016/j.trd.2010.07.002
Ciancia V, Latella D, Massink M, Pakauskas R (2015) Exploring spatio-temporal properties of bike-sharing systems. In: 2015 IEEE international conference on self-adaptive and self-organizing systems workshops, pp 74–79. https://doi.org/10.1109/SASOW.2015.17
Zaltz Austwick M, O’Brien O, Strano E, Viana M (2013) The structure of spatial networks and communities in bicycle sharing systems. PLoS ONE 8(9):74685. https://doi.org/10.1371/journal.pone.0074685
Chiariotti F, Pielli C, Zanella A, Zorzi M (2018) A dynamic approach to rebalancing bike-sharing systems. Sensors 18(2):512. https://doi.org/10.3390/s18020512
Borgnat P, Abry P, Flandrin P, Robardet C, Rouquier J-B, Fleury E (2011) Shared bicycles in a city: a signal processing and data analysis perspective. Adv Complex Syst 14:415–438
Zhang J, Pan X, Li M, Yu PS (2016) Bicycle-sharing system analysis and trip prediction. In: 2016 17th IEEE international conference on Mobile Data Management (MDM), vol 1, pp 174–179. https://doi.org/10.1109/MDM.2016.35
Oliveira GN, Sotomayor JL, Torchelsen RP, Silva CT, Comba JLD (2016) Visual analysis of bike-sharing systems. Comput Graph 60:119–129. https://doi.org/10.1016/j.cag.2016.08.005
Zhu S, Levinson D (2015) Do people use the shortest path? An empirical test of wardrop’s first principle. PLoS ONE 10(8):0134322
Lu W, Scott DM, Dalumpines R (2018) Understanding bike share cyclist route choice using gps data: comparing dominant routes and shortest paths. J Transp Geogr 71:172–181
Klein S, Brondeel R, Chaix B, Klein O, Thierry B, Kestens Y, Gerber P, Perchoux C (2023) What triggers selective daily mobility among older adults? A study comparing trip and environmental characteristics between observed path and shortest path. Health Place 79:102730. https://doi.org/10.1016/j.healthplace.2021.102730
Zhao J, Deng W, Song Y (2014) Ridership and effectiveness of bikesharing: the effects of urban features and system characteristics on daily use and turnover rate of public bikes in China. Transp Policy 35:253–264. https://doi.org/10.1016/j.tranpol.2014.06.008
Duran-Rodas D, Chaniotakis E, Antoniou C (2019) Built environment factors affecting bike sharing ridership: data-driven approach for multiple cities. Transp Res Rec 2673(12):55–68
Sarkar A, Lathia N, Mascolo C (2015) Comparing cities’ cycling patterns using online shared bicycle maps. Transportation 42:541–559
Moncayo-Martínez LA, Ramirez-Nafarrate A (2016) Visualization of the mobility patterns in the bike-sharing transport systems in Mexico city. In: 2016 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pp 1851–1855. https://doi.org/10.1109/IEEM.2016.7798198
Faghih-Imani A, Eluru N, El-Geneidy AM, Rabbat M, Haq U (2014) How land-use and urban form impact bicycle flows: evidence from the bicycle-sharing system (bixi) in Montreal. J Transp Geogr 41:306–314
Wang X, Lindsey G, Schoner JE, Harrison A (2016) Modeling bike share station activity: effects of nearby businesses and jobs on trips to and from stations. J Urban Plann Dev 142(1):04015001
Faghih-Imani A, Hampshire R, Marla L, Eluru N (2017) An empirical analysis of bike sharing usage and rebalancing: evidence from Barcelona and Seville. Transp Res, Part A, Policy Pract 97:177–191
O’Brien O, Cheshire J, Batty M (2014) Mining bicycle sharing data for generating insights into sustainable transport systems. J Transp Geogr 34:262–273
Corcoran J, Li T, Rohde D, Charles-Edwards E, Mateo-Babiano D (2014) Spatio-temporal patterns of a public bicycle sharing program: the effect of weather and calendar events. J Transp Geogr 41:292–305. https://doi.org/10.1016/j.jtrangeo.2014.09.003
Yang Y, Heppenstall A, Turner A, Comber A (2019) A spatiotemporal and graph-based analysis of dockless bike sharing patterns to understand urban flows over the last mile. Comput Environ Urban Syst 77:101361. https://doi.org/10.1016/j.compenvurbsys.2019.101361
Morency C (2015) Modelling bikesharing usage in montreal over 6 years. https://api.semanticscholar.org/CorpusID:128434352
Kim I, Pelechrinis K (2020) The anatomy of the daily usage of bike sharing systems: elevation, distance and seasonality. https://api.semanticscholar.org/CorpusID:231580822
Noussan M, Carioni G, Sanvito FD, Colombo E (2019) Urban mobility demand profiles: time series for cars and bike-sharing use as a resource for transport and energy modeling. Data 4(3):108. https://doi.org/10.3390/data4030108
McKenzie G (2019) Spatiotemporal comparative analysis of scooter-share and bike-share usage patterns in Washington, D.C. J Transp Geogr 78:19–28. https://doi.org/10.1016/j.jtrangeo.2019.05.007
Almannaa MH, Ashqar HI, Elhenawy M, Masoud M, Rakotonirainy A, Rakha H (2020) A comparative analysis of e-scooter and e-bike usage patterns: findings from the city of Austin, TX. Int J Sustain Transp 15(7):571–579. https://doi.org/10.1080/15568318.2020.1833117
Froehlich J, Neumann J, Oliver N (2008) Measuring the pulse of the city through shared bicycle programs. Proc of UrbanSense08: 16–20
Froehlich JE, Neumann J, Oliver N (2009) Sensing and predicting the pulse of the city through shared bicycling. In: Twenty-first international joint conference on artificial intelligence
Kaltenbrunner A, Meza R, Grivolla J, Codina J, Banchs R (2010) Urban cycles and mobility patterns: exploring and predicting trends in a bicycle-based public transport system. Pervasive Mob Comput 6(4):455–466
Bustamante X, Federo R, Fernández-i-Marin X (2022) Riding the wave: predicting the use of the bike-sharing system in Barcelona before and during covid-19. Sustain Cities Soc 83:103929. https://doi.org/10.1016/j.scs.2022.103929
Ran Y, Zhou X, Lin P, Wen Y, Deng R (2019) A survey of predictive maintenance: systems, purposes and approaches. arXiv:1912.07383
Wang J, Zhang L, Duan L, Gao RX (2017) A new paradigm of cloud-based predictive maintenance for intelligent manufacturing. J Intell Manuf 28(5):1125–1137
Yang Z, Kanniainen J, Krogerus T, Emmert-Streib F (2022) Prognostic modeling of predictive maintenance with survival analysis for mobile work equipment. Sci Rep 12(1):8529. https://doi.org/10.1038/s41598-022-12572-z
Cox DR (1972) Regression models and life-tables. J R Stat Soc B 34(2):187–202
Leung K-M, Elashoff RM, Afifi AA (1997) Censoring issues in survival analysis. Annu Rev Public Health 18(1):83–104
Gijbels I (2010) Censored data. Wiley Interdiscip Rev: Comput Stat 2(2):178–188
Wang P, Li Y, Reddy CK (2019) Machine learning for survival analysis: a survey. ACM Comput Surv 51(6):1–36
Reddy CK, Li Y (2015) A review of clinical prediction models. In: Healthcare data analytics. https://api.semanticscholar.org/CorpusID:263581756
Modarres M, Kaminskiy MP, Krivtsov V (2016) Reliability engineering and risk analysis, 3rd edn. Taylor & Francis, Boca Raton. CRC title
Li Y, Rakesh V, Reddy CK (2016) Project success prediction in crowdfunding environments. In: Proceedings of the ninth ACM international conference on web search and data mining. ACM, New York
Ameri S, Fard MJ, Chinnam RB, Reddy CK (2016) Survival analysis based framework for early prediction of student dropouts. Association for Computing Machinery, New York. https://doi.org/10.1145/2983323.2983351
Furrer O (2002) Driving customer equity: how customer lifetime value is reshaping corporate strategy. Int J Serv Ind Manag 13(1):107–111. https://doi.org/10.1108/ijsim.2002.13.1.107.1
Kiefer N (1988) Economic duration data and hazard functions. J Econ Lit 26(2):646–679
Rumble Mountain bike predictive analytics from your smartphone | UC Berkeley School of Information. https://www.ischool.berkeley.edu/projects/2019/rumble-mountain-bike-predictive-analytics-your-smartphone. Accessed 2023-11-21
Predictive maintenance of bicycles | data & science. https://gregoirejan.github.io/project/maintenancebike/. Accessed 2023-11-21
Matkovic V, Waltereit M, Weis T (2021) Towards predictive safety maintenance for iot equipped bikes. In: 2021 IEEE international conference on pervasive computing and communications workshops and other affiliated events (PerCom workshops), pp 320–323. https://doi.org/10.1109/PerComWorkshops51409.2021.9430996
GIScience Research Group and HeiGIT (2023) OpenRouteService. GitHub. Accessed 2023-11-13
OpenStreetMap contributors (2017) Planet dump retrieved from https://planet.osm.org. https://www.openstreetmap.org
Nisbet A (2023) OpenTopoData. GitHub. https://github.com/ajnisbet/opentopodata Accessed 2023-11-13
Davidson-Pilon C, Kalderstam J, Jacobson N, Reed S, Kuhn B, Zivich P, Williamson M, Abdeali JK, Datta D, Fiore-Gartland A, Parij A, Wilson D, Gabriel ML, Moncada-Torres A, Stark K, Gadgil H, Jona J, Singaravelan K, Besson L, Peña MS, Anton S, Klintberg A, Growth J, Noorbakhsh J, Begun M, Kumar R, Hussey S, Seabold S CamDavidsonPilon/lifelines: 0.26.0. https://doi.org/10.5281/zenodo.4816284
Stensrud MJ, Hernán MA (2020) Why test for proportional hazards? JAMA 323(14):1401–1402
Lifelines documentation Do I need to care about the proportional hazard assumption? https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional. Accessed 2010-09-30
Yu C-N, Greiner R, Lin H-C, Baracos V (2011) Learning patient-specific cancer survival distributions as a sequence of dependent regressors. In: Advances in neural information processing systems, vol. 24
Fotso S, et al (2019) PySurvival: open source package for survival analysis modeling. https://www.pysurvival.io/
Wright MN, Dankowski T, Ziegler A (2017) Unbiased split variable selection for random survival forests using maximally selected rank statistics. Stat Med 36(8):1272–1284
Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y (2018) Deepsurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 18(1):1–12
Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining

Acknowledgements

We thank Daniel Santanach, coordinator of the CATALONIA.AI programme, and Marco Orellana from CIDAI (Centre of Innovation for Data Tech and Artificial Intelligence). We also thank Faustino Corchero from Barcelona de Serveis Municipals; Roger Junqueras, Irene Giménez and all collaborators from Pedalem-Bicing; and Javier Bejar from UPC-IDEAI.

Funding

This work was supported through funds granted by the Government of Catalonia in the frame of the CATALONIA.AI programme. JGE is a fellow of Eurecat’s “Vicente López” PhD grant program.

Author information

Authors and Affiliations

Eurecat, Centre Tecnològic de Catalunya, Barcelona, Spain
Jordi Grau-Escolano, Aleix Bassolas & Julian Vicens
Universitat Politècnica de Catalunya, Barcelona, Spain
Jordi Grau-Escolano

Authors

Jordi Grau-Escolano
Aleix Bassolas
Julian Vicens

Contributions

JGE, AB and JV conceptualised the study. JGE developed the methodology, performed the analyses and built the models. JGE wrote the original draft, and all authors critically discussed the results, revised the paper and approved the final manuscript.

Corresponding author

Correspondence to Jordi Grau-Escolano.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

(PDF 791 kB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Grau-Escolano, J., Bassolas, A. & Vicens, J. Cycling into the workshop: e-bike and m-bike mobility patterns for predictive maintenance in Barcelona’s bike-sharing system. EPJ Data Sci. 13, 48 (2024). https://doi.org/10.1140/epjds/s13688-024-00486-x

Received: 13 March 2024
Accepted: 24 June 2024
Published: 11 July 2024
DOI: https://doi.org/10.1140/epjds/s13688-024-00486-x
