Weak signals in the mobility landscape: car sharing in ten European cities
EPJ Data Science volume 8, Article number: 7 (2019)
Abstract
Car sharing is one of the pillars of a smart transportation infrastructure, as it is expected to reduce traffic congestion, parking demand and pollution in our cities. From the point of view of demand modelling, car sharing is a weak signal in the city landscape: only a small percentage of the population uses it, and thus it is difficult to study reliably with traditional techniques such as household travel diaries. In this work, we depart from these traditional approaches and we leverage web-based, digital records about vehicle availability in 10 European cities for one of the major active car sharing operators. We discuss which sociodemographic and urban activity indicators are associated with variations in car sharing demand, which forecasting approach (among the most popular in the related literature) is better suited to predict pickup and drop-off events, and how the spatiotemporal information about vehicle availability can be used to infer how different zones in a city are used by customers. We conclude the paper by presenting a direct application of the analysis of the dataset, aimed at identifying where to locate maintenance facilities within the car sharing operation area.
Introduction
Automobile transportation has been one of the main drivers of the population growth and increasing wealth that have characterised the last two centuries [1]. Thanks to cars, people have had greater access to jobs, goods, and services. However, these benefits have not come for free. The price paid for our increased mobility has been huge in terms of environmental pollution, city congestion and resulting health issues. We are now at a turning point for personal mobility systems: policy makers and citizens share the common idea that it is time to rethink the way we move. There are three main driving forces behind this personal mobility revolution: smart transportation, sharing economy, and green vehicles, all tightly intertwined. The shift from an ownership mindset to a usage mindset will make it possible to have significantly fewer vehicles in our cities. The implication is that we can save space (public parking space and private garage space) and use it for something with greater added value than hosting idle cars for hours (a private car is used only 5% of its available time, corresponding to 72 minutes per 24 hours [2]). This usage mindset will also allow people to rent the car size most appropriate to their daily needs, thus implementing the Mobility-as-a-Service concept. Since the average vehicle has only around 1.5 occupants [3, 4], people can refrain from buying a car able to address the extreme case of personal mobility (e.g., moving a whole family for a vacation) and instead use two-seaters, which are more suitable for everyday commuting. On occasion, they will be able to rent larger vehicles if needed. The virtuous mobility cycle is completed with the switch to electric vehicles, which allow for a drastic reduction in the carbon footprint of personal mobility.
In this framework, car sharing is emerging as one of the most promising examples of Mobility-as-a-Service [5]. The general idea of car sharing is that the members of a car sharing system can pick up a shared vehicle of the car sharing fleet when they need it. Different operators may implement different pickup/drop-off policies. In station-based systems, members can only pick up and drop off vehicles at designated locations called stations, as in the Autolib system in Paris. If the service is two-way (e.g., Zipcar, Modo), people are asked to bring back the vehicle to the station where they initially picked it up. Otherwise, the service is called one-way. One-way services are definitely the most popular among customers thanks to the flexibility they provide. Examples of one-way car sharing are Autolib, Ha:Mo ride, CITIZ. One-way services can drop the concept of station altogether: this is the case of so-called free floating car sharing (such as Car2go, DriveNow, Enjoy), whose customers can pick up and drop off vehicles anywhere within a predefined operation area.
Car sharing is a weak signal in the city landscape: the fraction of people relying on car sharing for their daily trips is rapidly increasing but it is still on the order of single-digit percentage points in the best cases [6]. So far, car sharing has been mostly studied through surveys and direct interviews with its members [5, 7]. In addition, car sharing is typically not accounted for in household travel diaries periodically collected by city administrations. Even if it were, the limitations of travel surveys are widely acknowledged, and range from their inability to capture changes in routine travel behaviour to their underestimation (because of underreporting from people) of short, non-commute trips [8]. Moreover, running a survey is very expensive if one wants to capture a statistically meaningful sample.
Cities have been considered kaleidoscopes of information for a long time [9], but the extent to which this is true has reached new heights now that a myriad of electronic devices have been woven into their fabric. From the car sharing perspective, this means that we can now know exactly when and where cars are available, and we can observe shared vehicle flows as they happen in the city. This knowledge opens up a new avenue of research that goes in the direction of the new science of cities and urban computing: using data and electronic devices to extract knowledge and to improve urban solutions. Along these lines, the goal of this paper is to stimulate a discussion on how to apply urban computing ideas to the car sharing domain. To this aim, we exploit the availability of public, web-based data about free floating car sharing in 10 European cities (whose main characteristics are summarised in Table 1 for the convenience of the reader) and we carry out an analysis with the following objective in mind: to understand what mining this kind of data can bring to cities and to car sharing operators alike. The main contributions of this study can be summarised as follows:

We perform an explanatory analysis of the car sharing demand as a function of the sociodemographic and urban fabric (i.e., number, heterogeneity, and category of Foursquare Points of Interest, PoI) indicators associated with the cities of Milan, Rome, and Turin.^{Footnote 1} While a single explanatory pattern does not emerge across the cities, they do share several similarities. In particular, their car sharing demand is positively associated with high educational attainment (all Italian cities) and negatively correlated with commuting outside of the municipality area (Milan, Rome). These findings confirm the conclusions of the most recent sociodemographic surveys about car sharing services [10,11,12,13], but at a much finer spatial granularity and without relying on expensive and time-consuming interviews/questionnaires. With regard to the urban fabric indicators, the only PoI category that seems to have a statistically significant effect on car sharing demand is that of nightlife-related activities, suggesting that leisure is the most typical trip purpose.

We take into consideration several approaches to demand forecasting, and we evaluate which perform best at forecasting car sharing pickups and drop-offs. Our results show that Random Forest yields consistently better results than simple average-based forecasting, time series forecasting, vanilla neural networks, and a popular custom approach proposed in the literature. However, prediction quality is in general quite good, even with the simplest solutions.

Four distinct car availability temporal patterns can be recognised in the cities considered in this study. We have labelled them day, night, neutral, and high-intensity behaviours, based on when they exhibit their peak availability and on the intensity of this peak. We also show that these patterns tend to be spatially autocorrelated, i.e., neighbouring cells are likely to feature the same behaviour.

Motivated by the importance that customers place on the cleanliness of vehicles, we propose a simple approach to the effective deployment of car sharing maintenance facilities. We show that including the airport zone in the operation area and locating maintenance facilities there is a simple yet effective strategy to reduce the maintenance trips carried out by the car sharing workforce.
Related work
In the following we provide a brief overview of the most relevant works in the area of data science for car sharing, data science for transportation systems in general, and data-driven car sharing operation models.
Knowledge mining from survey data
Until recently, knowledge about car sharing systems has been mostly acquired through surveys, in which car sharing operators and members are interviewed. The main goal of these studies is to characterise the sociodemographic profile of car sharing users, as well as to investigate the reasons behind their choices and the impact that car sharing has had on their mobility behaviour. In 2005, Millard-Ball [15] presented one of the first comprehensive sociodemographic analyses of station-based car sharing in North America, highlighting a few key demographic indicators that constantly reappear in analyses of more recent car sharing solutions. After interviewing 978 US and 362 Canadian car sharing members through a web-based survey, Millard-Ball reports that car sharing members are typically young (25–44 years old), with high income and well-educated. They live in small households, often with no private cars. This survey does not support the finding, often presented in the related literature, that car sharing members are typically male. Recreational trips, shopping-related trips, and personal business trips are by far the most popular trip purposes for the respondents. In 2010, these findings were substantially confirmed by [16] for Europe, with the interesting addendum that car sharing customers tend to have season tickets for public transport more often than the general population.
Considering that free floating car sharing is a recent addition to the car sharing domain (e.g. Car2go was founded in 2008, and started a significant expansion only in 2011), in the following we overview recent surveys [10,11,12,13] focusing specifically on the free floating modality. Kopp et al. [10] recruited 204 males between 25 and 45 years of age living in the cities of Munich and Berlin, Germany. Of these, 109 were free floating car sharing members (DriveNow) and 95 did not use car sharing. Respondents were asked to use a custom-built app to track their trips and to specify the trip purpose and the mode of transport. The findings of this study confirm previous results obtained for station-based car sharing: free floating car sharing members have higher levels of education, higher income, fewer private cars, and more public transport subscriptions with respect to non-members. The study also highlights that car sharing members typically live in denser neighbourhoods, and are more intermodal and multimodal in their mobility behaviour. No statistically significant difference in trip purpose was detected between members and non-members: most trips are work-home trips (57%), leisure (19%) and shopping/errands (13%). Giesel and Nobis [12] perform a similar study for DriveNow and Flinkster users in Munich and Berlin, reporting substantially the same findings.
Becker et al. [13] directly compare free floating and station-based car sharing members in the city of Basel, Switzerland. While the sociodemographic profile of car sharing members is largely the same for station-based and free floating services, and substantially the same as that pictured in the previous literature, free floating car sharing members in Basel differ from their station-based counterparts in that they tend to use public transportation less. The authors remark that free floating car sharing may act as a complement to public transportation, filling the service gaps that its users might experience. The trip purposes of free floating car sharing members are quite diversified, but mostly involve visiting, shopping, and commuting, while station-based car sharing mostly covers leisure trips, goods transport, and shopping.
Wittwer and Hubrich [11] discuss the findings from a two-stage survey carried out in Hamburg, Germany, among Car2go members. The first stage of interviews took place in 2011, at the beginning of the Car2go service in the city; the second stage was run in 2016, when the service had been in place for a few years. From the sociodemographic standpoint, the 2011 and 2016 cohorts substantially share the same profile: largely male, 24–49 years old, high income, low car access, often with public transport season tickets. 2016 active users overwhelmingly rely on car sharing for leisure trips (72%), but significant percentages also use it for shopping and errands (50%) and for work/education trips (42%).
Based on the above overview, we can conclude that survey findings are consistent as far as the sociodemographic profiles of car sharing users are concerned, while contrasting results have been obtained regarding car sharing trip purpose and relationships with public transportation. In Sect. 4 we will discuss our findings in light of the above results.
Knowledge mining from digital data
The understandings and advancements brought about by the works described in Sect. 2.1 are invaluable, but the collection of survey data is expensive, time consuming, and does not scale. Typically, travel surveys cover a relatively small sample of all the trips of interest (because the number of participants as well as the observation period are typically quite limited). Furthermore, it is a well-known problem that travel surveys often tend to underestimate the number of trips and to show a bias in the types of trips being reported [8]. For these reasons, in this work we depart from this approach and we exploit public, web-based, digital records, whose geotagged and timestamped variety of data can be analysed with data mining techniques. These data can be collected for a possibly very long time with minimal effort, and can provide geographically diverse and almost continuous measurements of the systems under study.
In the related literature, the works by Schmöller et al. [17] and Willing et al. [18] are mostly focused on the external factors that may influence car sharing demand. In particular, Schmöller et al. [17] highlight the role played by weather and demographics on the car sharing demand, while Willing et al. [18] tackle the problem of understanding if Points of Interest (PoI) in each city can be used as demand predictors. Unlike Willing et al. [18], in this work we study the effects of PoIs taking into account collinearity of predictor variables and selection bias in p-value computation, resulting in a much smaller effect of PoIs on the car sharing demand. The same considerations apply for Schmöller et al. [17]. Our work is also close to [19], which considers free floating car sharing in multiple cities. However, Kortum et al. [19] focus on the growth rate of free floating car sharing rather than on the characterisation from the supply side point of view. Finally, in [20], we have presented an analysis of station-based car sharing in a single city. The analysis in [20] is more oriented to issues related to the presence of stations (their capacity, how their behaviour can be mathematically modelled using queueing theory, etc.) and suffers from the lack of vehicle identifiers in the dataset. The technique used in [20] for detecting station usage is adapted here to the free floating case, but the analysis presented here is richer, because the dataset extracted from the free floating car sharing operator is richer.
Several works in the literature also focus on the problem of demand forecasting, which we tackle in Sect. 5. This is typically done in conjunction with a proposal regarding vehicle relocation, which involves deciding how to proactively relocate shared vehicles in the operation area in order to meet the future demand. We can group forecasting proposals into three different classes, based on the approach they rely upon. The first class comprises papers whose forecasting approach relies on techniques for time series forecasting. Wang et al. [21] leverage selective moving averages, Holt's model, Winters' model as well as Tabu Search heuristics for forecasting the demand in a car sharing service in Singapore. No prediction evaluation is carried out in the paper. Müller and Bogenberger [22] focus on the city of Berlin and investigate how to predict future bookings using a seasonal ARIMA model and exponential smoothing with a Holt-Winters filter. The second class of forecasting methods are those coming from the machine learning domain. Cheu et al. [23], for example, compare the forecasting performance of a neural network approach against that of Support Vector Regression, and find that the former provides better predictions. Neural networks have later been used also in [24,25,26]. The third class of forecasting approaches relies on custom solutions specific to the problem at hand. Boyaci et al. [27], for example, compile origin-destination matrices by simply averaging the observations for different hours of the day, days of the week, and months of the year from real car sharing data. Weikl and Bogenberger [28] devise a prediction algorithm based on finding clusters of behaviours for daily timeslots. In all the above works, the evaluation of forecasting performance is carried out considering only a single city.
A preliminary analysis [29] of this dataset has been presented at KNOWMe’17, an ECML-PKDD workshop without copyrighted proceedings. In this extended version, we have added the sociodemographic study (Sect. 4) and the demand forecasting analysis (Sect. 5). In addition, we have added the analysis of the spatial autocorrelation of vehicle availability clusters (Sect. 6).
Knowledge mining for other transportation systems
From the methodology standpoint, this work is close to [30,31,32,33], in which bike sharing, rather than car sharing, systems have been analysed. Due to the different nature of the two systems, people use them differently, hence the results obtained for bike sharing systems cannot be applied directly to car sharing. However, similar methodologies can be exploited, e.g., to group stations based on how they are used by the customers.
This work is also orthogonal to the research efforts in the area of car pooling/ride sharing [34, 35]. The idea of car pooling/ride sharing is that people may share a vehicle (be it a private or public vehicle, e.g., a taxi cab) to perform their trips. Works in the area of car pooling typically focus on the number of rides that can be shared, based on the historical or real-time trajectories of users, hence their focus is very different from that of this work.
Operation models for car sharing
As one of the pillars of a smart transportation system, car sharing has recently been the subject of extensive research from the operational standpoint. The research activity in this area has focused on both short-term and long-term strategic decisions. Long-term decisions involve problems like planning the station/parking infrastructure [27, 36, 37] or planning the recharging infrastructure. Short-term decisions concern when and how to redistribute shared vehicles [38,39,40,41] or when and how to recharge them [42, 43].
To address the above problems, optimisation frameworks and operational decision tools for car sharing systems have been studied in the literature, but the proposed solutions have often been evaluated either on simulated scenarios [44, 45] or using as input the demand (in terms of origin/destination matrix) obtained from surveys [36, 46]. By contrast, a statistical characterisation of the general properties of real car sharing systems, as well as a precise understanding of their emerging trends, is essential to both researchers and operators in order to design more effective decision support tools, and for the calibration and validation of simulations of car sharing systems. Thus, a data-driven analysis such as that presented in this paper can be exploited to both drive and evaluate solutions for the supply side of car sharing.
The dataset
The dataset comprises pickup and drop-off times of vehicles in 10 European cities for one of the major free floating car sharing operators (Table 2). For nine of these cities, data has been collected between May 17, 2015 and June 30, 2015. For Munich, data covers the period from March 11, 2016 to May 12, 2016. The data has been collected every minute using the available public API, which yields responses in the form of JSON files. Errors in the data collection process are due to technical problems on the booking website, in which cases corrupted entries have been discarded from the dataset. Each entry in the dataset describes the longitude-latitude position of available shared vehicles in the car sharing system, plus additional information. Each entry in the dataset has the following structure:
\[ \langle \mathtt{vin}, \mathtt{date\_time}, \mathtt{lon}, \mathtt{lat}, \mathtt{interior}, \mathtt{exterior}, \mathtt{engine} \rangle \]
where vin is the unique identifier of a vehicle, \(\mathtt{date\_time}\) contains the date and the time at which the available vehicle has been observed, \(\langle\mathtt{lon}, \mathtt{lat}\rangle\) are the geographical coordinates, \(\langle\mathtt{interior}, \mathtt{exterior}\rangle\) refer to the cleanliness of the vehicle, and engine specifies whether the vehicle is electric or not. Due to faulty GPS systems, the reported coordinates may be inaccurate. For this reason the dataset has been preprocessed and coordinates that are manifestly invalid (e.g., cars available in different countries) have been discarded. Data preprocessing and analysis have been carried out in R.
Given the nature of our dataset, movements of cars have to be inferred from their unavailability during a certain time frame. Thus, when a car disappears from location A to later reappear at location B, we assume that the car has been picked up for a trip. We have no explicit way of distinguishing between regular customer trips and maintenance trips (e.g., cars that have been picked up by the car sharing operator for cleaning or repairing), as we simply observe a car disappearing from the map.
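The inference rule described above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the procedure actually used for the paper (whose analysis was carried out in R): the function name `infer_trips`, the tuple layout of the observations, and the rule "a gap longer than one polling interval counts as a trip" are all hypothetical choices made here.

```python
from datetime import datetime, timedelta


def infer_trips(observations, poll_interval=timedelta(minutes=1)):
    """Infer trips from availability snapshots.

    observations: iterable of (vin, timestamp, lon, lat) tuples, one per
    snapshot in which the vehicle was seen as available (assumed layout).
    A vehicle that is missing for longer than `poll_interval` and then
    reappears is assumed to have been picked up in between.
    """
    last_seen = {}  # vin -> (timestamp, lon, lat) of the latest sighting
    trips = []
    for vin, ts, lon, lat in sorted(observations, key=lambda o: (o[0], o[1])):
        prev = last_seen.get(vin)
        if prev is not None:
            prev_ts, prev_lon, prev_lat = prev
            if ts - prev_ts > poll_interval:
                trips.append({
                    "vin": vin,
                    "pickup_time": prev_ts,   # last time seen before vanishing
                    "pickup": (prev_lon, prev_lat),
                    "dropoff_time": ts,       # first time seen again
                    "dropoff": (lon, lat),
                })
        last_seen[vin] = (ts, lon, lat)
    return trips
```

Note that, exactly as stated in the text, such a rule cannot tell customer trips from maintenance trips, and a snapshot lost to a collection error would also show up as a (spurious) short trip unless it is filtered separately.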
In order to understand the main characteristics, in terms of mobility, of the ten cities in which the car sharing system under study is operating, we have extracted information (summarised in Table 3) from Eurostat’s City Urban Audit database [14]. Figure 1 summarises the main transportation mode in each city as resulting from the Principal Component Analysis applied to the reported modal share. We can identify three main classes of cities: one in which motorised modes dominate, one in which public transport (PT) and walking are more important, and one in which people move prevalently by bike.
In terms of pricing structure, the policy implemented by the car sharing operator at the time the dataset was collected was quite simple: the rental price is a linear function of the rental time (the specific price per minute varies across the ten cities in the range \([0.24,0.46]\) €/min). Neither surge pricing nor proximity-based pricing was implemented in the ten cities. Also, there were no incentives for customers to change their destination and to bring back cars to areas where cars were more in demand. A per-kilometre fee is applied only when the car is used for more than about 200 km.
Finally, an interesting feature of this dataset is that it contains entries for two cities (Copenhagen and Stockholm in our analysis) for which the car sharing operator has now shut down service. An index that is often used as a measure of car sharing success is the vehicle utilisation rate, defined as the number of daily trips per vehicle. A higher value means that vehicles are used intensively in the city, hence the car sharing service is more profitable. Please note that long trips in which customers rent the shared vehicle for a long time are not the target of car sharing services but belong to the class of long-term rental. For this reason, the vehicle utilisation rate, with its ability to capture the short and frequent trips, is a direct measure of car sharing effectiveness. Figure 2 shows the utilisation rate in the ten cities. It is clear how vehicles in some cities are much more utilised than in others, even 2–3 times more. It is also interesting to note that the vehicle utilisation rate is the lowest in the two cities (Copenhagen and Stockholm) where the service has been shut down months after we had collected this dataset. Remarkably, in Turin and Vienna there is quite a lot of variability in the utilisation rate. This is due to vehicles being injected or removed from the system during the data collection period.
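As a concrete reading of the definition above (daily trips per vehicle), the following minimal Python sketch computes the utilisation rate day by day. The function name `daily_utilisation` and the input layout are assumptions introduced here for illustration; in particular, taking the fleet size of a day as the number of distinct vehicles observed on that day is one possible convention, chosen so that vehicles injected into or removed from the system change the denominator.

```python
from collections import Counter


def daily_utilisation(trips, fleet_by_day):
    """Return {day: trips per vehicle} for each day with a non-empty fleet.

    trips: iterable of (vin, day) pairs, one per inferred trip (assumed layout).
    fleet_by_day: dict mapping each day to the set of vehicle identifiers
    observed in the system on that day (assumed convention for fleet size).
    """
    trips_per_day = Counter(day for _vin, day in trips)
    return {
        day: trips_per_day[day] / len(vins)
        for day, vins in fleet_by_day.items()
        if vins  # skip days with no vehicles to avoid dividing by zero
    }
```

With this convention, a day with three trips and two vehicles yields a rate of 1.5, while the same three trips spread over a fleet of six vehicles would yield 0.5.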
Demand characterisation through sociodemographic indicators and urban diversity metrics
In this section we focus on the demand, i.e., on the number of pickup requests observed in the different areas of a city, and we investigate how they are related to sociodemographic and urban fabric indicators. We discuss these indicators (which are the explanatory variables for our model) below, together with a brief description of the spatial unit of analysis considered in this section.
Sociodemographic data: Sociodemographic indicators characterise the population in the different areas of a city. For this analysis, we need a granularity finer than city level.^{Footnote 2} We were able to find open census data with the desired spatial granularity for the cities of Florence, Milan, Rome, and Turin. For their analysis, we focus on indicators related to the marital status, age group, educational attainment, employment status, and commuting habits. The census data are obtained from the Italian National Institute for Statistics (ISTAT) and correspond to the 2011 Italian Census.^{Footnote 3}
Urban fabric data: The wealth of activities (cultural, commercial, recreational, etc.) taking place in a specific area is characterised using information about the Points of Interest (PoIs) collected from the location-based social network Foursquare.^{Footnote 4} When a user enters a new PoI, they are prompted to enter one of the first-level categories defined by the platform, which are Arts & Entertainment, College & University, Event, Food, Nightlife Spot, Outdoors & Recreation, Professional & Other Places, Residence, Shop & Service. We do not consider the category Event because events are generally limited in time, hence they typically do not overlap with our period of observation of the car sharing dynamics. Using this information, the urban fabric is characterised by computing the number of PoIs (per category and overall) in each area. We also include a measure of the diversity of the urban fabric in an area by exploiting the concept of venue entropy introduced in [47]. The venue entropy of an area a is obtained as:
\[ E(a) = - \sum_{c \in \mathcal{C}} \frac{n_{c}(a)}{n(a)} \log \frac{n_{c}(a)}{n(a)} \]
where \(\mathcal{C}\) is the set of first-level Foursquare categories, \(n_{c}(a)\) denotes the number of PoIs of category c in the area, and \(n(a)\) is the total number of PoIs in a. Intuitively, the entropy measures the uncertainty in predicting the category of a venue taken at random from the area, so the harder the prediction, the greater the diversity.
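To make the entropy measure tangible, the sketch below computes the Shannon entropy of the category shares \(n_{c}(a)/n(a)\) for the PoIs of one area. It is a minimal illustration: the function name `venue_entropy` is hypothetical, and the use of the natural logarithm is an assumption, since the base is not specified in this section.

```python
import math
from collections import Counter


def venue_entropy(categories):
    """Shannon entropy of the PoI categories found in one area.

    categories: iterable with one first-level category label per PoI.
    Natural logarithm is assumed; an empty area is given entropy 0.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Categories with zero PoIs never appear in `counts`, so log(0) cannot occur.
    return -sum((n / total) * math.log(n / total) for n in counts.values())
```

An area whose PoIs all belong to one category has entropy 0 (the category of a random venue is certain), while an area split evenly between two categories has entropy log 2, the maximum for two categories.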
Spatial unit of analysis: We are constrained to use the smallest census area for which data are provided. In the case of census areas that only partially cover the car sharing operation area, we consider the polygon resulting from their intersection and we rescale the sociodemographic indicators according to the percentage of overlap. In order to have consistent estimates of the indicators inside each unit of analysis, we discard the census areas whose overlap with the operation area is less than 20%.
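The rescaling and filtering steps just described can be sketched as follows. This is an illustrative fragment under explicit assumptions: the overlap fraction of each census area is taken as already computed by a GIS tool, the indicators are treated as counts that are spread uniformly over the census area (so that they scale linearly with the overlap), and the name `rescale_indicators` is hypothetical.

```python
def rescale_indicators(areas, min_overlap=0.20):
    """Rescale census indicators by overlap and drop poorly covered areas.

    areas: dict mapping an area id to (overlap_fraction, indicators), where
    overlap_fraction is in [0, 1] and indicators is a dict of count-type
    values (assumed layout). Areas overlapping the operation area for less
    than `min_overlap` are discarded.
    """
    kept = {}
    for area_id, (overlap, indicators) in areas.items():
        if overlap < min_overlap:
            continue  # too little of this census area lies in the operation area
        kept[area_id] = {name: value * overlap for name, value in indicators.items()}
    return kept
```

Linear rescaling is the simplest option; it would not be appropriate for indicators that are already rates or percentages, which should be left unchanged.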
The pickup events, the PoIs and the entropy in the spatial units of analysis for the four cities are illustrated in Fig. 3.
Explanatory analysis
Methods: We investigate the relation between the total number of pickups (y) and the indicators discussed above (which we denote with \(x_{k}\)) using a multivariate linear regression model of the form:
\[ y = \beta _{0} + \sum_{k=1}^{j} \beta _{k} x_{k} + \epsilon \]
where \(\beta _{0} \ldots \beta _{j}\) are the unknown parameters and ϵ is the error term. As expected for the kind of indicators that we are considering, multicollinearity is present in the data. In order to mitigate its negative effects, we use Lasso shrinkage [48] to estimate the coefficients of our linear regression.^{Footnote 5} Another advantage of Lasso is that it also performs subset selection, whereby a reduced set of predictors that have the greatest effect on the response y is selected. In short, Lasso minimises the residual sum of squares subject to the sum of the absolute values of the coefficients being smaller than a constant.
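The constraint described in words above can be written out explicitly. In the formulation below, \(i\) indexes the spatial units, \(N\) is their number, \(x_{ik}\) is the value of indicator \(k\) in unit \(i\), and \(t\) is the constant bounding the coefficients; these symbols are introduced here for illustration and are not part of the original notation.

```latex
\[
\hat{\beta} = \operatorname*{arg\,min}_{\beta}
  \sum_{i=1}^{N} \Bigl( y_{i} - \beta_{0} - \sum_{k=1}^{j} \beta_{k} x_{ik} \Bigr)^{2}
  \quad \text{subject to} \quad \sum_{k=1}^{j} \lvert \beta_{k} \rvert \le t
\]
% Equivalent penalised (Lagrangian) form, with a tuning parameter \lambda \ge 0:
\[
\hat{\beta} = \operatorname*{arg\,min}_{\beta}
  \sum_{i=1}^{N} \Bigl( y_{i} - \beta_{0} - \sum_{k=1}^{j} \beta_{k} x_{ik} \Bigr)^{2}
  + \lambda \sum_{k=1}^{j} \lvert \beta_{k} \rvert
\]
```

The intercept \(\beta_{0}\) is left out of the penalty, as is customary. Because the constraint region has corners on the coordinate axes, some coefficients are set exactly to zero as \(t\) shrinks (or \(\lambda\) grows), which is what produces the subset selection mentioned in the text.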
In standard linear regression, significance tests are used to test the statistical reliability of the rejection of the null hypothesis (i.e., that a coefficient \(\beta _{k}\) is zero). Recently, a significance test for the Lasso regression has been proposed [49] that factors in the selection bias related to the subset of selected predictors.^{Footnote 6} We will use this significance test to provide the p-values for the coefficients of our regression.
As for the predictors that are skewed, we handle them by applying a log transformation.
Results: The cities for which we were able to obtain census data with the required granularity are Milan, Florence, Rome, and Turin. For Florence, the census areas with a significant overlap with the operation area were too few to get statistically meaningful results, so we discarded this city. The Lasso regression results for the remaining Italian cities are shown in Table 4.
In all Italian cities (Table 4), a high educational attainment in a certain area is significantly associated with an increased demand for car sharing in that area. Vice versa, a low educational attainment is associated with lower demand in both Rome and Turin. This is largely in agreement with the findings from survey data discussed in Sect. 2.1: car sharing users tend to be better educated than the general population, and this signal is strong enough to be detected by the correlation between the demand and the demographic composition of neighbourhoods.
Regularly commuting outside the reference municipality correlates negatively with car sharing demand (Milan, Rome). This is due to the fact that car sharing vehicles cannot be parked outside the operation area, hence they are not suitable for this type of commuting. They would be suitable, though, if paired with local public transport, using car sharing as a first/last-mile solution. This does not seem to be the case for Milan and Rome (the presence of transport facilities is not affecting the demand). It might be the case for Turin, as commuting outside the municipality is not considered a good predictor of the demand while a certain effect of the presence of transport facilities is detected. However, this effect is not statistically significant, hence this conclusion cannot be drawn from the data at hand. Our analysis seems to confirm the complex relationship between free floating car sharing and public transport discussed in Sect. 2.1: the synergy or friction between the two could be heavily dependent on local characteristics. An ad hoc analysis of this relationship would be an interesting follow-up to the current investigation.
Marital status and age never correlate with the car sharing demand in the cities under study. The latter result is in contrast with the findings based on survey data, where age always played a significant role in the profiling of car sharing users. One explanation could be that the age-related signal is weaker than the education-related one, so that, due to collinearity effects, the explanatory power of age is not considered sufficient by the Lasso. Another explanation is that age alone has never been explanatory, and its presence has always been due to its correlation with higher educational attainment (in most OECD countries, young people are more educated than the elderly^{Footnote 7}).
In terms of urban fabric indicators, the presence of nightlife activities is associated with increased demand in all three Italian cities, while the presence of outdoor and recreational activities, as well as of professional PoIs and residences, has a statistically significant effect in Milan only. Thus, leisure seems to be the main motivation behind car sharing trips in the three cities. Work-related trips are significant only in Milan. When comparing these results with the survey-based findings summarised in Sect. 2.1, no clear trend emerges. While leisure and work trips are a common finding, the signal associated with shopping activities goes completely undetected in these three cities.
Demand forecasting
In this section, we focus on the elements that influence the short-term behaviour of a car sharing system and we exploit them to forecast the demand. As we are interested in a finer spatial granularity (e.g. block level), we depart from the census areas used in the previous section. We thus need to identify a meaningful spatial unit to define car availability in a given area. In fact, unlike station-based car sharing, in free floating car sharing there is no natural “aggregation” point for vehicles, which can be freely picked up and dropped off anywhere within the operation area. We can still perform a spatial analysis of car sharing usage by dividing the operation area into smaller cells and studying the behaviour, over time, of each of these cells. In this work we consider cells with side length 500 m, which is the maximal walking distance typically accepted by car sharing users [28, 50].
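The spatial discretisation can be illustrated with a minimal sketch. It assumes that vehicle positions have already been projected to metric coordinates; the function name `cell_index` and the choice of origin are hypothetical and are not taken from the paper.

```python
import math

CELL_SIDE_M = 500.0  # cell side length considered in this work


def cell_index(x_m, y_m, origin_x_m=0.0, origin_y_m=0.0, side_m=CELL_SIDE_M):
    """Return the (column, row) of the square cell containing a point.

    Coordinates are assumed to be in metres in a projected reference system.
    The origin is an arbitrary corner of the operation area (an assumption).
    """
    col = math.floor((x_m - origin_x_m) / side_m)
    row = math.floor((y_m - origin_y_m) / side_m)
    return col, row


# A vehicle parked 1250 m east and 300 m north of the origin.
print(cell_index(1250.0, 300.0))
```

Using `math.floor` rather than integer truncation keeps the grid regular for points that fall west or south of the chosen origin.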
Demand predictability is one of the crucial aspects for every transportation system. In car sharing, in particular, it is of utmost importance for vehicle redistribution, whose goal is to proactively move vehicles in order to address the future demand. In [29], comparing the time series of empty cells over time against that of available vehicles, we have shown that there are typically a lot of empty cells but, at the same time, also a lot of available vehicles. This situation hints at a strong concentration of vehicles in certain areas, vehicles that could be proactively moved to where the customers most need them. Vehicle redistribution is typically performed periodically (e.g., every hour) and can be represented as a continuous cycling between three phases: (i) the forecast phase, when the expected pickups and dropoffs during the next relocation window are predicted; (ii) the selection phase, when the areas with vehicle surplus and vehicle deficit are identified and matched; and (iii) the dispatching phase, when the relocation workforce is assigned the previously defined relocation tasks [28].
Our goal in this section is not to develop a new custom-built method for demand prediction in car sharing systems, but rather to compare state-of-the-art solutions that belong to different forecasting approaches (see Sect. 2.2 for the discussion on existing methods) in order to understand their performance in the ten cities under study. Indeed, while prior work on demand prediction has focused on individual cities, it is important to assess how robustly the most representative methods cope with the heterogeneity of travel behaviours and urban fabric. The target audience of this analysis is researchers working on designing optimised transport models for car sharing, who might benefit from knowing the best off-the-shelf approach to prediction, so that they can focus their efforts on optimising the selection and dispatching phases discussed above. Similarly, third-party developers will benefit from this type of analysis. For example, one could set up a service (similar in vein to OpenStreetCab [51], whose goal is to provide the best option price-wise between Uber and NYC taxis for a given trip) whereby the most reliable car sharing service is recommended (e.g., one that guarantees that a car will be available in the evening when one drives back home). Third-party apps will most likely have access only to the public data made available by the car sharing operators (similar to the data we are dealing with).
Problem definition: The goal of demand prediction is to establish the vehicle deficit/surplus at the cells. It can be described using the general formula we presented in [41], which we discuss hereafter in a simplified version. If we denote with T the interval at which relocation is performed, every T minutes the car sharing operator will compute, for each cell i, the expected balance \(\hat{b}_{i}\) of vehicles at cell i for the next T minutes, which can be described as follows:
\[\hat{b}_{i} = v_{i} + \hat{\mathit{drop}}_{i} - \hat{\mathit{pick}}_{i} \quad (4)\]
where \(v_{i}\) is the number of cars currently parked at cell i, while \(\hat{\mathit{drop}}_{i}\) and \(\hat{\mathit{pick}}_{i}\) are, respectively, the forecast number of dropoffs and pickups in the next time interval. Please note that \(v_{i}\) is a known quantity, as it reflects the current situation at cell i. Instead, \(\hat{\mathit{drop}}_{i}\) and \(\hat{\mathit{pick}}_{i}\) have to be estimated from what has happened in the past.^{Footnote 8} In the following, we show how statistical learning can help fill this gap and thus close the relocation cycle.
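As a small worked example of the balance described above, the sketch below assumes that the expected balance of a cell is the number of parked vehicles plus the forecast dropoffs minus the forecast pickups, which is our reading of the definitions given here. The cell names, the numbers, and the rule that a negative balance marks a deficit are illustrative assumptions.

```python
def expected_balance(vehicles_now, forecast_dropoffs, forecast_pickups):
    """Expected vehicle balance of a cell over the next relocation interval."""
    return vehicles_now + forecast_dropoffs - forecast_pickups


# Hypothetical cells: (vehicles parked now, forecast dropoffs, forecast pickups).
cells = {"A": (3, 1.2, 4.5), "B": (0, 2.0, 0.5), "C": (5, 0.0, 1.0)}

balances = {name: expected_balance(*values) for name, values in cells.items()}

# Illustrative rule (an assumption): negative balance = deficit, positive = surplus.
deficit = [name for name, b in balances.items() if b < 0]
surplus = [name for name, b in balances.items() if b > 0]
print(deficit, surplus)
```

In a relocation cycle, the deficit and surplus lists would be the input to the selection phase, where cells are matched.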
Let us focus on a tagged cell i belonging to the set of all cells \(\mathcal{C}\). We denote the set of days in our observation period with \(\mathcal{D}\). Then, we divide each day \(d \in \mathcal{D}\) into bins of length T (i.e. we discretise time). The prediction problem at hand is a typical one: we have historical data (a set of N observations) about pickups and dropoffs at cell i in each bin t for each day in \(\mathcal{D}\). We have to predict what will happen in each bin of the next days. In the following, we use the general term event to denote either pickup or dropoff events.
Features: For each cell i, we extract the following features for prediction:

number of events \(e_{(i,d,t)}\) observed in cell i at time t of day d

the time of the day (corresponding to bin t)

the day of the week (Sunday, Monday, etc.)

whether the day is a weekday or not

average number of events \(\hat{e}_{(i,d,t)}\) observed at bin t of day d in the neighbouring cells (we consider 2-hop neighbours only).
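The feature list above can be sketched as follows. This is only an illustration: the data layout (a dictionary keyed by cell, day and hour), the function names, and the interpretation of "2-hop neighbours" as all cells within two grid steps are assumptions rather than details given in the paper.

```python
from datetime import date


def neighbours(cell, hops=2):
    """Cells within `hops` grid steps of `cell`, excluding the cell itself.

    Treating 2-hop neighbours as a 5x5 square around the cell is an assumption.
    """
    cx, cy = cell
    return [(cx + dx, cy + dy)
            for dx in range(-hops, hops + 1)
            for dy in range(-hops, hops + 1)
            if (dx, dy) != (0, 0)]


def feature_row(events, cell, day, hour):
    """Build one feature row; `events` maps (cell, day, hour) to an event count.

    Missing keys are read as zero events.
    """
    neigh = neighbours(cell)
    neigh_avg = sum(events.get((n, day, hour), 0) for n in neigh) / len(neigh)
    return {
        "events": events.get((cell, day, hour), 0),
        "hour": hour,                      # time of the day (bin t)
        "day_of_week": day.weekday(),      # 0 = Monday, ..., 6 = Sunday
        "is_weekday": day.weekday() < 5,
        "neigh_avg": neigh_avg,            # average over neighbouring cells
    }


d = date(2017, 5, 1)
events = {((0, 0), d, 8): 3, ((1, 0), d, 8): 12, ((2, 2), d, 8): 12}
print(feature_row(events, (0, 0), d, 8))
```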
Methods: We use the first 80% of the days in the dataset for training, and we predict the remaining 20%.^{Footnote 9} We set the time window T to 1 hour, implying that we want to forecast pickups and dropoffs happening in a one-hour time frame. We only consider cells that have more than 30 events during the observation period. Then, we run the prediction algorithms and we measure the prediction error in terms of Root Mean Squared Error (RMSE).
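The evaluation protocol can be sketched in a few lines. The helper names `chronological_split` and `rmse` are hypothetical; the sketch only illustrates a split that keeps the temporal order of the days and the usual RMSE definition.

```python
import math


def chronological_split(days, train_fraction=0.8):
    """Split days into an initial training part and a final test part.

    The days are sorted first, so the test set always follows the training set.
    """
    ordered = sorted(days)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def rmse(observed, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


train_days, test_days = chronological_split(range(1, 11))
print(train_days, test_days)
print(rmse([2, 4], [0, 0]))
```

Keeping the test days strictly after the training days avoids leaking future information into the model, which a random split would not guarantee.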
We now define a set of relevant prediction techniques to be evaluated on the datasets at hand. It is important to point out that car sharing operators do not disclose any detail on their approach to demand prediction. Thus, comparing against state-of-the-art industrial benchmarks is not an option. The first two solutions that we consider are simple baselines based on historical averages/medians. With regard to our discussion in Sect. 2.2, the third one is representative of the class of time series prediction. Then, we pick two approaches for the machine learning category: neural networks (which have already been used in the literature for car sharing [24,25,26]), and Random Forest (which has been shown to be extremely effective when applied to bike sharing booking predictions [31, 32]). Finally, we test a technique in the custom forecasting category, specifically the one proposed in [28]. In the following we provide a description of each technique.
Prediction based on Historical Average (HA): this prediction function returns the average number of events observed in the same time window across different days. In other words, the predicted number of events \(\hat{y_{t}}\) at a certain time t in the future is obtained as \(\hat{y_{t}} = \frac{1}{|\mathcal{D}|}\sum_{d\in \mathcal{D}}e_{(i,d,t)}\). As car sharing typically exhibits marked differences between weekdays and weekends [20], we also test a version of the algorithm (denoted as HA+) that distinguishes between working days and weekends. A similar function has also been used as benchmark in the related literature on bike-sharing forecasting [32, 33].
Prediction based on Historical Median (HM): the prediction function returns the median number of events observed in the same time window across different days, i.e., \(\hat{y_{t}} = \mathrm{median}_{d\in \mathcal{D}}(e_{(i,d,t)})\). This function is expected to perform well in cases where the distribution of pickups/dropoffs is highly skewed. As for the previous algorithm, we also test a version (denoted as HM+) that distinguishes between working days and weekends.
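The two baselines and their weekday/weekend variants can be sketched with one small function. The data layout (one entry per training day, with a weekend flag and a dictionary of hourly counts) and the name `historical_forecast` are assumptions made for this illustration.

```python
from statistics import mean, median


def historical_forecast(history, hour, weekend=None, agg=mean):
    """Aggregate the events seen at `hour` over the training days of one cell.

    history: list of (is_weekend, {hour: events}) pairs, one per training day.
    weekend: None uses all days (HA/HM); True or False restricts the days to
             weekends or working days (HA+/HM+).
    agg:     `mean` gives the historical average, `median` the historical median.
    """
    values = [counts.get(hour, 0) for is_weekend, counts in history
              if weekend is None or is_weekend == weekend]
    return agg(values)


history = [(False, {8: 4}), (False, {8: 6}), (True, {8: 1}),
           (False, {8: 20}), (True, {8: 3})]

print(historical_forecast(history, 8))                      # HA
print(historical_forecast(history, 8, agg=median))          # HM
print(historical_forecast(history, 8, weekend=False))       # HA+ for a working day
print(historical_forecast(history, 8, weekend=True))        # HA+ for a weekend day
```

In this toy history a single busy day (20 events) pulls the average well above the median, which is the skewed situation in which HM is expected to help.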
ARIMA: the Autoregressive Integrated Moving Average technique is a popular time series forecasting method. It is a generalisation of the ARMA model used in [22, 33]. Typically, ARIMA models are denoted with \(\operatorname{ARIMA}(p,d,q)\), where p is the order (number of time lags) of the AR component, d is the degree of differencing, and q is the order of the MA component. Here we use the seasonal version of the above ARIMA model, estimating the parameters for both the nonseasonal and the seasonal component (this allows us to detect cyclic behaviour, if it exists). Recall that in a seasonal ARIMA model, seasonal AR and MA terms predict the target variable using data values and errors at times with lags that are multiples of S (the span of the seasonality). For each cell the best configuration of the ARIMA parameters is selected according to its Corrected Akaike Information Criterion (AICc) value, using the auto.arima function of R’s forecast package. The search range for the parameters is the default one in the auto.arima function. Since this is a time series method, only the temporal information of each observation and the actual observed values are fed to the model.
Random Forest (RF): tree-based learning method that aggregates the prediction results of several decision trees obtained by randomly selecting, each time, only a subset m of the original p features (those described in the features section above). In order to select the most appropriate m, we use 5-fold cross validation and we vary^{Footnote 10} m in \(\{2,4,5\}\). We use the implementation in the R package randomForest, together with the caret package for training and prediction.
Neural Network (NN): relying on the same settings as in [23], we use a perceptron with as many neurons in the input layer as the features described above, one hidden layer (searching for the best number of neurons between 1 and 30), a single output neuron, backpropagation, hyperbolic tangent activation function, and linear output function. Categorical features are represented using dummy variables. Input and output data are then scaled to the range \([-1,1]\), which is the sensitive range of the hyperbolic tangent activation function. We rely on the implementation in the R package RSNNS, together with the caret package for training and prediction. Parameter selection is again performed using 5-fold cross validation.
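The scaling step can be illustrated with a minimal sketch. The paper does not state which scaling formula was used, so plain min-max scaling to the interval from -1 to 1 is an assumption here, as is the function name.

```python
def scale_to_unit_range(values):
    """Min-max scale a sequence of numbers to the interval [-1, 1].

    A constant sequence has no spread, so it is mapped to all zeros
    (a convention chosen for this sketch).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


hourly_events = [0, 5, 10]
print(scale_to_unit_range(hourly_events))
```

In practice the minimum and maximum would be computed on the training set only and then reused to scale the test set, so that no information from the test days enters the model.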
Algorithm in Weikl and Bogenberger [28] (WEIKL): this is one of the very few custom proposals in the literature on car sharing. The rationale of this algorithm is to represent each timeslot of each day through a vector, whose components are the number of events at each cell during the timeslot. Let us focus on a tagged timeslot t. These vectors describing the spatial demand for timeslot t across each day make up a matrix of size \(|\mathcal{C}| \times |\mathcal{D_{T}}|\) (where \(\mathcal{C}\) denotes the set of cells and \(\mathcal{D_{T}}\) denotes the set of days in the training set). The \(|\mathcal{C}|\)-dimensional representation of each day is then simplified using Principal Component Analysis, and only the first two principal components are retained. This two-dimensional description of the days is then clustered using k-means, in order to group together days featuring the same demand behaviour. The original paper does not specify how the optimal number of groups is obtained, so we decided to rely on the gap statistic [52], a state-of-the-art solution that is also able to handle the single-group case (i.e., to detect when the optimal choice is to not split in groups). Once this has been done for all timeslots, a so-called from-to matrix is built, computing the probability that days in a certain group \(g_{i}\) in timeslot t would be in group \(g_{j}\) in timeslot \(t+1\). Using this from-to matrix, it is possible to compute the demand variation from one timeslot to the next for each group. This concludes the training phase of the algorithm. In the prediction phase, the demand in timeslot \(t-1\) is mapped into one of the groups computed in the training phase (by closest centroid matching). Then, the number of forecast events for timeslot t is obtained from the computed expected demand variation for the group. Please note that in [28], each day was divided in timeslots of non-uniform size. For fairness with the other prediction algorithms, we use timeslots of fixed size T.
We have implemented this method in R.
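The from-to matrix at the core of this method can be sketched as follows. The sketch is in Python rather than R, it is not the implementation used in this work, and it assumes that the group label of each training day in two consecutive timeslots is already available (the PCA and k-means steps are omitted). The function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict


def from_to_matrix(groups_t, groups_next):
    """Estimate transition probabilities between day groups.

    groups_t[d] and groups_next[d] are the group labels of day d in
    timeslot t and in timeslot t+1, respectively.
    Returns {group_from: {group_to: probability}}.
    """
    counts = defaultdict(Counter)
    for day, g_from in groups_t.items():
        counts[g_from][groups_next[day]] += 1
    return {g_from: {g_to: n / sum(row.values()) for g_to, n in row.items()}
            for g_from, row in counts.items()}


# Four hypothetical training days and their groups in two consecutive timeslots.
groups_t = {"d1": 0, "d2": 0, "d3": 1, "d4": 0}
groups_next = {"d1": 0, "d2": 1, "d3": 1, "d4": 1}
print(from_to_matrix(groups_t, groups_next))
```

Each row of the result sums to one, so it can be used to weight the typical demand of the destination groups when computing the expected demand variation.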
Results: The results are shown in Figs. 4 and 5, for pickups and dropoffs respectively. For most cities and for all algorithms, the error is small, with forecasts off, on average, by less than one dropoff/pickup for the vast majority of cells. However, there are a few cells for which the prediction error is high. After an in-depth analysis of the nature of these cells, we discovered that they are typically in very busy areas (e.g. near the airport), where both the high volume of traffic and the burstier nature of arrivals and departures may explain this variability. Also, the RMSE for pickups tends to be slightly higher than for dropoffs. In terms of which prediction algorithm works best, Fig. 6 shows that Random Forest provides the most accurate predictions for the vast majority of cells. The WEIKL algorithm is the second best, but its performance is very close to that of the NN approach and, surprisingly, to the simple Historical Average. HA+ and HM+, the versions of the HA and HM algorithms that take into account the difference between weekdays and weekends, do not in general outperform their simpler counterparts. ARIMA, used also in [22] for forecasting car sharing demand, consistently provides the worst predictions.
In Fig. 7, in order to showcase the main strengths and weaknesses of the prediction techniques used, we focus on a tagged cell (specifically, on one for which the error is generally large) and we plot the time series of the predicted dropoffs (black curve) against the observed dropoffs (blue and red, in order to distinguish between weekdays and weekends). For the sake of readability, we consider one strategy per class of prediction approach: HA for the simple baselines, ARIMA for the time series forecasting class, RF for the machine learning approaches, and WEIKL for the custom solutions. The ARIMA model tends to replicate the same daily patterns across all days in the test set, since the ARIMA model is not able to capture multiple seasonalities, which are instead present in the data. By using predictive models that explicitly handle these multiple seasons (such as [53]), the quality of prediction could be significantly improved. A similar problem seems to hold for HA: it tends to replicate a “model day”, which is always the same. Instead, the predictions provided by the Random Forest algorithm are the most flexible ones, as they seem to adapt individually to each day. However, despite this flexibility, there seems to be an inherent variability in certain cells in the datasets (Figs. 4–5) that makes prediction difficult. The tagged cell considered here is also useful to illustrate the weakness of the WEIKL solution. Since it groups together many cells to extract a typical behaviour of the system in a given timeslot, the cells with a small number of events (which are many) tend to dominate over the more active ones (like the tagged cell considered here). Thus, in these cases, the predictions are significantly off with respect to the actual behaviour of the cell.
Spatiotemporal usage patterns
It is expected that cells in a car sharing system are used differently by the users, but how many different usages can be identified? In order to answer this question, in the following we carry out a classification of cells based on their usage pattern. To this aim, we focus on the time series of vehicle availability in each cell and we measure how close this time series is to what we observe in other cells. We measure the time series distance using the Dynamic Time Warping (DTW) technique [54] (with Sakoe-Chiba band), then we cluster cells based on their DTW distance using Partition Around Medoids (PAM) clustering. For each city, the optimal number of clusters is obtained using the silhouette method. In order to be able to compare our time series, we discretise time into bins with a duration of 10 minutes. For each cell, we extract one availability value per bin by averaging the availability in the bin across different days. In addition, in order to detect variation above and below the average behaviour, we normalise the measured availability using the average availability at the cell.
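The distance computation can be illustrated with a plain dynamic-programming version of DTW restricted to a Sakoe-Chiba band. This is a minimal sketch: the absolute difference as local cost and the band half-width parameter `window` are assumptions, since the settings used in the analysis are not reported here.

```python
import math


def dtw_distance(a, b, window):
    """DTW distance between two sequences with a Sakoe-Chiba band.

    `window` is the half-width of the band, in bins: element i of `a` may
    only be aligned with elements j of `b` such that |i - j| <= window.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach the end
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Two hypothetical normalised availability profiles, shifted by one bin.
x = [0, 1, 1, 0]
y = [0, 0, 1, 0]
print(dtw_distance(x, y, window=0))  # no warping allowed
print(dtw_distance(x, y, window=1))  # warping by one bin allowed
```

With a zero-width band the distance reduces to the sum of point-wise differences, while a band of one bin lets the two shifted profiles align perfectly. The resulting pairwise distances are what a medoid-based clustering such as PAM would take as input.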
The results are shown in Fig. 8. The optimal number of clusters is 2 in Amsterdam, Florence, and Copenhagen, 3 in Berlin, Milan, Rome, Stockholm, Turin, and 4 in Munich and Vienna. However, the fourth cluster, when present, is a special one, composed of just a single cell: in both cities where the fourth cluster is present, it comprises the airport zone. If we plot the availability time series within each cluster (Fig. 8, obtained by computing the average availability in the cells belonging to the cluster), it is striking to see that the clusters highlight very characteristic cell usage. Some cells have above average availability at night and below average availability during the day. Other cells have exactly the opposite behaviour. Finally, there is a group of cells with an intermediate behaviour, where apparently no significant difference in usage is detected over the whole day. It is easy to map this behaviour into the “nature” of the area covered by the cell: people leave residential areas in the morning and come back in the evening, while the opposite is true for commercial/business areas. Similar classes were identified in [20] for station-based car sharing, and in [31] for bike sharing. Figure 8 also highlights the outlier behaviour of the airport zone (which constitutes the fourth cluster, when available). Airports in Munich and Vienna see a huge variation in availability; however, the behaviour of their time series is simply a scaled version of the commercial/business pattern discussed before. Due to the magnitude of the airport clusters’ time series, the behaviour of the other clusters of Munich and Vienna is barely visible in the plot. If we zoomed in, we would see the typical patterns that can be seen more clearly in cities with no airport within the operation area.
Based on the above discussion, we can associate each cluster with the trend in its corresponding availability time series. Thus, we identify four main behaviours: cells with mid-of-day availability peak, cells with night peak, cells with no significant peak, and cells whose availability variations are much higher than in other cells. We use the labels day, night, neutral, and high-intensity to refer to these four classes. In the following, we investigate to what extent the behaviour of cells is spatially autocorrelated. To this aim, since cell labels are categorical, we use the Join Count statistics [55]. With this approach, for each cell n, we count how many of its neighbouring cells belong to n’s class and we compare this result with what would be obtained if classes were distributed uniformly at random across cells. Since the high-intensity class comprises at most one cell per city, we discard it from the analysis. The results for all cities are shown in Table 5. Cells exhibiting an availability peak at night are spatially autocorrelated in all ten cities. Cells with a mid-of-day peak are spatially autocorrelated in all cities except for Florence and Copenhagen. Out of the seven cities featuring neutral cells, the spatial autocorrelation is significant for only three of them. We can conclude that, in general, the availability of vehicles in cells tends to be spatially autocorrelated, hence neighbouring cells tend to have shortage/abundance of vehicles at the same time. This further motivates the use of vehicle availability information in neighbouring cells for demand forecasting (RF and NN in Sect. 5 indeed rely on this information and their performance is quite good, with RF being the best-performing prediction algorithm overall).
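The idea behind the join count comparison can be sketched with a permutation test. This is an illustration under stated assumptions, not the procedure of [55]: cells are placed on a regular grid, two cells are neighbours when they share a side, and the reference distribution is obtained by randomly shuffling the labels. All names are hypothetical.

```python
import random


def same_class_joins(labels, target):
    """Count pairs of side-adjacent cells that both carry the label `target`.

    labels maps (x, y) grid coordinates to a class label.
    """
    joins = 0
    for (x, y), label in labels.items():
        if label != target:
            continue
        # Look only right and up, so that each join is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            if labels.get(neighbour) == target:
                joins += 1
    return joins


def join_count_pvalue(labels, target, permutations=999, seed=0):
    """Observed joins and a one-sided permutation p-value for clustering."""
    rng = random.Random(seed)
    observed = same_class_joins(labels, target)
    cells, values = list(labels), list(labels.values())
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(values)  # random relabelling of the same cells
        if same_class_joins(dict(zip(cells, values)), target) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)


# Hypothetical 6x6 city: western half labelled "night", eastern half "day".
grid = {(x, y): ("night" if x < 3 else "day") for x in range(6) for y in range(6)}
observed, p_value = join_count_pvalue(grid, "night")
print(observed, p_value)
```

A small p-value indicates that cells of the same class are adjacent more often than a random arrangement of the same labels would produce, which is how positive spatial autocorrelation shows up for categorical data.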
Locating cleaning and maintenance areas
A critical operational aspect for car sharing is how to perform cleaning and maintenance. When not done properly, it may even be a decisive factor in a service shutdown, as in the case of the Parisian car sharing service Autolib.^{Footnote 11} In order to perform cleaning and maintenance, the car sharing workforce is typically remotely dispatched to collect vehicles that are in need of either. However, moving workers around is expensive, and more efficient solutions could be found based on the vehicle usage in the city. As a case study, in the following we discuss how to identify potential service areas within the operation area. A potential service area is a location that vehicles pass through with very high probability. A workshop could be deployed in this area, and this would make cleaning and maintenance operations much more efficient.
We can use our dataset to understand whether these potential service areas exist in the cities covered by the car sharing service under study. To this aim, we define a reference window W, corresponding to the accepted tolerance for taking out a vehicle for maintenance. Based on data from active car sharing operators, we assume that reasonable values for W are between 15 and 30 days. Then, for each cell, we count the number of distinct vehicles seen by the cell during W. Figures 9 and 10 show the results for the top three cells in each city, i.e., the three cells that see the highest number of distinct vehicles during two different time windows (\(W=30\) and \(W=15\) days, respectively). Assuming that a (somewhat generous) threshold of 50% of the vehicles would be acceptable for the car sharing operator to justify the opening of a workshop in the area, all cities with the exception of Florence would accommodate three workshops satisfying this requirement when \(W=30\). The scenario \(W=15\) is by far more challenging: six cities would be able to open at least one workshop, but only one city could open two or three. The top ranking cell for cities whose operation area covers the airport is always the cell that includes the airport, which thus becomes a strategic asset in car sharing operations, in addition to being a huge generator of car sharing traffic.
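The counting step can be sketched as follows. The record layout (day number, cell, vehicle identifier), the function name, and the reading of the 50% threshold as a share of the whole fleet are assumptions made for this illustration.

```python
from collections import defaultdict


def top_service_cells(sightings, window_start, window_days, fleet_size, top=3):
    """Rank cells by the share of the fleet they see within a window W.

    sightings: iterable of (day_number, cell, vehicle_id) parking records.
    Returns the `top` cells as (cell, share of distinct vehicles seen).
    """
    seen = defaultdict(set)
    for day, cell, vehicle in sightings:
        if window_start <= day < window_start + window_days:
            seen[cell].add(vehicle)  # a set keeps each vehicle only once
    ranked = sorted(seen.items(), key=lambda item: len(item[1]), reverse=True)
    return [(cell, len(vehicles) / fleet_size) for cell, vehicles in ranked[:top]]


# Hypothetical records for a fleet of four vehicles.
sightings = [
    (0, "airport", "v1"), (1, "airport", "v2"), (2, "airport", "v3"),
    (3, "centre", "v1"), (3, "centre", "v1"), (5, "centre", "v2"),
    (20, "centre", "v4"), (5, "suburb", "v2"),
]
ranking = top_service_cells(sightings, window_start=0, window_days=15, fleet_size=4)
candidates = [cell for cell, share in ranking if share >= 0.5]
print(ranking, candidates)
```

In this toy example the record on day 20 falls outside the 15-day window, so it does not count towards the share seen by its cell.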
Conclusions
In this work, we have collected web-based data about free floating car sharing in 10 European cities, cities that are heterogeneous both in terms of car sharing success and mode split. We have studied how the car sharing demand relates to sociodemographic and urban indicators, showing that the car sharing demand is positively correlated with high educational attainment and nightlife activities, while being negatively correlated with the percentage of people commuting outside the municipality. These findings both confirm and extend the results in the related literature obtained from survey data. Then, focusing on the predictability of future car sharing requests, we have shown that they can be forecast quite accurately using state-of-the-art prediction algorithms, and we have highlighted the very good performance of Random Forest as a predictor. Finally, we have proposed a strategy for selecting the area in which maintenance facilities should be deployed, and we have shown how the airport zone can become a strategic asset for car sharing operators, since the high volume of traffic generated by the area makes it extremely convenient to deploy cleaning and maintenance facilities there.
Notes
 1.
Only for these cities were we able to find fine-grained geospatial census data significantly overlapping with the car sharing operation area.
 2.
Please note that EU countries are legally bound to provide census data to the Eurostat database at most at the level of NUTS 2 (regions). The actual database (https://ec.europa.eu/eurostat/web/populationandhousingcensus/censusdata/database) contains data up to NUTS 3 level (provinces) but this is not enough for our purposes. For this reason, we resorted to individually checking the countries’ official institutes for statistics.
 3.
 4.
Through the Foursquare Places API it is possible to browse the venues in a certain geographic area. Since the standard API returns at most 50 venues per input area, each city is split into several browsing areas, whose size is properly dimensioned to ensure that all the available venues are acquired.
 5.
We use Lasso regression as implemented in the R package glmnet [56], using 10-fold cross validation for parameter estimation.
 6.
As an example, running an Ordinary Least Squares linear regression on the selected subset of predictors and calculating the p-values associated with the coefficients would yield a very optimistic estimate of the significance, due to the fact that the subset of predictors is not selected independently of the data.
 7.
 8.
Note also that, for the sake of clarity, in Equation 4 we are intentionally neglecting the contribution of relocated vehicles that have yet to arrive at the cell from the previous relocation interval. This does not affect the forecast results discussed in this section because this number would be known in advance anyway and, thus, would not be part of the prediction process.
 9.
Please note that standard kfold cross validation cannot be performed with time series because time series data are not independent across time. The approach used in this paper is the same used in [32].
 10.
Note that the set of initial features (\(p=3\)) is expanded after applying one-hot encoding. For example, the categorical day of the week is split into 6 binary features.
 11.
Abbreviations
 PT:

Public Transport
 RMSE:

Root Mean Squared Error
 PAM:

Partition Around Medoids
 AICc:

Corrected Akaike Information Criterion
References
 1.
Mitchell WJ, Borroni-Bird CE, Burns LD (2010) Reinventing the automobile. Personal urban mobility for the 21st century, vol g. MIT Press, Cambridge.
 2.
“Cars are parked 95% of the time.” Let’s check! http://www.reinventingparking.org/2013/02/carsareparked95oftimeletscheck.html. Accessed 5 April 2018
 3.
Fact #613: March 8, 2010 Vehicle Occupancy Rates (2010). https://energy.gov/eere/vehicles/fact613march82010vehicleoccupancyrates. Accessed 5 April 2018
 4.
European Environment Agency: occupancy rates of passenger vehicles. https://www.eea.europa.eu/downloads/90455cbdfeff89c2c6149387ee11e4ea/1441389594/occupancyratesofpassengervehicles1.pdf. Accessed 5 April 2018
 5.
Shaheen S, Cohen A (2015) Mobility and the sharing economy: impacts synopsis—Spring 2015. Technical report, Transportation Sustainability Research Center, University of California, Berkeley
 6.
Kortum K (2014) Driving smart: carsharing mode splits and trip frequencies. In: Transportation research board 93rd annual meeting
 7.
Schwieger B, VictoreroSolares P, Brook D (2015) Global carsharing operators. Report 2015. Technical report, Team Red
 8.
Stopher PR, Greaves SP (2007) Household travel surveys: where are we going? Transp Res, Part A, Policy Pract 41(5):367–381
 9.
Meier RL (1962) A communications theory of urban growth. Published for the Joint Center for Urban Studies of the Massachusetts Institute of Technology and Harvard University by MIT Press, Cambridge
 10.
Kopp J, Gerike R, Axhausen KW (2015) Do sharing people behave differently? An empirical evaluation of the distinctive mobility patterns of free-floating car-sharing members. Transportation 42(3):449–469
 11.
Wittwer R, Hubrich S (2018) Free-floating carsharing experiences in German metropolitan areas. Transp Res Proc 33:323–330
 12.
Giesel F, Nobis C (2016) The impact of carsharing on car ownership in German cities. Transp Res Proc 19:215–224
 13.
Becker H, Ciari F, Axhausen KW (2017) Comparing carsharing schemes in Switzerland: user groups and usage patterns. Transp Res, Part A, Policy Pract 97:17–29
 14.
Urban Audit Database. http://ec.europa.eu/eurostat/web/cities/data/database. Accessed 5 April 2018
 15.
Adam MB (2005) Carsharing: where and how it succeeds. Transportation Research Board of the National Academies
 16.
Project M (2010) The state of European carsharing. Technical report
 17.
Schmöller S, Weikl S, Müller J, Bogenberger K (2015) Empirical analysis of free-floating carsharing usage: the Munich and Berlin case. Transp Res, Part C, Emerg Technol 56:34–51
 18.
Willing C, Klemmer K, Brandt T, Neumann D (2017) Moving in time and space—location intelligence for carsharing decision support. Decis Support Syst 99:75–85
 19.
Kortum K, Schönduwe R, Stolte B, Bock B (2016) Free-floating carsharing: city-specific growth rates and success factors. Transp Res Proc 19:328–340
 20.
Boldrini C, Bruno R, Conti M (2016) Characterising demand and usage patterns in a large station-based car sharing system. In: The 2nd IEEE INFOCOM workshop on smart cities and urban computing. IEEE, pp 1–6
 21.
Wang H, Cheu R, Lee DH (2010) Dynamic relocating vehicle resources using a microscopic traffic simulation model for carsharing services. In: 2010 third international joint conference on computational science and optimization. pp 108–111. IEEE, http://ieeexplore.ieee.org/document/5532914/
 22.
Müller J, Bogenberger K (2015) Time series analysis of booking data of a free-floating carsharing system in Berlin. Transp Res Proc 10:345–354
 23.
Cheu RL, Xu J, Kek AGH, Lim WP, Chen WL (2006) Forecasting shared-use vehicle trips with neural networks and support vector machines. Transp Res Rec 1968(1):40–46
 24.
Xu JX, Lim JS (2007) A new evolutionary neural network for forecasting net flow of a car sharing system. In: 2007 IEEE congress on evolutionary computation. pp 1670–1676. IEEE, http://ieeexplore.ieee.org/document/4424674/
 25.
Schulte F, Voß S (2015) Decision support for environmental-friendly vehicle relocations in free-floating car sharing systems: the case of Car2go. Procedia CIRP 30:275–280
 26.
Alfian G, Rhee J, Ijaz M, Syafrudin M, Fitriyani N (2017) Performance analysis of a forecasting relocation model for one-way carsharing. Appl Sci 7(6):598
 27.
Boyaci B, Zografos KG, Geroliminis N (2015) An optimization framework for the development of efficient one-way carsharing systems. Eur J Oper Res 240(3):718–733
 28.
Weikl S, Bogenberger K (2013) Relocation strategies and algorithms for free-floating car sharing systems. IEEE Intell Transp Syst Mag 5(4):100–111
 29.
Boldrini C, Bruno R, Laarabi HM (2017) Car sharing through the data analysis lens—KNOWMe: 1st International Workshop on Knowledge Discovery from Mobility and Transportation Systems. Technical report. arXiv:1708.00497
 30.
O’Brien O, Cheshire J, Batty M (2014) Mining bicycle sharing data for generating insights into sustainable transport systems. J Transp Geogr 34:262–273
 31.
Sarkar A, Lathia N, Mascolo C (2015) Comparing cities’ cycling patterns using online shared bicycle maps. Transportation 42:541–559
 32.
Yang Z, Hu J, Shu Y, Cheng P, Chen J, Moscibroda T (2016) Mobility modeling and prediction in bike-sharing systems. In: Proceedings of the 14th annual international conference on mobile systems, applications, and services. ACM, New York, pp 165–178
 33.
Gast N, Massonnet G, Reijsbergen D, Tribastone M (2015) Probabilistic forecasts of bike-sharing systems for journey planning. In: Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, New York, pp 703–712
 34.
Trasarti R, Pinelli F, Nanni M, Giannotti F (2011) Mining mobility user profiles for car pooling. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 1190–1198
 35.
Santi P, Resta G, Szell M, Sobolevsky S, Strogatz SH, Ratti C (2014) Quantifying the benefits of vehicle pooling with shareability networks. Proc Natl Acad Sci USA 111(37):13290–13294
 36.
de Almeida Correia GH, Antunes AP (2012) Optimization approach to depot location and trip selection in one-way carsharing systems. Transp Res, Part E, Logist Transp Rev 48(1):233–247
 37.
Biondi E, Boldrini C, Bruno R (2016) Optimal deployment of stations for a car sharing system with stochastic demands: a queueing theoretical perspective. In: The 19th IEEE intelligent transportation systems conference, 2016. IEEE, pp 1–7
 38.
Pavone M, Smith S, Frazzoli E, Rus D (2012) Load balancing for mobility-on-demand systems. Int J Robot Res 31(7):839–854
 39.
Kek AGH, Cheu RL, Meng Q, Fung CH (2009) A decision support system for vehicle relocation operations in carsharing systems. Transp Res, Part E, Logist Transp Rev 45(1):149–158
 40.
Febbraro AD, Sacco N, Saeednia M (2012) One-way carsharing: solving the relocation problem. Transp Res Rec 2319:113–120
 41.
Boldrini C, Bruno R (2017) Stackable vs autonomous cars for shared mobility systems: a preliminary performance evaluation. In: IEEE MoD@ITSC’17: modelling, analysis and control of intelligent mobility-on-demand systems workshop
 42.
Rottondi C, Verticale G, Neglia G (2014) On the complexity of optimal electric vehicles recharge scheduling. In: Green communications (OnlineGreencomm), 2014 IEEE online conference on. IEEE, pp 1–7
 43.
Biondi E, Boldrini C, Bruno R (2016) Optimal charging of electric vehicle fleets for a car sharing system with power sharing. In: IEEE energycon. IEEE, pp 1–6
 44.
Nourinejad M, Zhu S, Bahrami S, Roorda MJ (2015) Vehicle relocation and staff rebalancing in one-way carsharing systems. Transp Res, Part E, Logist Transp Rev 81:98–113
 45.
Uesugi K, Mukai N, Watanabe T (2007) Optimization of vehicle assignment for car sharing system. In: Knowledge-based intelligent information and engineering systems. Springer, Berlin, pp 1105–1111
 46.
Jorge D, Correia GHA, Barnhart C (2014) Comparing optimal relocation operations with simulated relocation policies in one-way carsharing systems. IEEE Trans Intell Transp Syst 15(4):1667–1675
 47.
Karamshuk D, Noulas A, Scellato S, Nicosia V, Mascolo C (2013) Geospotting. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining—KDD ’13
 48.
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288
 49.
Tibshirani R, Johnstone I (2014) A significance test for the lasso. Ann Stat 42(2):413–468
 50.
Herrmann S, Schulte F, Voß S (2014) Increasing acceptance of free-floating car sharing systems using smart relocation strategies: a survey based study of car2go Hamburg. In: International conference on computational logistics. Springer, Berlin, pp 151–162
 51.
Noulas A, Salnikov V, Lambiotte R, Mascolo C (2015) Mining open datasets for transparency in taxi transport in metropolitan environments. EPJ Data Sci 4:23
 52.
Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc, Ser B, Stat Methodol 63:411–423
 53.
De Livera AM, Hyndman RJ, Snyder RD (2011) Forecasting time series with complex seasonal patterns using exponential smoothing. J Am Stat Assoc 106(496):1513–1527
 54.
Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv 45(1):12
 55.
Cliff AD, Ord JK (1981) Spatial processes: models & applications. Taylor & Francis, London
 56.
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22
Availability of data and materials
The data that support the findings of this study cannot be publicly shared. Information is available from the corresponding author upon reasonable request. The ISTAT data are publicly available and can be found at https://www.istat.it/it/archivio/104317.
Authors’ information
CB and RB are permanent researchers at IIT-CNR. HL was a postdoctoral researcher at IIT-CNR at the time the study was carried out.
Funding
This work was funded by the ESPRIT, REPLICATE and SoBigData projects. The ESPRIT project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 653395. The REPLICATE project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 691735. The SoBigData project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 654024.
Author information
Contributions
Designed the study: CB RB. Analyzed the data: CB HL. Wrote the paper: CB RB. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Chiara Boldrini.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Keywords
 Car sharing
 Smart transportation
 Urban computing
 Data mining