Understanding vehicular routing behavior with location-based service data

Properly extracting patterns of individual mobility from high-resolution data sources, such as those generated by smartphone applications, offers important opportunities. Unlike call detail records (CDRs), whose spatial resolution is limited by triangulation from antennas, such data can reveal route choices, travel modes, and close encounters. At present, there is no standard, large-scale data set collected over long periods that allows us to characterize these. In this work we thoroughly examine the use of data from smartphone applications, also referred to as location-based services (LBS) data, to extract and understand vehicular route choice behavior. Taking the Dallas-Fort Worth metroplex as an example, we first extract the vehicular trips with simple rules and reconstruct the origin-destination matrix by coupling the extracted vehicular trips of the active LBS users with United States census data. We then present a method to derive the routes commonly used by individuals from LBS traces with varying sample intervals. We further inspect the relation between the number of routes and the trip characteristics, including the departure time, trip length, and travel time. Specifically, we consider the travel time index and buffer index for LBS users taking different numbers of routes. Empirical results demonstrate that during peak hours, travelers tend to reduce the impact of traffic congestion by taking alternative routes. Overall, the proposed data analysis framework is a cost-effective way to treat sparse data generated from the use of smartphones to inform routing behavior. Its potential in practice is to inform demand management strategies by targeting individual users while generating large-scale estimates of congestion mitigation.


Introduction
With the growing population in cities and the restructuring of urban economies and societies, a fundamental task of transportation planners and engineers is to move people and goods effectively [1]. However, due to the growing mismatch between the travel demand of citizens and the limited road resources, worsening traffic congestion not only causes tremendous economic loss and environmental problems but also has profound impacts on public health [2,3,4]. Today, policymakers and researchers around the world have proposed a number of strategies to relieve traffic congestion, including advanced traffic signal control [5], strengthening public transportation [6], reducing the travel demand of private vehicles [7], route recommendations [8], congestion pricing [9], and, looking into the future, autonomous vehicles [10].
Among the varying travel management strategies, understanding human mobility is always a fundamental task, supporting advanced decision systems. Thanks to the development of modern information and communication technologies (ICT) and the high penetration rate of mobile phone devices, researchers can leverage a large amount of digital traces with time stamps and geographical locations to understand and reproduce human mobility [11,12,13]. Although the daily destinations of human mobility can be modeled at the half-square-kilometer scale, the routing behavior of travelers in their road networks is still not well modeled from ICT data sources [14]. A general solution to the lack of complete information is to leverage traffic assignment models, such as user equilibrium assignment, dynamic traffic assignment, and the multi-agent approach, to assign each traveler to specific routes [15]. These models assume that travelers choose their routes with the intention of minimizing travel costs, such as travel distance or time. To this end, individuals are assumed to have perfect or partial information about the alternatives available to them. However, the choice of route is simultaneously affected by multiple factors, and the route choice behavior of people follows the bounded rationality principle [16]. This means that travelers can neither find the optimal routes, because they lack accurate information about the traffic conditions, nor are they willing to spend much effort to reach an optimized decision in complicated situations [17]. With vehicular trajectories collected periodically from hundreds of travelers over several months, each starting when the driver turns the engine on and ending when it is turned off, Lima et al. found that individual routing behavior is independent of the urban layout. People always have a dominant route, and the alternative routes are bounded within an elliptic shape of high eccentricity [18]. In this work, we attempt a similar analysis with the additional challenge imposed by data that have less temporal accuracy and a much lower frequency of collection, but are much more pervasive than vehicular trajectory data.
The prevailing big data resources used to model travel behavior include mobile phone data, check-in data at places of interest, and data collected by transportation agencies, such as floating car data. Among them, mobile phone data, a.k.a. call detail records (CDRs), are passively collected and have the largest coverage. CDRs have been used to understand mobility behavior [19] and to reproduce the aggregated travel demand [20] and the trip chains (a sequence of visited places with timestamps) of the population [21,13,22]. However, methods relying solely on CDRs cannot infer the routes taken by individuals due to their coarse spatial resolution.
As one kind of location-based services data, geo-tagged check-in data provide more accurate locations than CDRs, but they can be collected only when users actively "check in" at their places of interest, which makes them too infrequent a source. Such data are activity-dependent and cannot continuously record the traces of users [23]. Floating car data (FCD) are collected by transportation agencies for specific purposes, recording the locations and speeds of floating cars. The GPS trajectories of taxis are the FCD most commonly used to analyze traffic states. However, because taxi drivers cruise in search of potential passengers, taxi trajectories are not well suited to studying the route choice behavior of residents. In this work, we explore location-based services (LBS) data, which specifically refer to the collection of check-in or trajectory data generated by a set of smartphone applications. LBS use a smartphone's localization technology (e.g., GPS, Wi-Fi) to track the holder's location down to a street address, if the holder has opted in to allow the service to do so. Compared to the check-in data from a single application, our LBS data collect locations from multiple applications and have a much higher sampling frequency for two reasons: (i) some applications continuously collect locations to provide map-related services to users; (ii) the aggregation of records from multiple applications also increases the sampling frequency. In recent years, LBS data have been used to examine meaningful visited places and social mixing [24,25], to mine travel behavior [26], and to estimate commuting patterns [27]. The emergence of data collaboratives using LBS records in light of the COVID-19 pandemic has accelerated their use and value [28,29].
This paper aims at analyzing LBS data for urban-scale mobility demand, with a focus on gaining insights into their use to extract routing behavior. Focusing on the Dallas-Fort Worth (DFW) metroplex, we first describe the process to impute vehicular trips from LBS data and then present a framework to deal with sparse data. Utilizing LBS data to analyze routing behavior faces important challenges: (i) LBS data are collected whenever the applications are activated, whether the users are staying in one place or moving in unknown travel modes; (ii) LBS data are collected with varying sample rates, which hinders the detection of actual routes. We present detailed steps to resolve these issues. Further, we analyze the change of routing behavior by connecting it with the number of trips, the travel distance, and the travel time during peak hours. The main contributions of this work are summarized as follows: (i) we present detailed steps to process the LBS data, extract the vehicular trips, and detect routes without the use of the road network and map-matching; (ii) we analyze differences in travelers' routing behavior by number of trips, travel distance, and time period of the day; (iii) we inspect the impact of traffic congestion on individuals' routing behavior using two metrics, the travel time index and the buffer index. Empirical results confirm that individuals who explore more routes can reduce the impact of congestion and increase the reliability of their travel times. The complete implementation of our data analysis framework can be found at https://github.com/humnetlab/RoutingBehavior.
The rest of the paper is organized as follows. In Sec. 2, we give an overview of the LBS data used in this work; Sec. 3 describes the methodology to process the LBS data and find the travelers' routes; Sec. 4 analyzes the route choice behavior and its connection to multiple factors. Finally, we conclude the work in Sec. 5.

Data Description
LBS are services offered to users through applications installed on smart mobile devices. The geographical locations of the users are simultaneously and actively collected by the application developers or map service operators. The users are normally positioned by the global positioning system (GPS) or a Wi-Fi positioning system (WPS), which are fairly accurate in space and offer new opportunities to study human activity and its complex interaction with the built environment at fine scale [30,31,32,33,34].
The LBS data used in this work are provided by Cuebiq, a location intelligence and measurement platform [35]. The datasets cover the DFW metroplex in Texas and were collected over a period of 6 months, from November 1st, 2016 to April 30th, 2017. The total number of users is approximately 6.5 million, and these users generated about 12.43 billion records in the given region and time period. Each LBS record consists of the pseudonymized user ID, a timestamp, and geographical coordinates. Fig. 1(a) illustrates the covered region and the visitation count in each grid cell in the first week of November 2016. The entire region is divided into 512 × 512 cells with an approximate size of 360 × 320 m², down to the block level. The highlighting of freeways and downtown in the heatmap indicates that they are the busiest places in terms of visitation counts.
As LBS data are collected when the user is interacting with the application, the collection can be interrupted if the user stops using the application. In addition, the applications are used with varying frequency. Thus, the users in the LBS data have different numbers of records and different timespans, where the timespan is defined as the time difference between the first and last records of the user. In Fig. 1(b), we show the timespan versus the number of records for all users in a heatmap. Dark green regions indicate that a large number of users are associated with the corresponding timespan and number of records. As we can observe, a considerable share of users were recorded over a long term but have small numbers of records, as they do not use the applications frequently. Other users have a considerable number of records but a short timespan. These might be temporary visitors to the DFW metroplex or short-time adopters of the app. To explore routing behavior, we require long-term observation of moving traces. To that end, we select the users whose data were collected over 60 days or more and who have more than 1,000 records, as enclosed by a red rectangle in Fig. 1(b). As a result, 13% of the users and 86% of the records are kept for further analysis.
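The user selection rule can be illustrated with a short Python sketch. This is a minimal example, not the authors' implementation: the function name `select_active_users` and the `(user_id, timestamp)` record layout are assumptions made for illustration, and only the two thresholds (60 days, more than 1,000 records) come from the text.

```python
from collections import defaultdict

# Thresholds taken from the text; everything else here is a hypothetical layout.
MIN_TIMESPAN_DAYS = 60
MIN_RECORDS = 1000  # the text requires "more than 1,000 records"


def select_active_users(records, min_days=MIN_TIMESPAN_DAYS, min_records=MIN_RECORDS):
    """Return the IDs of users observed for >= min_days with > min_records records.

    `records` is an iterable of (user_id, unix_timestamp_seconds) pairs
    (an assumed layout; the real records also carry coordinates).
    """
    first, last, count = {}, {}, defaultdict(int)
    for user_id, ts in records:
        first[user_id] = min(ts, first.get(user_id, ts))
        last[user_id] = max(ts, last.get(user_id, ts))
        count[user_id] += 1

    active = set()
    for user_id, n in count.items():
        # Timespan: time difference between the first and last record of a user.
        timespan_days = (last[user_id] - first[user_id]) / 86400.0
        if timespan_days >= min_days and n > min_records:
            active.add(user_id)
    return active
```

Both conditions must hold at once, so a user with many records over a few days (a likely visitor) and a user with few records over many months are both excluded.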
An important challenge remains even with this sample: the LBS records are collected with a varying, usually low frequency because of the intermittent use of applications and the different sample rates of the applications. The LBS datasets are collected by a number of mobile applications (Apps) when the mobile phone user is interacting with these Apps or keeps them running in the background. The sample interval, defined as the time difference between two consecutive records of the same user, is not fixed. Fig. 1(c) shows the distribution of sample intervals in the LBS data. The sample intervals of a large proportion of records are larger than 2 min, which is a much lower frequency than that of floating car data and hinders the routing behavior analysis, especially in dense road networks. In addition, the uneven sample rates shown in Fig. 1(c) are mainly caused by the aggregation of records from multiple applications. Next, we propose a method to deal with this limitation and extract valuable information from this data source.

Methodology
To analyze route choice behavior, a primary task is to map the users to the specific routes they were taking. However, tracking routes from LBS data is challenging in two respects: (i) LBS collect the data of users when they stay or move with all kinds of travel modes, e.g., walking, biking, driving, and public transportation. Vehicular trips must first be imputed from the raw data for further route choice behavior analysis; (ii) LBS data are collected from multiple applications with different sample rates and at low resolution. This hinders the reconstruction of entire routes over the road network. Once the vehicular trips are extracted, we select high-resolution trips for route detection and find the routes of the other low-resolution trips by aligning them with the high-resolution ones.

Vehicular Trips Detection
For the records of each user, we first partition her records into a sequence of trips by looking into the time difference between two consecutive records. After the user selection illustrated in Fig. 1(b), the remaining users are labeled as high-frequency ones, a.k.a. active users. In this context, we suppose that a user starts a new trip if there is no record for at least 30 minutes before the current one, that is, $t_{\mathrm{current}} - t_{\mathrm{previous}} \geq 30$ min. Then we drop the trips with fewer than 5 records. At this point, the records of each user have been partitioned into a sequence of trips in all kinds of travel modes.
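The segmentation rule can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `split_into_trips` and the `(timestamp_s, lat, lon)` tuple layout are hypothetical, while the 30-minute gap and the 5-record minimum are the values given in the text.

```python
GAP_SECONDS = 30 * 60  # a new trip starts after >= 30 min without records
MIN_POINTS = 5         # trips with fewer than 5 records are dropped


def split_into_trips(records, gap=GAP_SECONDS, min_points=MIN_POINTS):
    """Partition one user's time-sorted records into trips.

    `records` is a list of (timestamp_s, lat, lon) tuples sorted by time
    (an assumed layout, used only for this example).
    """
    trips, current = [], []
    for rec in records:
        # t_current - t_previous >= gap closes the running trip.
        if current and rec[0] - current[-1][0] >= gap:
            trips.append(current)
            current = []
        current.append(rec)
    if current:
        trips.append(current)
    return [trip for trip in trips if len(trip) >= min_points]
```

At this stage the trips still mix all travel modes; the speed rule that follows is what narrows them to vehicular trips.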
A number of methods have been proposed to derive the travel mode from trajectory data; most of them process high-resolution GPS traces or use sophisticated learning methods that require gold labels of travel modes for model training [36,37]. Here we use a simple rule and identify a trip as a vehicular trip if its average speed is between 20 km/h and 100 km/h, leaving room for further improvements. In addition, there are trips with outliers caused by GPS drift, which are eliminated in our experiments. We label as outliers the points that have fewer than 50 neighbors within a 100 m radius in the set of points of all vehicular trips. The entire vehicular trip is rejected once a considerable proportion of its points (i.e., more than 20%) are labeled as outliers. This method might also remove trips taking rarely used routes, but this does not affect our analysis, as we place emphasis on the routes commonly used by each user. The pseudocode for deriving the vehicular trips from the raw LBS data is given in Algorithm 1.
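A minimal Python sketch of the speed rule and the outlier test is given here for illustration. Several details are assumptions rather than facts from the text: the average speed is taken as path length divided by trip duration, distances use the haversine formula, a point counts as its own neighbor when it is part of the pooled point set, and the neighbor search is brute force. All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def average_speed_kmh(trip):
    """Path length over duration; trip is [(timestamp_s, lat, lon), ...]."""
    length = sum(haversine_km(a[1:], b[1:]) for a, b in zip(trip, trip[1:]))
    hours = (trip[-1][0] - trip[0][0]) / 3600.0
    return length / hours if hours > 0 else 0.0


def is_vehicular(trip, low=20.0, high=100.0):
    """Speed rule from the text: 20 km/h <= average speed <= 100 km/h."""
    return low <= average_speed_kmh(trip) <= high


def outlier_fraction(trip, all_points, radius_km=0.1, min_neighbors=50):
    """Share of trip points with fewer than min_neighbors pooled points
    within radius_km. Brute force, suitable only for small examples."""
    flagged = 0
    for _, lat, lon in trip:
        n = sum(1 for q in all_points if haversine_km((lat, lon), q) <= radius_km)
        if n < min_neighbors:
            flagged += 1
    return flagged / len(trip)
```

Under the rule in the text, a trip would be rejected when `outlier_fraction(...)` exceeds 0.2. For data at the scale described here, a spatial index would be needed in place of the brute-force neighbor count.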

Collective Travel Demand Estimation and Validation
Similar to travel behavior analysis using CDR data [38,20,39], we first detect the possible home location of each active LBS user. To this end, we collect the stay locations for each user from the origins and destinations of all vehicular trips. Each stay location is associated with the departure or arrival time. As users usually depart from home in the morning and arrive home at the end of the day, we select all departure locations between 5:00 a.m. and 10:00 a.m. and all arrival locations after 5:00 p.m. every day to compose the user's home candidate pool. If more than 30 locations are found, we cluster the candidate locations using DBSCAN, setting the spatial threshold to 300 m to account for users parking their vehicles around their significant places. We define the centroid of the largest cluster as the home place if the fraction of points in this cluster is larger than 40%. In Fig. 2(a), we show the detected home locations of the active LBS users. Assessing the accuracy of home detection is always challenging due to the lack of ground truth. Vanhoof et al. used census data to validate home location detection methods [40]. However, the validation cannot be very reliable even at the collective level because of the heterogeneous distribution of active LBS users in space. Note that we use the LBS users' home locations at the ZIP code level to expand the users' travel demand, which means that we do not need to identify the home location within a few meters. We compare the number of active LBS users settling in each ZIP code with its population from the census data, as shown in Fig. 2(b).

Algorithm 1: Vehicular Trips Deriving
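Returning to the home detection rule of this subsection (candidate pool, DBSCAN with a 300 m threshold, largest cluster holding more than 40% of the points), a self-contained Python sketch could look as follows. It is an illustration only: the small `dbscan` function is a plain textbook version written here to avoid external dependencies, `min_samples=5` is an assumed value that the text does not specify, arrivals are taken up to midnight, and distances use an equirectangular approximation.

```python
import math


def dist_m(p, q):
    """Approximate distance in metres between (lat, lon) points in degrees."""
    x = math.radians(q[1] - p[1]) * math.cos(math.radians((p[0] + q[0]) / 2))
    y = math.radians(q[0] - p[0])
    return 6371000.0 * math.hypot(x, y)


def dbscan(points, eps_m, min_samples):
    """Minimal DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_samples:
                queue.extend(nearby)
    return labels


def is_home_candidate(kind, hour):
    """kind is 'departure' or 'arrival'; hour is local time in [0, 24)."""
    return (kind == "departure" and 5 <= hour < 10) or (kind == "arrival" and hour >= 17)


def detect_home(candidates, eps_m=300.0, min_candidates=30, min_share=0.4, min_samples=5):
    """Centroid of the largest cluster if it holds > min_share of the pool, else None."""
    if len(candidates) <= min_candidates:  # the text requires more than 30 locations
        return None
    labels = dbscan(candidates, eps_m, min_samples)
    sizes = {}
    for lab in labels:
        if lab != -1:
            sizes[lab] = sizes.get(lab, 0) + 1
    if not sizes:
        return None
    best = max(sizes, key=sizes.get)
    if sizes[best] / len(candidates) <= min_share:
        return None
    members = [p for p, lab in zip(candidates, labels) if lab == best]
    return (sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members))
```

Since the home location is only used at the ZIP code level, the coarse distance approximation is unlikely to matter for this step.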

Route Detection
The core challenge of deriving route choice behavior from LBS data is the varying sample interval of the records, as shown in Fig. 1(c). The varying sample interval in time leads to heterogeneity in the displacement between two consecutive records, ranging from several meters to kilometers. Such heterogeneity would cause the similarity between two trips to be calculated incorrectly and would affect the clustering of trips. For instance, even when two low-resolution trips take the same route, the similarity between them would be low, as the distance between paired points would be large. One popular solution is to map the points to the road network with map-matching and to connect distant consecutive points with the shortest path in the road network. However, this requires the road network, and map-matching is computationally expensive for massive trajectory data [44].
To overcome this challenge, we design a simple yet efficient two-step procedure to find the routes: (i) selecting the high-resolution trips, in which the maximum distance gap between consecutive points is less than 1 km, and detecting the routes taken by trace clustering; (ii) matching the low-resolution trips to the high-resolution ones and finding the most likely routes taken. Fig. 4(a) presents the vehicular trips extracted from the raw LBS records for 200 sample users. The layout of the vehicular trips matches the road network in the DFW metroplex well. We select one user to illustrate the two-step procedure, as shown in Fig. 4(b). To understand route choice behavior, we focus on the frequently visited places between which trips are made repeatedly. To this end, we cluster the origins and destinations of all trips using DBSCAN and label the centroids of the clusters as significant places. For each active LBS user, we then select two unidirectional OD pairs for further route detection, namely the OD pairs with the largest and the second largest numbers of trips, as illustrated in Fig. 4(c). Note that the two selected OD pairs are not necessarily the reverse of each other. After this step, we keep 1,194,154 trips (5.3% of all trips after trip segmentation) of 58,333 users (0.9% of all users in the raw LBS data) for routing behavior analysis.
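The two selections described in this paragraph, high-resolution trips and the top two directed OD pairs, can be sketched in Python as follows. The function names and the data layout (trips as lists of `(lat, lon)` points, OD pairs as tuples of place identifiers) are assumptions for illustration; the 1 km gap and the choice of two OD pairs come from the text.

```python
import math
from collections import Counter


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def max_gap_km(trip):
    """Largest distance between consecutive points of a trip (>= 2 points)."""
    return max(haversine_km(a, b) for a, b in zip(trip, trip[1:]))


def is_high_resolution(trip, max_gap=1.0):
    """High-resolution trips have every consecutive gap below 1 km."""
    return max_gap_km(trip) < max_gap


def top_od_pairs(trip_ods, k=2):
    """Most frequent directed OD pairs of one user.

    `trip_ods` holds one (origin_place, destination_place) tuple per trip;
    (A, B) and (B, A) are counted separately because the pairs are unidirectional.
    """
    return [od for od, _ in Counter(trip_ods).most_common(k)]
```

For example, a user with five home-to-work trips, three work-to-home trips, and one home-to-gym trip would keep the first two pairs.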
Among the trips between a selected origin and destination pair, we first select the high-resolution trips to label the routes. This is because the inference is more reliable when the distance gaps between consecutive points are small. If there is more than one high-resolution trip, the trips are grouped into one or more clusters using the clustering method described in the following. The purpose of trip clustering is to group the trips that take the same route. Trip clustering involves two choices, the measure of trip similarity and the number of clusters. Two of the most popular measures are the longest common subsequence (LCSS) and dynamic time warping (DTW) [45,32]. However, Atev et al. proposed a modified Hausdorff distance and confirmed that it could surpass both LCSS and DTW in trajectory clustering [46]. In fact, we find that the modified Hausdorff distance improves robustness to noise by rejecting a number of the worst matches between points in the two trajectories. In this work, we adopt the same modified Hausdorff distance to calculate the distance between two high-resolution trips and use DBSCAN to cluster them into one or more groups. Ideally, each cluster represents one route. For the sample user in Fig. 4(b), three routes are detected on the high-resolution trips, differentiated by color in Fig. 4(d).
In the second step, we add the records of the low-resolution trips by aligning them with the high-resolution ones. Specifically, for each low-resolution trip, we first calculate the maximum distance between its point set and the point set of each high-resolution trip. This distance indicates how far the trip deviates from the high-resolution cluster and is used to decide whether they belong to the same route. If the target low-resolution trip has a nearest high-resolution trip within a certain distance (e.g., 1 km), we assign it the same route as the high-resolution one. Otherwise, we remove this low-resolution trip, as its route is uncertain. Fig. 4(e) presents the final route detection results for the sample user. The detailed pseudocode for route detection is given in Algorithm 2.
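To make the first step more concrete, the sketch below clusters high-resolution trips with a quantile-based Hausdorff distance and a small DBSCAN over the pairwise distances. It is a simplified stand-in, not the formulation of Atev et al. [46]: it keeps only the idea of rejecting the worst point matches (through the assumed parameter `alpha`) and omits their correspondence constraints. The values `alpha=0.9`, `eps_km=0.5`, and `min_samples=1` are illustrative assumptions, and all function names are hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def directed_distance(a, b, alpha=0.9):
    """alpha-quantile of nearest-point distances from trip a to trip b.

    alpha = 1.0 gives the classical directed Hausdorff distance; a smaller
    alpha discards the worst (1 - alpha) share of point matches.
    """
    nearest = sorted(min(haversine_km(p, q) for q in b) for p in a)
    k = max(0, math.ceil(alpha * len(nearest)) - 1)
    return nearest[k]


def modified_hausdorff(a, b, alpha=0.9):
    """Symmetric version: the larger of the two directed distances."""
    return max(directed_distance(a, b, alpha), directed_distance(b, a, alpha))


def dbscan_precomputed(dist, eps, min_samples):
    """Minimal DBSCAN on a square distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_samples:
                queue.extend(nearby)
    return labels


def cluster_trips(trips, eps_km=0.5, min_samples=1, alpha=0.9):
    """Label each high-resolution trip with a route index."""
    n = len(trips)
    dist = [[0.0 if i == j else modified_hausdorff(trips[i], trips[j], alpha)
             for j in range(n)] for i in range(n)]
    return dbscan_precomputed(dist, eps_km, min_samples)
```

With `min_samples=1` every trip receives a route label, so a route used only once still forms its own cluster; a larger value would instead mark such trips as noise.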

Distribution of number of Routes
We first inspect the statistical distributions of the number of trips $N_{\mathrm{trip}}$ and the number of routes $N_{\mathrm{route}}$ in the selected top two OD pairs for all active users. Next, we inspect the discrepancy in routing behavior between peak and off-peak hours. To this end, we split all trips in users' top OD pairs into four groups by their departure time, i.e., morning peak hours from 7:00 to 10:00 (AM), midday from 10:00 to 16:00 (MD), evening peak hours from 16:00 to 19:00 (PM), and the rest of the day (RD). We then count $N_{\mathrm{route}}$ of each user in these four time periods on weekdays and weekends, respectively. Fig. 5(c) presents the fraction of active users taking different $N_{\mathrm{route}}$ during each time period on weekdays. The number of trips in each time period is presented in the inset. We observe that the fractions of users taking two or more routes during AM and PM are noticeably higher than in the other two periods, suggesting that users tend to take more routes during peak hours to finish their trips more efficiently, e.g., in shorter travel time. As for the routing behavior on weekends shown in Fig. 5(d), the distribution of $N_{\mathrm{route}}$ changes little between time periods because traffic on weekends is not as congested as on weekdays.
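The split into four departure-time periods and the per-period route count can be written compactly. In this sketch the period boundaries follow the text, while the half-open intervals (a trip departing at exactly 10:00 falls into MD), the function names, and the `(departure_hour, route_label)` layout are assumptions for illustration.

```python
def time_period(hour):
    """Map a departure hour (local time, 0 <= hour < 24) to AM, MD, PM, or RD."""
    if 7 <= hour < 10:
        return "AM"  # morning peak
    if 10 <= hour < 16:
        return "MD"  # midday
    if 16 <= hour < 19:
        return "PM"  # evening peak
    return "RD"      # rest of the day


def routes_per_period(trips):
    """Count distinct routes per period for one user.

    `trips` is an iterable of (departure_hour, route_label) pairs.
    Returns {period: N_route} for the periods in which the user travelled.
    """
    routes = {}
    for hour, route in trips:
        routes.setdefault(time_period(hour), set()).add(route)
    return {period: len(labels) for period, labels in routes.items()}
```

Running the count separately on weekday and weekend trips would give the two sets of distributions compared in the text.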

Route Choice Behavior of Different Groups of Travelers
For a comprehensive understanding of the discrepancy in route choice behavior among travelers, we group them by their travel frequencies and travel distances. According to the distribution of $N_{\mathrm{trip}}$ presented in Fig. 5(a), we split the travelers into four groups according to the following rules: $N_{\mathrm{trip}} < 20$, $20 \leq N_{\mathrm{trip}} < 40$, $40 \leq N_{\mathrm{trip}} < 60$, and $N_{\mathrm{trip}} \geq 60$. Fig. 6(a) presents the distribution of $N_{\mathrm{route}}$ per group. We notice that frequent travelers tend to explore more routes than infrequent travelers, most likely because frequent travelers know the traffic congestion better and are more confident in finding efficient alternatives. The phenomenon is also confirmed in Fig. 6(b), which presents the distribution of $N_{\mathrm{trip}}$ of users taking different numbers of routes in their routine OD pairs. The median $N_{\mathrm{trip}}$ of travelers with more than three routes is around 60, while the median for travelers sticking to one route is less than 30.
The distance between origin and destination could be one of the factors that affect the number of routes selected, as the distance determines the number of candidate routes in a given road network. We therefore compare the routing behavior of travelers with different ranges of travel distance. The travelers are grouped into Q1 to Q4 by the 25th, 50th, and 75th percentiles of their travel displacements. Fig. 6(c) depicts the distribution of $N_{\mathrm{route}}$ of each group, and Fig. 6(d) depicts the distribution of travel displacements of travelers with different $N_{\mathrm{route}}$. As expected, most of the users with short trips in Q1 stick to only one route. From Fig. 6(d), we can see that the peak of the distribution is around 5 km for users who take only one route, while the peak is around 10 km for users who take more than 3 routes.
These observations indicate that more routes are likely to be taken if users make longer trips between two significant places. This can be explained from the network perspective. In a dense road network, the larger the distance between two nodes, the more alternative routes with similar cost travelers can choose from.

Route Choice Behavior in Traffic Congestion
Traffic congestion is a major consideration driving travelers to find alternatives, especially during peak hours. Here, we investigate the relation between travel time and the number of routes in the routine OD pairs of active travelers. For each user, we calculate the travel time index (TTI) of all trips in the traveler's top OD pairs to assess the additional travel time caused by congestion. Given a number of trips in an OD pair, TTI is defined as the ratio of the average travel time to the free-flow travel time,
$$\mathrm{TTI} = \frac{T_{\mathrm{avg}}}{T_{\mathrm{free}}},$$
where $T_{\mathrm{avg}}$ refers to the average travel time of all trips made by one user and $T_{\mathrm{free}}$ refers to the free-flow travel time from the origin to the destination, approximated by the minimum travel time among all trips in the OD pair. Fig. 7(a) and (b) illustrate the TTI of the travelers with different $N_{\mathrm{route}}$ during the AM (7:00-10:00) and PM (16:00-19:00) peak hours on weekdays, respectively. It is noticeable that travelers taking more routes tend to have lower TTI. The average TTI values are also reported in Table 1. To reduce the impact of extremely small or large values of TTI, we also present in Table 1 the mean of the TTI values that fall into the 95% confidence interval and the standard deviation (STD). We can conclude that travelers with flexible route choice behavior can lower their travel time by avoiding traffic congestion on the primary routes. The insets of Fig. 7(a) and (b) present the distribution of TTI for all travelers, showing that the average TTI is nearly 2.0 during peak hours. This reveals that, because of congestion, travelers in the DFW metroplex spend nearly double the free-flow travel time to complete their journeys during rush hours.
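As a small worked example of the travel time index, the following sketch computes the TTI of one OD pair and averages it by number of routes. The free-flow time is approximated by the minimum observed travel time, as stated in the paper; the function names and the `(n_route, travel_times)` input layout are assumptions made for illustration.

```python
def travel_time_index(travel_times):
    """TTI = average travel time / free-flow travel time for one OD pair.

    The free-flow time is approximated by the minimum observed travel time.
    """
    if not travel_times or min(travel_times) <= 0:
        raise ValueError("need at least one positive travel time")
    t_free = min(travel_times)
    t_avg = sum(travel_times) / len(travel_times)
    return t_avg / t_free


def mean_tti_by_routes(users):
    """Average TTI per number of routes.

    `users` is an iterable of (n_route, travel_times) pairs, one per user.
    """
    groups = {}
    for n_route, times in users:
        groups.setdefault(n_route, []).append(travel_time_index(times))
    return {n: sum(v) / len(v) for n, v in groups.items()}
```

For travel times of 20, 30, and 40 minutes, the average is 30 and the minimum is 20, so the TTI is 1.5: the traveler spends on average 50% more time than under free-flow conditions. By construction the TTI is never below 1.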
Beyond the additional travel time caused by congestion, the reliability of travel time is also important to many travelers, especially when they need to arrive at their destination on time. Reliability has been considered a key performance measure by transportation planners and decision-makers. We introduce the buffer index (BI) to assess the travel time reliability of all trips in the traveler's top OD pair [47]. The BI represents the extra buffer time that the traveler should add to the average travel time when planning trips to ensure on-time arrival. Here we define BI as the relative gap between the 85th percentile travel time and the average travel time of an OD pair,
$$\mathrm{BI} = \frac{T_{85} - T_{\mathrm{avg}}}{T_{\mathrm{avg}}} \times 100\%,$$
where $T_{85}$ denotes the 85th percentile travel time. The BI is expressed as a percentage, and its value increases as reliability gets worse. Fig. 7(c) and (d) present the BI of the travelers with different $N_{\mathrm{route}}$ during the AM and PM peak hours on weekdays, respectively. The average BI values, the average BI in the 95% confidence interval, and the STD of BI are reported in Table 1. We notice that the average BI decreases as $N_{\mathrm{route}}$ increases, suggesting that travelers change their routes in response to real-time traffic to increase the reliability of their travel time.
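The buffer index defined in words above can be computed as in the following sketch. The percentile uses linear interpolation between order statistics, which is an assumption since the text does not state the interpolation rule; the function names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) with linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def buffer_index(travel_times, q=85):
    """BI in percent: relative gap between the q-th percentile and the mean."""
    t_avg = sum(travel_times) / len(travel_times)
    return (percentile(travel_times, q) - t_avg) / t_avg * 100.0
```

As a worked case, for travel times 20, 22, ..., 40 minutes the mean is 30 and the interpolated 85th percentile is 37, giving a BI of about 23.3%: the traveler should budget roughly 23% more than the average travel time to arrive on time in most cases.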

Conclusion and outlook
Understanding route choice behavior is an essential task not only for modeling human mobility in transportation networks but also for route management to relieve traffic congestion. In this paper, we presented a data analysis framework for understanding route choice behavior with massive LBS data. The steps include user selection, vehicular trip enrichment, trip clustering, route detection, and behavior analysis.
We analyzed six months of LBS data in the Dallas-Fort Worth metroplex and selected the trips between the most frequent origin-destination pairs of each user to understand routing behavior. We found that the distribution of the number of routes can be modeled by a log-normal distribution. We also inspected the relation between the number of routes and the travel displacement and found that travelers with longer travel distances tend to select more routes to shorten their travel time.
We also confirmed that travelers take more routes during peak hours than during off-peak hours, and that individuals who explore more routes reduce the impact of congestion on their trips and increase the reliability of their travel times. The proposed framework makes LBS data useful for evaluating the route choice behavior of different groups of travelers and their reaction to traffic congestion. In future applications, it could be used to evaluate traffic regulation strategies, such as congestion charges.
Moreover, there are still some directions for further study. For instance, (i) to comprehensively understand people's emphasis on different travel costs (e.g., travel time and routing distance), we need to further integrate these factors per route per user. To that end, we need to estimate the traffic states in the entire road network through map-matching; (ii) this work presented a case study in a region without congestion pricing. The relation between socio-economic characteristics and routing behavior merits more attention in cities with traffic interventions.

The spatial distribution of the expansion factors is shown in Fig. 3(c). The ZIP codes in the urban area generally have smaller expansion factors than those in rural areas. We then aggregate the vehicular trips during the morning peak hours (6:30-9:00 a.m.) at the ZIP code level and scale the flow with the expansion factors. In Fig. 3(d), we compare the vehicular travel demand for all OD pairs at the ZIP code level between the expanded LBS flow and the NCTCOG survey in the morning peak hours, and find that the linear fit has a slope of 0.82 and $r^2 = 0.59$. Fig. 3(e) illustrates the spatial distribution of vehicular travel flows above 0.01% of the total demand during the morning peak hours obtained from the expanded LBS data and the NCTCOG data, respectively. Even though the estimated travel demand is visually comparable to the NCTCOG survey data, the Pearson correlation only reaches 0.79. This can be caused by several factors in this work: (i) we simply selected active users by the timespan and number of records in the raw LBS data, aiming at removing the temporary visitors in the DFW metroplex. However, we cannot accurately identify residents among all LBS users with such simple rules; (ii) the distribution of active LBS users differs from that of the residents in space, as shown in Fig. 2(b) and by the spatial distribution of expansion factors in Fig. 3(c); (iii) as we used simple rules to identify the vehicular trips, some non-vehicular trips are kept in our OD matrix; (iv) we used simple expansion factors to expand the travel demand of active LBS users to the population.
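The expansion and comparison steps can be illustrated with a short Python sketch. The exact definition of the expansion factor is not spelled out in this passage, so the sketch assumes the simplest reading: census population of a ZIP code divided by the number of active LBS users whose home falls in it. The function names, the input dictionaries, and the use of an ordinary least-squares fit with intercept are likewise assumptions.

```python
def expansion_factors(population, active_users):
    """Assumed factor per ZIP code: census population / number of active users."""
    return {z: population[z] / n
            for z, n in active_users.items() if n > 0 and z in population}


def expanded_od(trips, home_zip, factors):
    """Scale trips to population level.

    `trips` is an iterable of (user_id, origin_zip, dest_zip); each trip is
    weighted by the expansion factor of the user's home ZIP code.
    Users without a known home ZIP code or factor are skipped.
    """
    od = {}
    for user, origin, dest in trips:
        weight = factors.get(home_zip.get(user))
        if weight is None:
            continue
        od[(origin, dest)] = od.get((origin, dest), 0.0) + weight
    return od


def slope_and_r2(x, y):
    """Least-squares fit y = a * x + b; returns (a, r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy * sxy / (sxx * syy)
```

In a comparison like the one in Fig. 3(d), `x` would hold the survey flows and `y` the expanded LBS flows for the same OD pairs, or the reverse; the text does not say which variable is on which axis.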

For the sample user in Fig. 4(b), three routes are detected on the high-resolution trips, differentiated by color in Fig. 4(d).

Fig. 5(a) and (b) present the distributions of $N_{\mathrm{trip}}$ and $N_{\mathrm{route}}$, respectively. Log-normal distributions resemble the data in both cases, in agreement with the findings of Lima et al. [18]. The mean value of $N_{\mathrm{trip}}$ reaches 29.08, while the mean value of $N_{\mathrm{route}}$ between these OD pairs is 1.56. From Fig. 5(b), we observe that among all active users in our LBS data, 51.35% take only one route to complete the trips in their top OD pairs, 37.5% take two routes, and only 11.15% take more than two routes.
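One simple way to fit a log-normal distribution to such counts is the maximum-likelihood estimate, which reduces to the mean and standard deviation of the logarithms. The sketch below illustrates this; it is not necessarily the fitting procedure used by the authors, and the function names are hypothetical. Since $N_{\mathrm{trip}}$ and $N_{\mathrm{route}}$ are integer counts, a continuous log-normal can only approximate them.

```python
import math
import statistics


def fit_lognormal(samples):
    """Maximum-likelihood (mu, sigma) of a log-normal fitted to positive samples."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_mean(mu, sigma):
    """Mean of a log-normal distribution with parameters mu and sigma."""
    return math.exp(mu + sigma ** 2 / 2)
```

Comparing `lognormal_mean(mu, sigma)` with the sample mean gives a quick check of how well the fitted distribution reproduces the data.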
$T_{\mathrm{free}}$ refers to the free-flow travel time from the origin to the destination, approximated by the minimum travel time among all trips in the OD pair. The larger the TTI, the more traffic congestion the user encountered during the routine journey. Introducing the TTI enables us to compare the travel delay of OD pairs even when they have different travel distances. Fig. 7(a) and (b) illustrate the TTI of the travelers with different $N_{\mathrm{route}}$.

Figures

Figure 1: LBS data and user selection. (a) Spatial distribution of users' traces in the LBS data, measured by the logarithm of the total visitation in each grid cell. (b) User timespan versus number of records. Users outside the red rectangle are eliminated from further analysis. (c) Distribution of the sample interval of the LBS data.

Figure 2: User home location estimation. (a) User home locations estimated with visitation time and frequency. (b) Comparison between the active LBS users settling in the ZIP codes and the corresponding population from the census data.

Figure 5: Route choice behavior analysis. (a) Distribution of the number of trips, $N_{\mathrm{trip}}$, in the routine OD pairs for all active users in the LBS data. The data follow a log-normal distribution. (b) Distribution of the number of routes, $N_{\mathrm{route}}$, in the routine OD pairs, which also follows a log-normal distribution. (c) Distribution of $N_{\mathrm{route}}$ during different time periods, i.e., AM, MD, PM, and RD, on weekdays. The inset shows the number of trips during each time period on weekdays. (d) Distribution of $N_{\mathrm{route}}$ during different time periods on weekends. The inset shows the number of trips during each time period on weekends.

Figure 6: Connection between the number of routes and the number of trips and travel displacement. (a) Fraction of $N_{\mathrm{route}}$ for users grouped by number of trips. (b) Distribution of $N_{\mathrm{trip}}$ of travelers with different $N_{\mathrm{route}}$, i.e., one, two, three, and more than three routes. (c) Fraction of $N_{\mathrm{route}}$ for users grouped by range of travel distance. (d) Distribution of user travel displacement for travelers with different $N_{\mathrm{route}}$.
Algorithm 2: Route Detection
1: Step 1: Top OD pair selection
2:   1.1: Find stay locations via DBSCAN clustering the origins and destinations
3:   if # points in cluster ≥ 5, then
14:  2.3: Label each cluster as one route
15:  return high-resolution trips with labeled routes
16: Step 3: Route labeling for low-resolution trips
17:  for trip_L ∈ low-resolution trips do    ▷ Loops through each trip in the low-resolution trips
18:    for trip_H ∈ high-resolution trips do
19:      Calculate the maximum distance from points in trip_L to trip_H
20:    Find the nearest trip_{H,nearest} and the distance D_{H,nearest}
21:    if D_{H,nearest} ≤ 1 km then
22:      Give the route label of trip_{H,nearest} to trip_L
23:    else
24:      Route taken by trip_L is uncertain, remove it
25:  return trips with labeled routes
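Step 3 of the pseudocode (route labeling for low-resolution trips) translates almost line by line into Python. The sketch below is an illustration under assumptions: trips are lists of `(lat, lon)` points, distances use the haversine formula, the "maximum distance from points in trip L to trip H" is read as the largest nearest-point distance, and the function names are hypothetical. The 1 km threshold is the value given in the pseudocode.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def max_deviation_km(trip_l, trip_h):
    """Largest distance from a point of trip_l to its nearest point on trip_h."""
    return max(min(haversine_km(p, q) for q in trip_h) for p in trip_l)


def label_low_resolution(low_trips, high_trips, high_labels, max_dist_km=1.0):
    """Assign each low-resolution trip the route of its nearest high-resolution trip.

    Trips whose nearest high-resolution trip is farther than max_dist_km are
    dropped because their route is uncertain.
    Returns a list of (trip, route_label) pairs for the trips that were kept.
    """
    labeled = []
    for trip_l in low_trips:
        best_label, best_dist = None, float("inf")
        for trip_h, label in zip(high_trips, high_labels):
            d = max_deviation_km(trip_l, trip_h)
            if d < best_dist:
                best_label, best_dist = label, d
        if best_dist <= max_dist_km:
            labeled.append((trip_l, best_label))
    return labeled
```

The distance is deliberately one-directional: a sparse trip only needs each of its own points to lie close to the dense trip, not the other way around, which is what makes the alignment tolerant of large gaps in the low-resolution trace.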

Table 1: Mean values and STD of the travel time index and buffer index of users taking different numbers of trips. TTI* and BI* denote the mean values calculated over the values that fall within the 95% confidence interval.