Open Access

Spatiotemporal correlations of handset-based service usages

  • Hang-Hyun Jo1,
  • Márton Karsai1,
  • Juuso Karikoski2 and
  • Kimmo Kaski1
EPJ Data Science 2012, 1:10

DOI: 10.1140/epjds10

Received: 7 April 2012

Accepted: 9 October 2012

Published: 6 November 2012

Abstract

We study spatiotemporal correlations and temporal diversities of handset-based service usages by analyzing a dataset that includes detailed information about locations and service usages of 124 users over 16 months. By constructing the spatiotemporal trajectories of the users we detect several meaningful places or contexts for each one of them and show how the context affects the service usage patterns. We find that temporal patterns of service usages are bound to the typical weekly cycles of humans, yet they show maximal activities at different times. We first discuss their temporal correlations and then investigate the time-ordering behavior of communication services like calls being followed by the non-communication services like applications. We also find that the behavioral overlap network based on the clustering of temporal patterns is comparable to the communication network of users. Our approach provides a useful framework for handset-based data analysis and helps us to understand the complexities of information and communications technology enabled human behavior.

1 Introduction

Macroscopic socio-economic phenomena involving large numbers of individuals have been studied extensively by means of social, physical, and computational sciences [1–3]. Recent access to large-scale digital datasets on human dynamics and social interaction has enabled us to quantitatively investigate the structure and dynamics of human communication networks. Indeed, researchers have studied various datasets, ranging from email and mobile phone communications to social network services, e.g. Twitter and Facebook [4–11]. Mobile phones or handsets are now actively utilized to accurately measure or sense human behavior because handsets equipped with a variety of sensors, including GPS and WiFi, are carried around by their users every day, all day long. Highly resolved location data collected from handsets have recently been used to uncover human mobility patterns [12–20]. The reliability of data collected from handsets, i.e. ‘behavioral’ data, was tested in a series of studies conducted within the framework of MIT’s Reality Mining project [17, 18, 21]. It was shown that the behavioral data are at least comparable to self-report survey data in terms of the friendship network and even capture information that self-reports miss [18].

The handset usage patterns are known to be diverse among users when measured by the number or duration of the phone sessions and by the amount of data received, to name a few [22, 23]. Within the individual handset usage patterns, temporal inhomogeneities due to circadian and weekly cycles were also reported [10], which are closely related to spatial inhomogeneities, such as nighttime at home and daytime in the office. Therefore, for a comprehensive study it is important to identify the context characterizing the situation of the handset user, and then to understand how the context affects service usage patterns [23–27]. However, the effect of context on handset-based service usages has been investigated only very recently, and so far the analysis has been conducted mostly at the aggregate level, while the temporal diversities of service usage among users have been ignored [27].

In this paper, we study spatiotemporal correlations of the service usage patterns of individual users by analyzing a handset-based dataset. This dataset was collected from 124 users’ handsets for over 16 months as a part of the OtaSizzle project at Aalto University, Finland [28]. Software installed on the handsets collected information about the handset’s locations and usages of various services, including web domain visits, applications, emails, voice calls, and short message services, with a resolution of seconds in time and of mobile network base stations in space. After constructing spatiotemporal trajectories of the users we identify several contexts that are meaningful to them by using the context detection method [26]. Other methods include, for example, places of interest or meaningful locations [29, 30] and eigenmode analysis [31–33]. Then, we find correlations between the spatiotemporal trajectories and the service usage patterns. We observe the similarity and diversity in temporal patterns of the service usages and discuss their temporal correlations, time-ordering behavior between services, and behavioral overlap network based on the clustering results. Our approach provides a useful framework for handset-based data analysis, and hence it would be important for better design of information and communications technology (ICT) enabled social environments and services.

This paper is organized as follows. In Section 2 we describe the data collection and preparation methods. In Section 3 several contexts for each user are identified by means of the context detection method applied to user’s spatiotemporal trajectory. In Section 4 we uncover the spatiotemporal correlations and the similarity and diversity in temporal patterns of the service usages. Finally, we summarize the results with concluding remarks in Section 5.

2 Handset-based dataset

2.1 Data collection method

The handset-based dataset in this study was collected by the MobiTrack software installed on Nokia Symbian smartphones of 183 participants or users from September 2009 to December 2010, i.e. for a period spanning about 16 months. All users were students and staff members of Aalto University, Finland and identified as early adopters of mobile phones and services [34]. The dataset was anonymized so that no personal information of the users could be obtained. We consider only 124 users with the overall duration of handset usage longer than 30 days, see Section 3 for details.

The dataset consists of two kinds of information: locations and service usages. The resolution of locations is limited to the physical area covered by each mobile network base station, i.e. cell, denoted by $c$. Whenever the handset is connected to a new cell or otherwise every half an hour, the identifier of the cell to which the handset was connected was recorded with a timestamp $t$ with one second resolution. Each cell can be located in the geographical space with a unique pair of latitude and longitude. The geographic information for cells and the maps used in Figures 1 and 6 were collected as a part of the OpenNetMap project and from open databases [35–37]. For all users we have 5,596,041 records at 99,206 different cells. Although only 29.0% of cells could be located in the geographical space, they correspond to 91.3% of records. Figure 1 shows all located cells over the world, in Finland, and in the Helsinki municipal area. In this way, the detailed spatiotemporal trajectory of each user could be constructed in terms of a sequence of cell records $\{(c_k, t_k)\}$, where $k$ denotes the ordered index of record.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig1_HTML.jpg
Figure 1

Recording frequencies at mobile network base stations by all users. (A) Over the world, (B) in Finland, and (C) in the Helsinki municipal area. The higher frequency is denoted by the warmer color. In (C) the size of circle is logarithmic in frequency.

For service usage data we consider five services: web domain visit (web), application (app), email, voice call (call), and short message service (SMS). Each service usage or event was recorded with a timestamp with one second resolution together with service-specific relevant information. In the case of web domain visits, a URL (Uniform Resource Locator) was extracted and recorded whether it was visited via browser or widget. Only the applications visible in the foreground of the handset were recorded so that no process or application running in the background was considered. The records of communication services, such as email, call, and SMS, include the information on whether the user was an initiator or receiver of the communication event, and on the communication partner if available. For more information regarding the data collection method, see [34].

2.2 Data preparation method

The service usage dataset contains events mostly generated by users but it also contains automatic events by the operating system of the handsets. In order to observe the pure human behavior, we systematically filtered out these automatic events. However, some spurious regularities still remain in the web dataset. In the cases of http://google.com, http://facebook.com and so on, once a website is connected, the browser might visit the same website automatically for periodic updates and synchronization of accounts until the website is disconnected. To resolve this issue, we obtain the distribution of inter-event time $\tau$, defined as the time interval between consecutive web domain visits by the same user. Several sharp peaks at specific inter-event times are found, where each peak is mostly related to a single webpage. We remove all the events leading to those inter-event times, except for the event trains consisting of only two events with $\tau = 10$ seconds, because some trains with only two events separated by 10 seconds can also be generated by users. As new regularities become visible after filtering, we apply this method recursively until the peaks are suppressed considerably, leading to the removal of approximately 25% of all events. Figure 2 shows that this filtering method for the web dataset does not change the overall characteristics of the inter-event time distribution.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig2_HTML.jpg
Figure 2

Original and filtered distributions of inter-event time for web domain visits by all users. The inter-event time is defined as the time interval between consecutive web domain visits by the same user. The peaks due to automatic events by the browser have been successfully suppressed after filtering.
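The recursive filtering of periodic web events described above could be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for the dataset: the peak inter-event times are assumed to be given as a fixed set (`peak_taus`, a hypothetical input, whereas in the text new peaks are identified after each pass), and within each periodic train every event except the first is assumed to be the one removed.

```python
def filter_once(events, peak_taus, user_tau=10):
    """One filtering pass over sorted timestamps (in seconds).

    A 'train' is a maximal run of events separated by the same peak interval.
    Assumption: the first event of a train is kept and the rest are dropped,
    except for two-event trains with tau == user_tau, which are kept whole.
    """
    kept = []
    i, n = 0, len(events)
    while i < n:
        kept.append(events[i])
        j = i
        if j + 1 < n and (events[j + 1] - events[j]) in peak_taus:
            tau = events[j + 1] - events[j]
            # Extend the train while the same interval repeats.
            while j + 1 < n and events[j + 1] - events[j] == tau:
                j += 1
            if j - i + 1 == 2 and tau == user_tau:
                kept.append(events[j])  # plausibly user-generated pair
        i = j + 1
    return kept


def filter_periodic_events(timestamps, peak_taus, user_tau=10):
    """Apply filter_once repeatedly until no further event is removed."""
    events = sorted(timestamps)
    while True:
        filtered = filter_once(events, peak_taus, user_tau)
        if len(filtered) == len(events):
            return filtered
        events = filtered
```

For example, a train of four visits spaced by a 300-second peak interval collapses to its first event, while an isolated pair of visits 10 seconds apart survives.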

We also ignore some user-generated application events associated with other service usages, corresponding to 17% of all events. For example, the user opens the messaging application when sending or receiving SMSs. These application events might lead to artificial correlations between different service usages. In addition, corrupted events, less than 0.1% of the whole dataset, have been ignored or manually corrected. Finally, we have 792,971 web domain visits, 433,726 application events, 17,976 emails, 79,779 calls, and 79,283 SMSs in the service usage dataset.

3 Context detection from spatiotemporal pattern

In order to detect the contexts for each user, we construct the user’s spatiotemporal trajectory from a sequence of cell records $\{(c_k, t_k)\}$. It is necessary to infer the user’s location between consecutive timestamps of cell records. From a sequence of cell records, we derive the temporal boundaries $\{(c_k, t_k^{(s)}, t_k^{(e)})\}$ for the user’s trajectory, implying that the user stays within the area covered by cell $c_k$ from the moment of $t_k^{(s)}$ to $t_k^{(e)}$, see Figure 3. It is assumed that the user stays in the cell $c_k$ till $t_k^{(e)} = \frac{1}{2}(t_k + t_{k+1})$ and then in the cell $c_{k+1}$ from $t_{k+1}^{(s)} = t_k^{(e)}$ when $t_{k+1} - t_k \le 2t_c$. Here we set $t_c$ as half an hour, i.e. the time interval for regular cell recording. The time interval between consecutive timestamps longer than $2t_c$ implies that the handset may be turned off, used in offline or airplane mode, or not able to detect any cell nearby. If $t_{k+1} - t_k > 2t_c$, the user is considered to stay in the cell $c_k$ till $t_k^{(e)} = t_k + t_c$ and in the cell $c_{k+1}$ from $t_{k+1}^{(s)} = t_{k+1} - t_c$. Hence, the location is unknown between $t_k^{(e)}$ and $t_{k+1}^{(s)}$. Then, the total time spent, i.e. duration, in each cell $c$ is obtained as follows:
$$d_c = \sum_{\{k \mid c_k = c\}} \left( t_k^{(e)} - t_k^{(s)} \right).$$
(1)
If the sum of durations in all the recorded cells, $D \equiv \sum_c d_c$, is less than 30 days, that user is not considered for the further analysis, leading to 124 available users. The average and standard deviation of $D$ for available users are 121 ± 63 days.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig3_HTML.jpg
Figure 3

Schematic diagram for deriving temporal boundaries from a sequence of cell records. Temporal boundaries $\{(c_k, t_k^{(s)}, t_k^{(e)})\}$ (colored boxes) are derived from a sequence of cell records $\{(c_k, t_k)\}$ (vertical black lines). It is assumed that the user stays in the cell $c_k$ from the moment of $t_k^{(s)}$ to $t_k^{(e)}$. See the text for details.
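The derivation of temporal boundaries and of the durations in Eq. (1) might be sketched as below. The function names are illustrative, and the treatment of the first and last record, which the text does not specify, is an assumption: their outer boundary is simply set to the record's own timestamp.

```python
from collections import defaultdict

T_C = 1800  # t_c = half an hour, in seconds


def temporal_boundaries(records, t_c=T_C):
    """records: list of (cell, t) sorted by t; returns [(cell, t_start, t_end)]."""
    bounds = []
    n = len(records)
    for k, (cell, t) in enumerate(records):
        if k == 0:
            start = t  # assumption: no information before the first record
        else:
            t_prev = records[k - 1][1]
            # Midpoint if the gap is at most 2 t_c, otherwise t_k - t_c.
            start = (t_prev + t) / 2 if t - t_prev <= 2 * t_c else t - t_c
        if k == n - 1:
            end = t  # assumption: no information after the last record
        else:
            t_next = records[k + 1][1]
            # Midpoint if the gap is at most 2 t_c, otherwise t_k + t_c.
            end = (t + t_next) / 2 if t_next - t <= 2 * t_c else t + t_c
        bounds.append((cell, start, end))
    return bounds


def cell_durations(bounds):
    """Eq. (1): total duration d_c per cell, summed over all stays in that cell."""
    d = defaultdict(float)
    for cell, start, end in bounds:
        d[cell] += end - start
    return dict(d)
```

The total duration $D$ is then `sum(cell_durations(bounds).values())`, which can be compared against the 30-day threshold.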

In addition, we observe back and forth changes in a short time span between two cells covering the neighboring areas. It can occur even without any real movement of the handset if the handset is located at the boundary of two neighboring cells. To filter out this noisy behavior, the involved cells can be clustered by a sandwich clustering method [26]. Here we consider only one type of sandwich with four records involving two cells, i.e. $c_k = c_{k+2} \neq c_{k+1} = c_{k+3}$ with $t_l^{(e)} - t_l^{(s)} \le t_c$ for $l = k, \ldots, k+3$. Whenever this type of sandwich is detected, every $c_k$ in the temporal boundaries is replaced by or merged into $c_{k+1}$ if $d_{c_{k+1}} > d_{c_k}$, and vice versa. Consequently, some geographically neighboring cells can be clustered into one representative cell, which from now on will be considered equally with normal cells. For example, the first row in Figure 4 shows the user 81’s temporal boundaries during typical Friday and Saturday. Note that clustering cells for one user is independent of other users’ records.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig4_HTML.jpg
Figure 4

Locations and service usage patterns of a sample user 81 during typical Friday and Saturday. The first and second rows represent cells and contexts assigned to cells. Home, Office, Other meaningful place, and Elsewhere are denoted by red, blue, green, and gray colors, respectively. Different depths of the same color indicate the different cells belonging to the same context. Service usage events are denoted by vertical lines in the rows of web, app, email, call, and SMS (from the third to the bottom).
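The four-record sandwich detection described above may be illustrated with the following sketch. It assumes that temporal boundaries are given as `(cell, t_start, t_end)` tuples and durations as a dictionary, both hypothetical data layouts. Ties in duration are broken by cell identifier, which is an assumption added here so that the merge map cannot contain cycles; durations are also not updated after each merge, which is a simplification.

```python
def sandwich_merges(bounds, durations, t_c=1800):
    """Map each cell absorbed in an A-B-A-B sandwich to the cell that absorbs it."""
    merged = {}
    for k in range(len(bounds) - 3):
        window = bounds[k:k + 4]
        cells = [cell for cell, _, _ in window]
        all_short = all(end - start <= t_c for _, start, end in window)
        is_sandwich = (cells[0] == cells[2] and cells[1] == cells[3]
                       and cells[0] != cells[1])
        if all_short and is_sandwich:
            a, b = cells[0], cells[1]
            # The cell with the longer total duration becomes the representative.
            keep = max(a, b, key=lambda c: (durations[c], c))
            drop = b if keep == a else a
            merged[drop] = keep
    return merged


def representative(cell, merged):
    """Follow merge links until a cell that was not absorbed is reached."""
    while cell in merged:
        cell = merged[cell]
    return cell
```

Replacing every cell in the temporal boundaries by `representative(cell, merged)` then yields the clustered trajectory.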

We find spatiotemporal inhomogeneities of the trajectories of handsets on the individual basis as well as at the aggregate level. As an illustrative example, we obtain the rank curve $d(r)$, defined as the duration in the $r$th cell $c$ in a descending order according to $d_c$. The rank curve for all users is highly skewed, such that the first few cells, including one in Otaniemi campus of Aalto University, were visited for more than a few months while 88.9% of cells were visited for less than one hour, as shown in Figure 5. The same inhomogeneities are also observed for individual users. For example, the rank curves for users 5 and 81 are shown in Figure 5, who were selected to show the representative behavior.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig5_HTML.jpg
Figure 5

Rank curves of cells. A rank curve is defined as an ordered sequence of durations in cells. We plot rank curves of cells for all users and for sample users 5 and 81.

The heavily visited cells are supposed to cover places that are meaningful to the handset user, such as home and office. Since the service usage patterns might be affected by the different characteristics of meaningful places, it is important to identify the context characterizing the situation of the user. Here the context is preferred to the meaningful place because the time and place of handset usage are not independent but correlated, e.g. nighttime at home and daytime in the office [26]. Each cell will be detected as one of five contexts, namely Home, Office, Other meaningful place (Other), Elsewhere (Else), and Abroad. One context can be assigned to several cells. The identifier of a cell contains the mobile country code (MCC), by which the Abroad context is assigned to the cells outside Finland. For the cells within Finland, we obtain more detailed durations for each cell $c$:
  1. duration on weekdays ($d_{c,\mathrm{wd}}$),
  2. duration on weekdays between 0 AM and 6 AM ($d_{c,0\text{-}6}$), and
  3. duration on weekdays between 10 AM and 4 PM ($d_{c,10\text{-}16}$).
Now we describe criteria for assigning contexts except for Abroad. A cell is detected as Elsewhere (Else) if the duration in that cell is negligible compared with the total duration, i.e.
$$d_c / D < t_{\mathrm{elsewhere}} = 0.02.$$
(2)
For example, Else is assigned to the cells along the highways. The threshold value of $t_{\mathrm{elsewhere}}$ has been determined in order to leave only 0.2% of cells, i.e. 3.73 cells per user, for other contexts. A cell is detected as Office if the user spends a considerable time in that cell during the working time on weekdays as
$$d_{c,\mathrm{wd}} / d_c > t_{\mathrm{weekday}} = 0.8$$
(3)
and
$$d_{c,10\text{-}16} / d_{c,\mathrm{wd}} > t_{\mathrm{worktime}} = 0.5.$$
(4)
With above threshold values, at least one Office has been detected for more than half of the users. Note that most users were students so that they might not have any regular places to visit during the working time. Next, Home is assigned to a cell if the user spends a considerable time in that cell for nighttime and free time, i.e. the remaining time except for the working time, on weekdays as
$$d_{c,0\text{-}6} / d_{c,\mathrm{wd}} > t_{\mathrm{nighttime}} = 0.1$$
(5)
and
$$d_{c,10\text{-}16} / d_{c,\mathrm{wd}} < t_{\mathrm{freetime}} = 0.3.$$
(6)
With above threshold values, at least one Home has been detected for all users except for two of them. Many users turn out to have more than one Home, such as user’s own home and his/her parent’s home. Finally, the remaining cells are detected as Other meaningful place (Other). Figure 6 shows the locations of detected contexts for sample users in the Helsinki municipal area. We put two sample users’ contexts together to avoid privacy issues.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig6_HTML.jpg
Figure 6

Contexts detected for sample users 5 and 81 in a map of Helsinki municipal area. Each cell is represented by the circle with its radius according to the duration in that cell. The cells are identified as either Home (red), Office (blue), Other meaningful place (green), or Elsewhere (gray).
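The criteria in Eqs. (2)-(6) can be summarized as a small decision function. This is a sketch only: the function and argument names are illustrative, the order of the checks follows the order in which the criteria are introduced in the text, and a cell with no weekday duration is assumed to fall through to Other since the ratios in Eqs. (4)-(6) are then undefined.

```python
T_ELSEWHERE = 0.02
T_WEEKDAY = 0.8
T_WORKTIME = 0.5
T_NIGHTTIME = 0.1
T_FREETIME = 0.3


def assign_context(d_c, d_wd, d_0_6, d_10_16, total, in_finland=True):
    """Assign a context to one cell from its durations (any consistent time unit).

    d_c: total duration in the cell; d_wd: duration on weekdays;
    d_0_6 / d_10_16: weekday duration between 0-6 AM / 10 AM-4 PM;
    total: D, the user's duration summed over all cells.
    """
    if not in_finland:  # decided from the mobile country code in the text
        return "Abroad"
    if d_c / total < T_ELSEWHERE:  # Eq. (2)
        return "Else"
    if d_wd > 0:
        work_frac = d_10_16 / d_wd
        night_frac = d_0_6 / d_wd
        if d_wd / d_c > T_WEEKDAY and work_frac > T_WORKTIME:  # Eqs. (3), (4)
            return "Office"
        if night_frac > T_NIGHTTIME and work_frac < T_FREETIME:  # Eqs. (5), (6)
            return "Home"
    return "Other"
```

Since Office requires a working-time fraction above 0.5 and Home requires one below 0.3, the two conditions cannot hold simultaneously, so the order of these two checks does not affect the result.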

Our context detection method is validated by weekly patterns of duration for different contexts obtained for sample users and at the aggregate level, as depicted in Figure 7. For example, the user 5 without Other detected shows a very regular pattern, especially on weekdays, i.e. at Home in nighttime, in Office during the working time, and at Else when moving between Home and Office. Weekly patterns of user 81 are comparable to the temporal boundaries in terms of detected contexts, as depicted in the second row in Figure 4. Weekly patterns of duration aggregated over all users show the overall behavior. Durations at Home, Office, Other, and Else account for 66.8%, 7.0%, 8.5%, and 14.0% of the total duration of all users, respectively.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig7_HTML.jpg
Figure 7

Weekly patterns of duration for the different contexts. Weekly patterns of duration in hours for the different contexts are consistent with the context detection results for users 5 and 81 and for all users (from top to bottom). The typical weekly cycles of humans are observed.

4 Spatiotemporal correlations of service usages

We investigate correlations between users’ spatiotemporal trajectories and their service usage patterns. Here five services, namely web domain visit (web), application (app), email, voice call (call), and short message service (SMS), are considered and each service is denoted by $s$. The spatiotemporal correlation of service usages for user $i$ is fully characterized by the number of events corresponding to the service $s$ in the cell $c$ and at time $t$, denoted by $n_{is}(c, t)$. For gaining contextual understanding of correlations we consider the contexts instead of cells, i.e. $n_{is}(C, t) = \sum_c n_{is}(c, t)$, where the summation is over $c$ detected as context $C$.

4.1 Contextual correlations of service usages

We first focus on the contextual correlations of service usages with $n_{is}(C) = \sum_t n_{is}(C, t)$. Since services have qualitatively different characteristics, the numbers of events of different services cannot be directly compared to each other but only in terms of fractions and intensities of usages. The fraction of service usage is defined as follows
$$f_{is}(C) = \frac{n_{is}(C)}{\sum_{C'} n_{is}(C')}.$$
(7)
Figure 8 (left) shows the fractions for sample users 5 and 81 as well as their means over all users with standard errors, measured by the bootstrap method. The handset of user 5 has never been abroad and no Other context is detected. For this user all service usages are more active at Home and Office than at Else, which is very different from the service usage patterns of user 81. Due to the diversity of the service usage patterns among users, no general conclusion can be drawn at the individual level. However, by looking at the means with standard errors, it is found that all service usages are the most active at Home, while they are relatively inactive for other contexts. Given the aggregate durations for different contexts obtained in Section 3, this finding can be explained by the fact that a longer duration in a context means a higher chance of service usage.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig8_HTML.jpg
Figure 8

Contextual correlations of service usages. Contextual correlations of service usages are measured for users 5 and 81 and for all users (from top to bottom). Fractions (left) and intensities (right) of service usages are defined in Eqs. (7) and (8), respectively. Standard errors are also provided for the user-averaged statistics.

Accordingly, instead of the fractions of service usages we consider those divided by the corresponding fractions of durations as follows:
$$I_{is}(C) = \frac{n_{is}(C)}{\sum_{C'} n_{is}(C')} \cdot \frac{\sum_{C'} d_{iC'}}{d_{iC}},$$
(8)

where $d_{iC}$ denotes the duration of user $i$ for context $C$. The results are shown in Figure 8 (right). Despite the diversity among users, the means of intensities of different services for the same context have to some extent similar values. The large mean of intensity of email usage in Office might be due to the fact that users prefer emails to calls or SMSs in classes or laboratories during the working time. The large mean of intensity of web usage at Else could be the result of users killing time by surfing webpages while on the move. One could also say that users while abroad tend to use SMSs more than other communication services. Finally, for all services, only the means of intensity at Home turn out to be less than 1 and most inactive, which could be partly because users have many other activities to do at Home.
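As a brief illustration of Eqs. (7) and (8), the fractions and intensities for a single user and service could be computed as follows. The dictionary-based layout and the function names are assumptions made for this sketch, and both the total event count and each context duration are assumed to be nonzero.

```python
def fractions(counts):
    """Eq. (7): share of a service's events falling in each context."""
    total = sum(counts.values())
    return {context: n / total for context, n in counts.items()}


def intensities(counts, durations):
    """Eq. (8): event fraction divided by the fraction of time spent per context."""
    f = fractions(counts)
    total_duration = sum(durations[context] for context in counts)
    return {context: f[context] / (durations[context] / total_duration)
            for context in counts}
```

With 50 of 100 events at Home but 70% of the time spent there, the Home intensity is 0.5/0.7, i.e. below 1, whereas 30 events in a context occupying 10% of the time give an intensity of 3.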

4.2 Temporal correlations and time-ordering of service usages

We turn to analyze the temporal correlations of service usages in terms of $n_{is}(t) = \sum_C n_{is}(C, t)$, where the summation is over all contexts with one exception, Abroad. This is because the service usage abroad cannot be considered as normal, as shown in Figure 8. We first obtain weekday and weekend patterns of service usages as
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Equ9_HTML.gif
(9)
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Equ10_HTML.gif
(10)
for $0 \le t < T_d$ with $T_d = 1$ day. Here $k$ and $k'$ denote the indexes of weekdays and weekends, respectively. The weekday and weekend event rates of service $s$ for user $i$ are defined as
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Equ11_HTML.gif
(11)
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Equ12_HTML.gif
(12)

where $a = 1/5$ and $a' = 1/2$ are weights for normalization. In addition we obtain the weekday and weekend event rates averaged over all users.

In Figure 9 we show the individual event rates for sample users 5 and 81 as well as the event rates averaged over all users. The overall behavior of the individual and user-averaged event rates reflects typical weekly cycles of humans by being more active in the daytime and on weekdays and less active in the nighttime and on weekends. From the user-averaged event rates, we find that email (call) is more used around noon (late afternoon) on weekdays, while email (call) is less (more) used than other services in the weekend daytime. Since most users in our dataset were students and staff members of the university, they might not be making or receiving calls in classes or laboratories in the weekday daytime. Instead they might be using other communication services, such as email and SMS. On the other hand, users might be using call more than email outside class or laboratory on weekends.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig9_HTML.jpg
Figure 9

Weekday and weekend service usage patterns. Weekday and weekend service usage patterns are measured for users 5 and 81 and for all users (from top to bottom), showing the similarity and diversity among users. The bin size was set to one hour.

To investigate the temporal correlations between service usages for each user, we calculate the Pearson correlation coefficient (PCC) by using the event rates of services $s$ and $s'$ for user $i$:
$$r_{i,ss'} = \frac{\sum_t [\rho_{is}(t) - \bar{\rho}_{is}] [\rho_{is'}(t) - \bar{\rho}_{is'}]}{\sqrt{\sum_t [\rho_{is}(t) - \bar{\rho}_{is}]^2} \sqrt{\sum_t [\rho_{is'}(t) - \bar{\rho}_{is'}]^2}},$$
(13)
where $\bar{\rho}_{is} = T_d^{-1} \sum_t \rho_{is}(t)$. For the PCC on weekdays and on weekends, $\rho_{is}^{\mathrm{wd}}(t)$ and $\rho_{is}^{\mathrm{we}}(t)$ are used, respectively. The values of PCC turn out in most cases to be positive (not shown here). This is mainly due to the typical weekly cycles of humans as mentioned before. To correct for such cycles, for each case of weekdays and weekends we consider de-seasoned event rates defined as
$$\Delta\rho_{is}(t) = \rho_{is}(t) - \frac{1}{S_i} \sum_{s'} \rho_{is'}(t),$$
(14)

where $S_i$ denotes the number of services the user $i$ has used.
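The de-seasoning in Eq. (14) followed by the PCC in Eq. (13) might be sketched as below. Event rates are assumed to be given as equal-length lists per service (one value per time bin), a service counts as used if any of its bins is nonzero, and a `nan` is returned when a series has zero variance; all three choices are assumptions of this sketch.

```python
from math import sqrt


def deseason(rates):
    """Eq. (14): subtract the mean over the services the user has used."""
    used = {s: r for s, r in rates.items() if any(r)}
    n_bins = len(next(iter(rates.values())))
    mean = [sum(r[t] for r in used.values()) / len(used) for t in range(n_bins)]
    return {s: [r[t] - mean[t] for t in range(n_bins)] for s, r in used.items()}


def pearson(x, y):
    """Eq. (13): Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation undefined for a constant series
    return cov / sqrt(var_x * var_y)
```

Note that for a user who has used only two services, the two de-seasoned rates are exact negatives of each other, so their PCC is trivially -1; the measure is informative only when at least three services are in use.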

As shown in Figure 10, the values of PCC obtained for the de-seasoned event rates show similar and distinct behavior among users as well as between weekdays and weekends. For example, in the case of user 5, the strongly positive correlation between call and SMS usages on weekdays becomes slightly negative on weekends. This result is consistent with the temporal patterns depicted in Figure 9. A positive (negative) correlation between services, arising from their being used at the same time (at different times) of the week, can be interpreted as those services being complementary to (substitutes for) each other [38]. Then, we obtain and compare distributions of PCC over all users for each pair of services. The mean values for web-app and call-SMS pairs (app-email pair) are slightly positive (negative) on weekdays and become slightly negative (positive) on weekends. All other pairs have negative mean values. The result for positive correlations is inconclusive due to the large standard errors of PCC up to 0.05. However, for the pairs of services with large negative correlations, such as web-call and web-SMS pairs, we can argue that those services might be used in a substitutive way. In order to compare the correlations for weekdays and for weekends, we have conducted the Kolmogorov-Smirnov test. It is found that the distributions of PCC for weekdays and for weekends are significantly different for the pairs of web-app (p-value less than 0.005), app-email (0.03), email-call (0.03), email-SMS (0.03), and call-SMS (0.02). This list of pairs contains all the pairs whose sign of the mean has changed from weekdays to weekends.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig10_HTML.jpg
Figure 10

Pearson correlation coefficients among service usages. Pearson correlation coefficients among service usages for users 5 and 81 and for all users (from top to bottom) are obtained from weekday (left) and weekend (right) event rates. Positive and negative correlations are represented by orange and gray lines, respectively.

For a more detailed, i.e. event-based, analysis of correlations among service usages, we obtain the distribution of the time interval between two consecutive or simultaneous events of different services of the same user. Precisely, the time interval for a pair of services $s$ and $s'$ is defined by $\Delta t_{ss'} = t_s - t_{s'}$ with event timings $t_s$ and $t_{s'}$. As shown in the upper panels of Figure 11(a), distributions for some service pairs have a peak at a negative value of $\Delta t_{ss'}$ both for weekdays and for weekends. This indicates that the event of service $s'$ follows that of service $s$. On the other hand, distributions for other pairs of services do not show any distinct peaks, implying no temporal correlation. This time-ordering behavior could mean that one service usage might effectively induce another service usage. However, we cannot investigate such a process with our dataset. We summarize the results such that communication services, such as email, call and SMS, are followed by non-communication services, i.e. web and app, as depicted in Figure 11(b). We also obtain the distributions of time interval for different contexts. We find the overall similar time-ordering behavior (not shown here), except that email is followed by web at Home and that app does not follow communication services abroad. Note that the event-based analysis cannot be directly compared to the analysis of aggregated weekly patterns.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig11_HTML.jpg
Figure 11

Time-ordering behavior between services. (a) Distributions of time interval $\Delta t_{ss'}$ between consecutive events of different services $s$ and $s'$. (b) Diagram for time-ordering behavior between services based on the distributions of time interval.
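The event-based time intervals can be collected from a user's merged event stream as in the following sketch. The `(timestamp, service)` tuple layout is an assumption, and each consecutive pair of events is recorded under both orderings of the service pair, so that a concentration of negative values in the list for `(s, s2)` indicates that `s2` tends to follow `s`.

```python
from collections import defaultdict


def ordered_intervals(events):
    """events: iterable of (timestamp, service) for one user.

    Returns {(s, s2): [t_s - t_s2, ...]} over consecutive (or simultaneous)
    events of different services in the time-sorted stream.
    """
    out = defaultdict(list)
    stream = sorted(events)
    for (t1, a), (t2, b) in zip(stream, stream[1:]):
        if a != b:
            out[(a, b)].append(t1 - t2)  # a occurred first: non-positive value
            out[(b, a)].append(t2 - t1)  # same pair, opposite sign convention
    return dict(out)
```

A histogram of each list then gives the distribution of $\Delta t_{ss'}$ for the corresponding pair of services.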

4.3 Clustering and overlaps in temporal patterns of service usage

As it turns out, the temporal patterns of service usage are diverse from one user to another, while some of them still show similar behavior. To investigate the similarity and diversity of weekly patterns for each service we apply the k-means clustering method [39] to the weekly event rates as $\rho_{is}(t) \equiv \{\rho_{is}^{\mathrm{wd}}(t), \rho_{is}^{\mathrm{we}}(t)\}$. To correct the typical weekly cycles of each service (not of each user), we use the de-seasoned event rates as follows
$$\Delta\rho_{is}(t) = \rho_{is}(t) - \frac{1}{N_s} \sum_{i'} \rho_{i's}(t),$$
(15)

where $N_s$ denotes the number of users showing any activity in service $s$. We similarly define the service-averaged event rates for each user for the clustering, to be denoted by avg. In each case we set the number of clusters as $k = 10$ and the cluster index is denoted by $q = 0, \ldots, 9$. Clustering has been conducted 2,000 times with different initial conditions and here we present the result maximizing the quality of clustering or validity index, defined as the minimum inter-cluster distance divided by the sum of intra-cluster distances [39].
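To make the selection criterion concrete, a validity index of the kind described above could be evaluated for a given clustering as follows. The precise distance definitions are not stated in the text, so this sketch assumes Euclidean distances, takes the inter-cluster distance between centroids, and takes the intra-cluster distance as the distance from each point to its own centroid; it also assumes at least two clusters and a nonzero intra-cluster sum.

```python
from math import dist  # available in Python 3.8+


def centroids(points, labels):
    """Mean vector of the points assigned to each cluster label."""
    groups = {}
    for p, q in zip(points, labels):
        groups.setdefault(q, []).append(p)
    return {q: [sum(col) / len(ps) for col in zip(*ps)]
            for q, ps in groups.items()}


def validity_index(points, labels):
    """Minimum centroid-to-centroid distance over summed point-to-centroid distances."""
    cents = centroids(points, labels)
    keys = list(cents)
    inter = min(dist(cents[a], cents[b])
                for i, a in enumerate(keys) for b in keys[i + 1:])
    intra = sum(dist(p, cents[q]) for p, q in zip(points, labels))
    return inter / intra
```

Among many k-means runs with different initial conditions, the run whose labels give the largest `validity_index` would be retained, since compact and well-separated clusters make the ratio large.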

The clustering results are summarized in Table 1 and only a few weekly patterns of dominant clusters are shown in Figure 12. Only one dominant cluster is found in each case of web and email usages, implying similar patterns among users. Weekly patterns of app, call, and SMS usages are clustered into more than one dominant cluster. Compared to the largest cluster ($q = 0$) of call usage, the second largest cluster ($q = 1$) can be characterized by larger activities in the weekday daytime and in the weekend morning. The behavioral difference between dominant clusters in SMS usage is also obvious. The largest cluster ($q = 0$) represents the evening-type users, while the second largest cluster ($q = 1$) represents the morning-type users on weekdays. In the case of service-averaged usage patterns, the second largest cluster ($q = 1$) shows larger (smaller) activity in the daytime on weekdays (on weekends) than the largest cluster ($q = 0$). To check the validity of clustering results, we obtain the Pearson correlation matrices using the de-seasoned event rates, $\Delta\rho_{is}(t)$. All the matrices support the k-means clustering results, see Figure 13. We also tested the effect of the number of means, $k$, on the clustering and found that the results are qualitatively similar apart from the number of small or outlying clusters.
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig12_HTML.jpg
Figure 12

k-means clustering results of users’ weekly patterns. We have used $k = 10$ and plotted only a few dominant clusters, with the cluster size in parentheses; see Table 1 for details. The bin size was set to one hour.

https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig13_HTML.jpg
Figure 13

Pearson correlation matrices of users’ de-seasoned event rates. These matrices support the validity of the k-means clustering results in Table 1. The user index has been sorted according to the corresponding cluster index and blank spaces are due to totally inactive users.

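Table 1

k-means clustering results for weekly patterns of service usages

| Service | q = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | $N_s$ |
|---------|-------|----|----|---|---|---|---|---|---|---|-------|
| web     | 74    | 9  | 7  | 6 | 5 | 3 | 3 | 2 | 1 | 1 | 111   |
| app     | 50    | 32 | 10 | 7 | 6 | 6 | 5 | 4 | 3 | 1 | 124   |
| email   | 55    | 3  | 3  | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 69    |
| call    | 54    | 40 | 14 | 5 | 4 | 1 | 1 | 1 | 1 | 1 | 122   |
| SMS     | 74    | 14 | 11 | 9 | 5 | 4 | 3 | 1 | 1 | 1 | 123   |
| avg     | 64    | 21 | 16 | 6 | 5 | 5 | 4 | 1 | 1 | 1 | 124   |

We summarize k-means clustering results for weekly patterns of service usages with $k = 10$. Each entry is the number of users in cluster q; q and $N_s$ denote the cluster index and the number of available users for service s, respectively.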

Finally, in order to get insight into the overall structure of temporal correlations among users and services, we construct an overlap network based on the clustering results. This leads to the network of overlapping communities [40], where nodes and link weights of the network represent users and their overlaps, respectively. Precisely, the behavioral overlap is defined as the number of services in which two users, say i and j, belong to the same cluster:
$$O^B_{ij} = \sum_s \delta(q_{is}, q_{js}). \tag{16}$$
Here $q_{is}$ denotes the cluster index of user i for service s, and the Kronecker delta function $\delta(q, q')$ gives 1 if $q = q'$ and 0 otherwise. Figure 14 shows the overlap network with 436 links of $O^B = 4$ and 5. A behavioral overlap of $O^B = 5$ on a link, denoted by a thick black line, implies that the two neighboring users belong to the same clusters for all services, i.e. they are fully synchronized. We find cliques consisting only of fully synchronized users, which we call synchronized cores. The largest synchronized core, with 9 users, is closely related to the second largest synchronized core; the two differ only in belonging to different clusters of call usage. These cores are also connected to many other users, but not as part of a synchronized core. This agglomerate structure may be induced by the relatively homogeneous demographics of users in our dataset. However, we note that the clustering was applied to the de-seasoned event rates, from which the user-averaged temporal behavior has been subtracted.
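A minimal sketch of how the behavioral overlap of Eq. (16) could be computed from per-service cluster assignments is given below. The function name and the nested-dictionary input are hypothetical, and the sketch assumes that a service unused by either user simply contributes nothing to the sum.

```python
from itertools import combinations


def behavioral_overlap(clusters):
    """Eq. (16): count the services in which two users share a cluster index.

    clusters maps user id -> {service name: cluster index q}; a service the
    user never used is simply absent from the inner dictionary (assumption).
    Returns {(i, j): overlap} for every unordered pair of users.
    """
    overlap = {}
    for i, j in combinations(sorted(clusters), 2):
        shared = clusters[i].keys() & clusters[j].keys()
        overlap[(i, j)] = sum(clusters[i][s] == clusters[j][s] for s in shared)
    return overlap
```

With the five services considered here the overlap ranges from 0 to 5, and keeping only the pairs with a value of 4 or 5 would give the links of a network like the one shown in Figure 14.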
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig14_HTML.jpg
Figure 14

Overlap network constructed based on the clustering results for all services. Circle, square, and hexagonal nodes represent female, male, and unknown gender of users, respectively. Each black solid thick line denotes a link between users who belong to the same clusters for all services. Other colored lines denote the links between users who belong to the same clusters for all but one service: web (dashed thick blue), app (dotted thin red), email (dotted thick green), call (solid thick cyan), SMS (dashed thin violet), or due to the unused service by either user (solid thin gray). This figure was generated using Cytoscape v2.8.1 [41].

We compare the behavioral overlap network based on the clustering results to the communication network of users. The communication network can be constructed from the call and SMS datasets containing the information on communication partners. Only 67 out of 124 users and 205 links between users are identified. The topological overlap of a link ij is defined as [6]
$$O^T_{ij} = \frac{|\Lambda_i \cap \Lambda_j|}{|\Lambda_i \cup \Lambda_j| - 2}, \tag{17}$$
where $\Lambda_i$ denotes the set of neighbors of node i. $O^T_{ij}$ has a value of 1 if i and j have exactly the same neighbors apart from each other, and a value of 0 if they have no neighbors in common. Figure 15 shows an overall positive correlation between behavioral and topological overlaps. It implies that connected users sharing more common neighbors show more similar weekly patterns of service usage. Thus, the behavioral overlap network based on service usage can be used to reveal the communication network structure of users.
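The topological overlap of Eq. (17) can be illustrated with a short sketch on an adjacency structure stored as sets of neighbors. The function name and the data layout are assumptions for this example, as is the choice to return 0 when the two linked nodes have no other neighbors, a case in which the ratio is undefined.

```python
def topological_overlap(neighbors, i, j):
    """Eq. (17) for a linked pair (i, j).

    neighbors maps each node to the set of its neighbors. Because i and j
    are linked, each appears in the other's set, hence the "- 2" in the
    denominator, which removes i and j themselves from the union.
    """
    common = neighbors[i] & neighbors[j]
    union = neighbors[i] | neighbors[j]
    denom = len(union) - 2
    # Assumption: a link whose end nodes have no other neighbors gets overlap 0.
    return len(common) / denom if denom > 0 else 0.0


# Small example: a triangle a-b-c with an extra node d attached to a.
example = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

In the example graph, the link between b and c shares its only other neighbor a, so its overlap is maximal, whereas the link between a and d has no common neighbors.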
https://static-content.springer.com/image/art%3A10.1140%2Fepjds10/MediaObjects/13688_2012_Article_9_Fig15_HTML.jpg
Figure 15

Topological overlap as a function of behavioral overlap. We observe an overall positive correlation between the topological overlap from the communication network of users and the behavioral overlap based on the clustering results.

5 Summary

We have investigated spatiotemporal correlations and temporal diversities of service usages by analyzing a handset-based dataset collected from 124 users over 16 months. The dataset consists of locations and service usages. After constructing the precise spatiotemporal trajectory of each user from the location dataset, we identified several meaningful places or contexts by means of a context detection method. The contexts considered are Home, Office, Other meaningful place, Elsewhere, and Abroad. We showed how the context affects the service usage patterns of users, including their web domain visits (web), applications (app), email, voice calls (call), and short message service (SMS).

In this study we have found the similarity and diversity of weekly patterns among users and services, in terms of temporal correlations, time-ordering behavior between services, and overlap network based on clustering. The services used at the same time (at different times) of the week lead to the positive (negative) correlations between them, which can be interpreted as being complementary (substitutive) to each other. By conducting the event-based analysis instead of weekly patterns we observe the time-ordering behavior between services, such that communication services, i.e. email, call, and SMS, are followed by the non-communication services, i.e. web and app. Finally, the similarity and diversity of weekly patterns of service usages enable us to classify users into several different clusters, e.g. as characterized by the morning-type or evening-type usage patterns, except for the web and email usages. The behavioral overlap network constructed based on the clustering results can be used to reveal the communication or real social network structure of users.

Our findings on the spatiotemporal correlations of service usage patterns for different contexts enable us to better understand the behavior of humans and what that implies. This is also important for better design of information and communications technology (ICT) enabled social environments and services. However, more detailed analysis with higher resolution is required to reveal the underlying mechanism or the origin of spatiotemporal correlations.

Declarations

Acknowledgements

The research data were collected in the OtaSizzle project that is funded by Aalto University’s MIDE program and Helsinki University of Technology TKK’s ‘Technology for Life’ campaign donations from private companies and communities. The authors thank MobiTrack Innovations Ltd. for providing the mobile audience measurement platform. The sponsorship of this work by Nokia and Elisa is also acknowledged. Financial support by the Aalto University postdoctoral program (HJ), from EU’s 7th Framework Program’s FET-Open to ICTeCollective project no. 238597, by the Academy of Finland, the Finnish Center of Excellence program 2006-2011, project no. 129670 (MK, KK), and by the Future Internet Graduate School and MoMIE project (JK) is gratefully acknowledged.

Authors’ Affiliations

(1)
Department of Biomedical Engineering and Computational Science, School of Science, Aalto University
(2)
Department of Communications and Networking, School of Electrical Engineering, Aalto University

References

  1. Goyal S: Connections: an introduction to the network economy. Princeton University Press, Princeton; 2009.
  2. Castellano C, Fortunato S, Loreto V: Statistical physics of social dynamics. Rev Mod Phys 2009, 81(2):591–646. 10.1103/RevModPhys.81.591
  3. Lazer D, Pentland A, Adamic L, Aral S, Barabási AL, Brewer D, Christakis N, Contractor N, Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Van Alstyne M: Computational social science. Science 2009, 323(5915):721–723. 10.1126/science.1167742
  4. Eckmann JP, Moses E, Sergi D: Entropy of dialogues creates coherent structures in e-mail traffic. Proc Natl Acad Sci USA 2004, 101(40):14333–14337. 10.1073/pnas.0405728101
  5. Barabási AL: The origin of bursts and heavy tails in human dynamics. Nature 2005, 435:207–211. 10.1038/nature03459
  6. Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J, Barabási AL: Structure and tie strengths in mobile communication networks. Proc Natl Acad Sci USA 2007, 104(18):7332–7336. 10.1073/pnas.0610245104
  7. Kwak H, Lee C, Park H, Moon S: What is Twitter, a social network or a news media. In Proceedings of the 19th international conference on World Wide Web, WWW ’10. ACM, New York; 2010:591–600.
  8. Lewis K, Kaufman J, Gonzalez M, Wimmer A, Christakis N: Tastes, ties, and time: a new social network dataset using Facebook.com. Soc Netw 2008, 30(4):330–342. 10.1016/j.socnet.2008.07.002
  9. Kovanen L, Karsai M, Kaski K, Kertész J, Saramäki J: Temporal motifs in time-dependent networks. J Stat Mech Theory Exp 2011, 2011(11):P11005.
  10. Jo HH, Karsai M, Kertész J, Kaski K: Circadian pattern and burstiness in mobile phone communication. New J Phys 2012, 14:013055.
  11. Karsai M, Kaski K, Barabási AL, Kertész J: Universal features of correlated bursty behaviour. Sci Rep 2012, 2:397.
  12. González MC, Hidalgo CA, Barabási AL: Understanding individual human mobility patterns. Nature 2008, 453(7196):779–782. 10.1038/nature06958
  13. Candia J, González MC, Wang P, Schoenharl T, Madey G, Barabási AL: Uncovering individual and collective human dynamics from mobile phone records. J Phys A, Math Theor 2008, 41(22):224015.
  14. Wang P, González MC, Hidalgo CA, Barabási AL: Understanding the spreading patterns of mobile phone viruses. Science 2009, 324(5930):1071–1076. 10.1126/science.1167053
  15. Song C, Qu Z, Blumm N, Barabási AL: Limits of predictability in human mobility. Science 2010, 327(5968):1018–1021. 10.1126/science.1177170
  16. Song C, Koren T, Wang P, Barabasi AL: Modelling the scaling properties of human mobility. Nat Phys 2010, 6(10):818–823. 10.1038/nphys1760
  17. Eagle N, Pentland A: Reality mining: sensing complex social systems. Pers Ubiquitous Comput 2006, 10(4):255–268. 10.1007/s00779-005-0046-3
  18. Eagle N, Pentland AS, Lazer D: Inferring friendship network structure by using mobile phone data. Proc Natl Acad Sci USA 2009, 106(36):15274–15278. 10.1073/pnas.0900282106
  19. Krings G, Calabrese F, Ratti C, Blondel VD: Urban gravity: a model for inter-city telecommunication flows. J Stat Mech Theory Exp 2009, 2009(7):L07003.
  20. Bagrow JP, Lin YR: Mesoscopic structure and social aspects of human mobility. PLoS ONE 2012, 7(5):e37676.
  21. Aharony N, Pan W, Ip C, Khayal I, Pentland A: Social fMRI: investigating and shaping social mechanisms in the real world. Pervasive Mob Comput 2011, 7(6):643–659. 10.1016/j.pmcj.2011.09.004
  22. Falaki H, Mahajan R, Kandula S, Lymberopoulos D, Govindan R, Estrin D: Diversity in smartphone usage. In Proceedings of the 8th international conference on mobile systems, applications, and services, MobiSys ’10. ACM, New York; 2010:179–194.
  23. Soikkeli T, Karikoski J, Hammainen H: Diversity and end user context in smartphone usage sessions. In Next generation mobile applications, services and technologies (NGMAST), 2011 5th international conference on. IEEE Press, New York; 2011:7–12.
  24. Dey AK: Understanding and using context. Pers Ubiquitous Comput 2001, 5:4–7. 10.1007/s007790170019
  25. Verkasalo H: Handset-based analysis of mobile service usage. PhD thesis, Helsinki University of Technology, Espoo, Finland; 2009.
  26. Soikkeli T: The effect of context on smartphone usage sessions. Master’s thesis, Aalto University, Espoo, Finland; 2011. http://aalto-fi.academia.edu/TapioSoikkeli/Papers
  27. Karikoski J, Soikkeli T: Contextual usage patterns in smartphone communication services. Pers Ubiquitous Comput 2011. 10.1007/s00779-011-0503-0
  28. OtaSizzle project. http://sizl.org
  29. Montoliu R, Perez DG: Discovering human places of interest from multimodal mobile phone data. In Proceedings of the 9th international conference on mobile and ubiquitous multimedia, MUM ’10. ACM, New York; 2010.
  30. Nurmi P, Koolwaaij J: Identifying meaningful locations. In Mobile and ubiquitous systems: networking and services, 2006 third annual international conference on. 2006:1–8.
  31. Eagle N, Pentland AS: Eigenbehaviors: identifying structure in routine. Behav Ecol Sociobiol 2009, 63(7):1057–1066. 10.1007/s00265-009-0739-0
  32. Reades J, Calabrese F, Ratti C: Eigenplaces: analysing cities using the space-time structure of the mobile phone network. Environ Plan B, Plan Des 2009, 36(5):824–836. 10.1068/b34133t
  33. Park J, Lee DS, González MC: The eigenmode analysis of human motion. J Stat Mech Theory Exp 2010, 2010(11):P11021.
  34. Karikoski J: Handset-based data collection process and participant attitudes. Int J Handheld Comput Res 2012, in press.
  35. HIIT OpenNetMap project. http://opennetmap.rista.fi/
  36. OpenCellID. http://www.opencellid.org
  37. Location-API. http://location-api.com
  38. Karikoski J, Luukkainen S: Substitution in smartphone communication services. In Intelligence in next generation networks (ICIN), 2011 15th international conference on. IEEE Press, New York; 2011:313–318.
  39. Gan G, Ma C, Wu J: Data clustering: theory, algorithms, and applications. Illustrated edn. SIAM, Philadelphia; 2007.
  40. Palla G, Derényi I, Farkas I, Vicsek T: Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435(7043):814–818. 10.1038/nature03607
  41. Smoot ME, Ono K, Ruscheinski J, Wang PLL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2011, 27(3):431–432. 10.1093/bioinformatics/btq675

Copyright

© Jo et al.; licensee Springer. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.