Putting human behavior predictability in context

Various studies have investigated the predictability of different aspects of human behavior, such as mobility patterns, social interactions, and shopping and online behaviors. However, existing research has often been limited to a single behavioral dimension, or to the combination of a few, and has adopted the perspective of an outside observer who is unaware of the motivations behind the specific behaviors or activities of a given individual. The key assumption of this work is that human behavior is deliberated based on an individual's own perception of the situation that s/he is in, and that therefore it should also be studied from the same perspective. Taking inspiration from works in ubiquitous and context-aware computing, we investigate the role played by four contextual dimensions (or modalities), namely time, location, activity being carried out, and social ties, on the predictability of individuals' behaviors, using a month of collected mobile phone sensor readings and self-reported annotations about these contextual modalities from more than two hundred study participants. Our analysis shows that any target modality (e.g. location) becomes substantially more predictable when information about the other modalities (time, activity, social ties) is made available. Multi-modality turns out to be in some sense fundamental, as some values (e.g. specific activities like "shopping") are nearly impossible to guess correctly unless the other modalities are known. Subjectivity also has a substantial impact on predictability: a location recognition experiment suggests that subjective location annotations convey more information about activity and social ties than objective information derived from GPS measurements. We conclude the paper by analyzing how the identified contextual modalities can be used to compute the diversity of personal behavior, showing that individuals are more easily identified by rarer, rather than more frequent, context annotations.
These results offer support in favor of developing innovative computational models of human behaviors enriched by a characterization of the context of a given behavior.

Research studies have also highlighted how similar mechanisms seem to govern different human activities. For example, people show a finite number of favourite places [6] and friends [11]. In a similar way, some individuals tend to explore and change favourite places [16] over time, as they do with friendships [11] and mobile phone apps [17], while others tend to keep their behavior stable.
However, existing studies on human dynamics have often been limited to a single behavioral dimension or to the combination of a few (e.g. mobility and social interactions) [2,3,18-20]. Moreover, these studies have adopted the perspective of an outside observer who is unaware of the motivations behind the activities of a given individual.
In our work, we propose a different angle for analyzing the predictability of human behavior. In particular, our study revolves around the observation that, in typical circumstances, human behavior is deliberated based on an individual's own perception of the situation s/he is involved in, as captured by the notion of personal context [21-23]. For this reason, we analyze regularity and diversity in behavior through the joint interplay of four modalities of personal context (i.e. time, location, activity, and social ties) widely used in the context-aware and ubiquitous computing communities [21-25].
In particular, we perform a rigorous statistical analysis of the effects of these four modalities of personal context on the predictability of human behavior using a month of collected mobile phone sensor readings and self-reported annotations about time, location, activity and social ties from more than 200 volunteers [26,27]. Our analysis leverages information theoretic techniques introduced by studies on human mobility [3,15,28] to characterize the predictability of individual behavior for single modalities and extends them to study correlations across distinct modalities. In addition, we look at behavior diversity across individuals through the lens of the four identified contextual modalities.
Our analyses and findings offer several pieces of evidence in support of the role played by the investigated contextual modalities. As a first step, we have estimated the performance of an ideal, optimal classifier for independently predicting each modality (i.e. time, location, activity, and social ties) for each individual in the data set. This showed that an optimal classifier with access to the previous annotations for the same user and contextual modality, but not their chronological order, cannot do better than 45% to 65% accuracy. In other words, ignoring correlations across time and between contextual modalities entails a large irreducible error of 35% to 55%, depending on the target modality. Disclosing the order of past annotations (again, available for the target modality only) makes the optimal classifier perform much better, and the irreducible error decreases to 10% to 15%. However, supplying the optimal classifier with information about the other modalities (e.g. providing time, activity, and social ties while predicting location) but not their order decreases the irreducible error even more, below 5%. This shows that taking inter-modality correlations into account makes a substantial difference in the predictability of an individual's behavior and supports the idea that inter-modality correlations may be more important than short- and long-term correlations over time. These results, which hold for optimal classifiers, were shown to carry over to practical classifiers (namely, Random Forests) in a location recognition experiment. This experiment also shows that some locations that are hard or impossible to predict using sensor data suddenly become easy to predict when information from time, activity, and social ties is taken into account. This further highlights the fundamental importance of the joint interplay of different contextual modalities in behavior analysis.
Then, the analysis was extended to determine the impact of subjectivity on predictability of behavior. Consistently with our finding that activity and location are strongly tied, we compared the impact of injecting objective versus subjective location information on the performance of optimal classifiers for activity and social ties. Here, subjective location was implemented using self-reported annotations, while objective location was derived from GPS measurements. Our results support the argument that subjective location is far more informative than objective location for predicting behavior.
In a final experiment, we investigated the role played by the identified four contextual modalities in studying behavior diversity across individuals. The goal of this experiment was to determine whether common or uncommon behaviors are what distinguishes different individuals. The results clearly show that, first of all, the context distribution is heavy tailed, and therefore that contextual modalities offer support for analyzing "rare" behaviors, and second that annotations in the tail of the context distribution are much more effective than those in the head at identifying individuals. This was verified in a practical identity recognition experiment.

Related work
In this section we review key works related to our paper from two distinct research areas: (i) capturing and modeling contextual information and subjective experiences using mobile sensing approaches, and (ii) modeling predictability, entropy and diversity of human behaviors.
A decade ago, Lane et al. [42] surveyed a number of existing studies on mobile phone sensing algorithms, applications, and systems and pointed out that how to characterize contextual information is one of the most challenging research problems in the mobile sensing community. More in general, ubiquitous computing and context-aware computing researchers have produced several works on modeling and characterizing the contextual dimensions of human behaviors and activities [18,19,21,43].
For example, various studies have leveraged the Experience Sampling Method (ESM) [44] approach on mobile devices to capture self-reported contextual information on the daily activities and routines of people [45-47]. ESM is a methodology for collecting information on the behaviors and feelings of study participants throughout their daily activities [44]. As in traditional diary studies, ESM collects data by means of study participants' self-reports; however, study participants, unlike in diary studies, are proactively triggered at various moments during the day. Along this line, a group of ubiquitous computing researchers has designed Aware [48,49], a platform for context-aware mobile sensing that captures different contextual information such as time, location, and proximity interactions.
In our current work, we took inspiration from these previous efforts and focus on collecting information on four contextual modalities often investigated in the past, namely time, location, activity, and social ties. However, we advance the state of the art by performing a rigorous and extensive analysis of the joint effects of these contextual modalities on the regularity and diversity of human behaviors. In doing so, we merge contributions from the ubiquitous and context-aware computing communities with computational social science approaches that characterize the predictability of human behaviors by means of information theoretic measures [3,15,28].

Previous studies on human predictability
In addition to studies and methods focused on capturing and modeling contextual information relevant for understanding human behaviors, there is a large body of work on human predictability. Notably, these studies are canonically split into topics based on the modality and data being considered: research on human mobility looks at location data [2,3,6,8,15,28], studies on behavioral routines analyze activities and recurrent patterns [9,50], and work on social networks investigates the role played by social relations and interactions [24]. Only a few works consider combinations of distinct modalities, such as location and social ties [6,18-20], or look at the predictability of human behavior at a more general level [9]. Here, we make use of the statistical and information theoretic measures developed mainly in studies on modeling human mobility [2,3,15], while extending them to the analysis of multiple modalities. Our experiments show that the four identified contextual modalities are fundamental for determining, and thus for analyzing, the predictability and diversity of human behaviors.

Materials and methods
Given the variety and complexity of individual experiences, formalizing context in its entirety is essentially impossible, and application-specific or study-specific solutions are necessary. In our paper, we focus on four modalities of context (time, location, activity, and social ties) widely used in ubiquitous computing communities for capturing and describing situations occurring in everyday life [21-25].
Here, we illustrate these contextual modalities using a simple university life scenario, in which a student is attending a lecture at the University of Trento, at 11:00 AM, together with a friend named "Shen".
Formally, the context can be represented as a tuple ⟨TIME, WE, WA, WO⟩, where:
• TIME is the temporal context: it answers "What TIME is it?" and encodes the time at which that context was observed, e.g. "morning";
• WE is the endurant context and answers "WhEre are you?" It indicates the relevant location that a person is at, e.g. "classroom";
• WA is the perdurant context and answers "WhAt are you doing?" It refers to the main activity taking place, e.g. "lesson";
• WO is the social context and answers "WhO are you with?" It covers all the relevant people in the current context, e.g. "teacher", "classmates", and "Shen".
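As a minimal illustration, the context tuple above can be encoded as a simple data structure. The field names and example values below mirror the university-life scenario; this is an illustrative sketch, not the representation actually used in the study.

```python
from typing import NamedTuple, Tuple

class Context(NamedTuple):
    """A personal-context tuple <TIME, WE, WA, WO>."""
    time: str            # temporal context: "What TIME is it?"
    we: str              # endurant context: "WhEre are you?"
    wa: str              # perdurant context: "WhAt are you doing?"
    wo: Tuple[str, ...]  # social context: "WhO are you with?"

# The university-life scenario: a lecture at 11:00 AM with a friend.
lecture = Context(time="morning", we="classroom", wa="lesson",
                  wo=("teacher", "classmates", "Shen"))
```

Each annotation collected in the study can then be thought of as one such tuple per questionnaire.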

Data collection
The data was collected as part of the Smart UniTn 2 project, which lasted from the 7th of May to the 7th of June 2018, for a total of four weeks (32 days) [27]. The research protocol was designed on top of a prior analogous, but slightly smaller, study [26]. More precisely, following the research protocol, an e-mail inviting participation in the data collection was sent to all 12,000 regularly enrolled students at the University of Trento. The e-mail clearly explained that students could choose to participate in the study for two or four weeks, and that in the first two weeks they would receive a notification every half hour, while in the second two weeks every two hours. Moreover, as stated by Keusch et al. [51], the willingness to participate in mobile data collection is strongly influenced by the incentive promised for study participation. Thus, a monetary incentive was introduced to encourage prompt and truthful reporting. A reward of 20 euros was promised to each participant every two weeks. In addition, each participant was informed that at the end of the survey there would be a lottery among those who responded to more than 75% of the notifications, consisting of 3 prizes of 100 euros for the first two weeks and 3 prizes of 150 euros for the second two weeks. Of the 1089 volunteers, a stratified random sample of 237 students from 10 different departments at the University of Trento, Italy, was invited to participate in the survey. Following Italian regulations, all participants were asked to sign informed consent forms and the study was conducted in accordance with them. The research protocol and the informed consent forms were also approved by the Ethical Committee of the University of Trento.
The data was logged using the i-Log app [52], which all volunteers were required to install on their mobile phones. The app records measurements from several sensors, both hardware (e.g. GPS, accelerometer, gyroscope) and software (e.g. applications running on the device). Table 1 lists all sensors with their frequencies and units of measurement. The app was also used to track the personal context of each study participant (namely their current activity, location, and social context) by periodically administering questionnaires. Figure 1 reports the questions appearing in each questionnaire and the set of possible answers. The participants had 150 minutes (2.5 h) from the delivery of a questionnaire to answer it. If a study participant failed to reply in time to five consecutive questionnaires, the oldest one was dropped and its answer treated as a missing value.
As previously said, the data collection was split into two phases, each two weeks long. In the first phase (7th to 24th of May) questionnaires were submitted to the volunteers every 30 minutes, while in the second one (25th of May to 7th of June) every 2 hours, to lessen the cognitive load. In this second stage, the volunteers were also specifically requested to leave the app running at all times.

Data preprocessing
Despite these precautions, the self-reported annotations are likely to be noisy and biased. This is compatible with earlier observations about a similar collection experiment [53]. In order to minimize the remaining bias, the raw annotations were cleaned as follows. In a first step, a simple criterion was used to identify valid (that is, "trustworthy") study participants. A participant was deemed valid if s/he failed to reply no more than 7 times within any 10-hour window, completed all questionnaires for at least 13 days, and provided at least 300 valid answers. All of these conditions must hold for a study participant to be deemed valid. A total of 184 study participants were marked as valid. The records of all invalid study participants were discarded. The next step was to delete events with invalid or missing values (like empty string labels) and records spuriously occurring before the start of the data collection. Fig. 2 shows that most participants have between 400 and 1000 records. To get an intuition of the regularity in the behavior of volunteers, we selected the two participants with the highest and lowest annotation diversity and visualized their annotations in Fig. 3. The most regular study participant has a distinctly simpler behavior than the other one, as expected. The figure also shows that even the behavior of the more regular volunteer is still quite irregular and displays substantial variability across days and across weeks.
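The validity criterion can be sketched as a small filter. The function below is a hypothetical reconstruction of the three conditions described above (at most 7 missed replies in any 10-hour window, at least 13 complete days, at least 300 valid answers), not the project's actual preprocessing code.

```python
from bisect import bisect_right

def is_valid_participant(miss_times, complete_days, valid_answers):
    """Return True iff all three validity conditions hold.

    miss_times: sorted timestamps (in hours) of missed questionnaires.
    """
    # Condition 1: no more than 7 missed replies within any 10-hour window.
    for i, t in enumerate(miss_times):
        # Count misses falling in the window [t, t + 10].
        if bisect_right(miss_times, t + 10) - i > 7:
            return False
    # Conditions 2 and 3: enough complete days and enough valid answers.
    return complete_days >= 13 and valid_answers >= 300
```

Only participants passing this filter would contribute records to the subsequent analyses.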

Results
We organize this section in several subsections. First (Sects. 4.1, 4.2, and 4.3), we analyze each contextual modality in isolation. Second (Sects. 4.4 and 4.5), we study the influence of the four modalities on one another. Then, building on the positive results of the previous sections, we study the impact of subjectivity on predictability (Sect. 4.6). We conclude (Sects. 4.7 and 4.8) by providing evidence that the investigated contextual modalities are useful for computing the diversity of personal behavior across individuals.

Figure 4 reports the distribution of annotations in the data. The plot shows that, for all contextual modalities, a few values take up most of the mass. Roughly speaking, this means that study participants spend most of their time performing four basic activities (namely studying, sleeping, eating, and moving between locations, which account for about 55% of the records), mostly stay at home (either their own home or that of their relatives, more than 50%), and are mostly by themselves or with their friends (almost 50% and 16%, respectively). TIME is special in that its annotations are extremely regular and mostly determined by the experimental setup rather than by individual preferences. This is especially true for nocturnal annotations, as the user can set i-Log to "sleep mode" so that it will automatically reply to the questionnaires accordingly during the night. For this reason, TIME is omitted from this analysis. The profile transpiring from the data reflects the source demographics (see Footnote 1). The concentration of mass on few preferred values is consistent with previous studies on mobility [3,6].

Intra-modal predictability
We are interested in understanding to what degree individual modalities are predictable and whether some modalities are intrinsically more predictable than others. In line with previous work [2,3,15,28], we answer these questions using entropy and predictability. We introduce these notions in turn.

Entropy and predictability
Entropy measures the number of bits necessary to encode a random source: an entropy of b bits indicates that, on average, an individual who chooses her/his next value (i.e. location, activity, or social tie) randomly according to the ground-truth distribution will be found in 2^b distinct states with high probability [54]. Hence, higher entropy implies higher uncertainty. In order to evaluate the contribution of different factors, consistently with previous studies [3,15], we estimated three forms of entropy:

(1) The random entropy, defined as:

H^rand_{u,m} = log_2 N_{u,m},

where X_{u,m} is a random variable that represents the value of modality m for individual u and N_{u,m} is the number of distinct values observed for that modality and individual in the full data set. The random entropy assumes that the study participant is equally likely to choose any of the values that s/he has annotated.
(2) The time-uncorrelated or flat entropy, defined as:

H^flat_{u,m} = - Σ_x Pr(X_{u,m} = x) log_2 Pr(X_{u,m} = x),

where the sum runs over all the possible values x for modality m and Pr(X_{u,m} = x) denotes the empirical probability that individual u reported value x for modality m, as estimated from the data. The flat entropy is more informed than the random entropy as it takes the full value distribution into account.
(3) The true entropy, defined as the limit of the joint entropy:

H^time_{u,m} = lim_{n→∞} (1/n) H(X^1_{u,m}, ..., X^n_{u,m}),

where X^i_{u,m} denotes the i-th value reported by individual u for modality m.

Footnote 1: Considering that the volunteers are university students, the self-reported amount of studying is likely to be a (slight) over-estimate.
Compared to the flat entropy, the true entropy takes correlations over time, including short- and long-range correlations, into account. The true entropy is estimated from the data using the Lempel-Ziv estimator [55].
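The three entropy estimators can be sketched in a few lines of Python. The Lempel-Ziv routine below follows a common matched-window variant of the estimator in [55], so constants may differ slightly from the exact implementation used in the paper.

```python
import math
from collections import Counter

def random_entropy(seq):
    """H_rand = log2(N), with N the number of distinct observed values."""
    return math.log2(len(set(seq)))

def flat_entropy(seq):
    """Time-uncorrelated entropy of the empirical value distribution."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def true_entropy_lz(seq):
    """Lempel-Ziv estimate of the true entropy: H ~ n*log2(n) / sum_i L_i,
    where L_i is the length of the shortest substring starting at i that
    does not match any substring starting earlier in the sequence."""
    n = len(seq)
    total = 0
    for i in range(n):
        k = 1
        while i + k <= n and any(seq[j:j + k] == seq[i:i + k] for j in range(i)):
            k += 1
        total += k
    return n * math.log2(n) / total
```

By construction random_entropy is an upper bound on flat_entropy, and on regular sequences the Lempel-Ziv estimate is lower still, mirroring the ordering of the three entropies in Fig. 5.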
While entropy measures uncertainty, it only gives indirect information about how "easy to guess" a random source is. This is better captured by the notion of predictability, which was introduced to assess the regularity of human mobility [3]. Formally, the predictability Π(X) ∈ [0, 1] of a random variable X is the accuracy of an optimal classifier for X, that is, the probability that this classifier outputs the correct value. As a consequence, if the predictability of a random variable is 0.8, then no classifier can have an accuracy higher than 80%, or, in other words, all classifiers must be mistaken at least 20% of the time. This means that 1 - Π measures the irreducible error intrinsic in a random source. A notable property of the predictability is that, thanks to Fano's inequality [54], it can be derived directly from the entropy H by solving the equation:

H = - Π log_2 Π - (1 - Π) log_2(1 - Π) + (1 - Π) log_2(N - 1),

where N is the number of distinct values that X can take. Please see [3] for a detailed derivation. For our goals, it suffices to know that, very intuitively, lower entropy entails higher predictability. In order to measure the effect of annotation distribution and correlations over time, the predictability of each individual u and modality m was obtained by solving this equation for each of the three entropy estimates.

Figure 5 illustrates the distribution of entropy (left) and predictability (right) for each modality. The histograms show that while all modalities are to some extent regular, some are more regular than others. This is partly due to the fact that the theoretical maximum of the entropy is log_2 N_m, and it is controlled by the number of possible values N_m for modality m. Hence, modalities with more states, like activity and location, are intrinsically more uncertain and less predictable than modalities with fewer states. In our setting, the theoretical maximum of the entropy (represented in the entropy plots by a green line) is about 4.4 for location, 4.3 for activity, and 3 for social ties.
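Fano's equality has no closed-form solution for Π, but since its right-hand side decreases monotonically in Π on [1/N, 1], the maximum predictability can be recovered numerically. A minimal bisection sketch (illustrative, not the paper's code):

```python
import math

def predictability(H, N, tol=1e-9):
    """Solve Fano's equality for the maximum predictability P:
       H = -P*log2(P) - (1-P)*log2(1-P) + (1-P)*log2(N-1)."""
    if N <= 1:
        return 1.0  # a single possible value is always predictable

    def fano(p):
        # Binary entropy term, zero at the endpoints p = 0 and p = 1.
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0.0
        return h + (1 - p) * math.log2(N - 1)

    # fano() decreases from log2(N) at p = 1/N down to 0 at p = 1.
    lo, hi = 1.0 / N, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > H:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, with N = 4 states, an entropy of log2(4) = 2 bits yields Π = 1/4 (pure guessing), while H = 0 yields Π = 1, matching the intuition that lower entropy entails higher predictability.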
The plots show that distributional information and short- and long-range correlations strongly impact the measured entropy: the random entropy (blue) is always much higher than the flat entropy (red), which is itself much higher than the true entropy (purple). These changes in uncertainty demonstrate that taking the annotation distribution and time correlations into account can substantially lower uncertainty and increase predictability. The same effect can be observed for all modalities, with some differences. For all entropy measures, the WA modality has the highest entropy, followed by WE and WO. However, the difference between modalities is more pronounced for the random and flat entropy, while it is limited for the true entropy, confirming the usefulness of taking time correlations into account.

Figure 5 (right) shows the predictability of each modality for the different types of entropy. Comparing these histograms with those on the left makes it clear that increasing the amount of information dramatically increases predictability, as expected. Table 2 reports means and standard deviations of the empirical entropy and predictability for each modality and type of entropy. The predictability for the true entropy (and hence the maximal prediction accuracy) is 85% for activity, 89% for social tie, and 90% for location. This entails that the irreducible error, even when taking all the available information into account, is about 10%-15% across modalities. The irreducible error for the flat entropy is even larger, 35%-55%.

Results for intra-modal entropy and predictability
The standard deviation of predictability, that is, the spread of the histogram, shrinks considerably as more information is taken into consideration. This points to the fact that, as more information is considered, all participants appear to act more predictably. It is worth noting, however, that the standard deviation of the true entropy is non-zero, hinting at the fact that some participants are intrinsically less predictable than others. This partially motivates our study of behavior diversity across individuals, presented later on.

Inter-modal predictability
So far, we have studied individual modalities taken in isolation. This approach is simplistic in that it neglects correlations between modalities, which we hypothesize to be very significant. In the following, we study the effect of inter-modal correlations on predictability. This is achieved by estimating the conditional entropy H(X_{u,m} | X_{u,m'}), which quantifies the number of bits b needed to encode a random source X_{u,m} assuming that X_{u,m'} is known (with m' ≠ m). Intuitively, the more X_{u,m'} influences or determines X_{u,m}, the lower the conditional entropy [54]. The conditional entropy is defined as:

H(X_{u,m} | X_{u,m'}) = Σ_{x'} Pr(X_{u,m'} = x') H(X_{u,m} | X_{u,m'} = x'),

where H(X_{u,m} | X_{u,m'} = x') is the entropy of X_{u,m} estimated only on those records that satisfy X_{u,m'} = x'. An issue with conditioning is that it is incompatible with the full entropy H^time, as it breaks time correlations: two non-consecutive records may appear to be consecutive in the conditional data set simply because they satisfy the same condition X_{u,m'} = x' and none of the records in-between them does. This means that the conditional and unconditional entropy cannot be compared directly. 2 For this reason, in the following we use the flat, time-uncorrelated entropy H^flat in all computations.
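The conditional flat entropy is a weighted average of per-condition entropies and is straightforward to estimate from paired annotation sequences. A minimal sketch (the variable names are illustrative):

```python
import math
from collections import Counter

def flat_entropy(values):
    """Time-uncorrelated entropy of the empirical value distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_flat_entropy(target, condition):
    """H(target | condition) = sum over x' of Pr(condition = x') * H(target | condition = x'),
    with each term estimated on the records where the condition holds."""
    n = len(target)
    h = 0.0
    for x, cnt in Counter(condition).items():
        subset = [t for t, c in zip(target, condition) if c == x]
        h += (cnt / n) * flat_entropy(subset)
    return h
```

Conditioning can never increase the flat entropy; when the conditioning modality fully determines the target (as for some location/activity pairs discussed below), the conditional entropy drops to zero.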
The reduction in flat entropy due to conditioning, averaged over all study participants, is illustrated in Fig. 6 (top). The green line represents the entropy prior to conditioning (as reported in Table 2), while the red bars represent the conditional entropy. The change in predictability is reported below in the same figure. The plots show very clearly that, in all cases, inter-modal information substantially reduces uncertainty and improves predictability. 3 Indeed, conditioning any modality on the rest of the context (including TIME, rightmost bar in the plots) reduces entropy by more than 80% and increases predictability by at least 30%. More in detail, upon conditioning on the full context model, the entropy drops from 3.32 to 0.42 for WA, from 2.43 to 0.28 for WE, and from 1.82 to 0.29 for WO. At the same time, the predictability goes from 0.45 to 0.96 for WA, from 0.65 to 0.97 for WE, and from 0.67 to 0.97 for WO, cf. Table 2. This shows that the potential gain in accuracy from using multi-modal contextual dimensions is extremely large for all the modalities. The results for predictability make this point even clearer, as conditioning gives an impressive reduction of the irreducible error (that is, 1 - Π). In particular, the irreducible error of WA sees a huge drop from 55% to 4%, that of WE from 35% to 3%, and that of WO from 33% to 3%. This is consistent with our argument that time, location, activity, and social ties strongly influence each other, and provides empirical evidence in favor of our approach of taking all four contextual dimensions into consideration.
The magnitude of entropy reduction is largely independent of the target modality: conditioning reduces the entropy of WA by 84%, of WE by 86%, and of WO by 81%, and increases predictability by 160%, 133%, and 131%, respectively. At the same time, some modalities appear to carry more information than others: while conditioning on TIME shrinks entropy by only 15-20%, conditioning on WO, WA, and WE reduces entropy by 45%, 54-67%, and 59-77%, respectively. The four modalities can thus be ordered by average impact as TIME ≺ WO ≺ WA ≺ WE, meaning that TIME is the least informative modality and location the most informative one in the setting under investigation in this study. The largest impact is observed when conditioning activity on location or vice versa, although conditioning on multiple modalities makes this effect even more noticeable.
Comparing these results, which refer to the flat entropy and predictability and therefore ignore correlations over time, with those for the full entropy of WA, WE, and WO reported in Table 2 supports the idea that inter-modal correlations are more influential than purely temporal correlations.

Location prediction in practice
The above analysis shows that taking multiple contextual modalities into account helps to identify regularities in the behavior of individuals. Along this line, we also expect that some activities, locations, or social relationships cannot be predicted unless information from other modalities is available. Furthermore, while predictability measures the performance of an optimal classifier, it is important to study whether improvements in predictability due to conditioning affect the performance of real classifiers in practice.
To investigate this issue, we carried out a practical location prediction experiment. Specifically, we measured the difference in prediction performance between a prototypical statistical classifier [56] that predicts location from sensor measurements and that of analogous classifiers that were additionally given annotations about activity and/or social ties. As for the classifier, we opted for Random Forests due to their performance and reliability [57].
We trained one Random Forest classifier for each participant u. Each Random Forest takes as inputs the sensor measurements s^u_t of user u at time t (and optionally the annotations for the activity x^{u,WA}_t and social ties x^{u,WO}_t) and predicts the corresponding location x^{u,WE}_t. For simplicity, the sensor measurements s^u_t were restricted to features derived from GPS information, and in particular to longitude, latitude, and total distance traveled by the subject since the last questionnaire. This simple setup is sufficient for location prediction, and readings from the other sensors were found empirically to not be very relevant for the task at hand.
Prediction performance was evaluated using a 5-fold cross validation procedure. Namely, for each study participant, her/his records were randomly partitioned into 5 folds: one fold was used for performance evaluation while the remaining four were used for training the classifier. This step was repeated five times by varying the test fold. The performance of the Random Forest was taken to be the average over the five repeats.
For each user, we evaluated the impact of inter-modal annotations by comparing the performance of four classifiers: a baseline Random Forest that uses only the GPS-derived inputs s^u_t, and three Random Forests (with the very same depth) that were also given annotations for WA and/or WO as inputs. All hyper-parameters were kept at their default values, 4 except for the forest depth, which was selected on a separate validation set to optimize the performance of the baseline Random Forest. In order to account for annotation skew (i.e. some locations are naturally more frequent than others), performance was measured using the macro F1 score. The latter is simply the F1 score of individual locations averaged over all locations.
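The evaluation protocol can be sketched with scikit-learn (the library is an assumption; the paper does not name its implementation). The function below compares a sensors-only Random Forest against one that also receives one-hot-encoded WA/WO annotations, under 5-fold cross-validation and macro F1, mirroring the setup described above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def compare_location_classifiers(X_gps, X_annot, y_location, max_depth=None):
    """Macro F1 of a GPS-only Random Forest vs. one that also sees
    WA/WO annotations (assumed already one-hot encoded), via 5-fold CV."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = {}
    for name, X in (("sensors only", X_gps),
                    ("sensors + WA/WO", np.hstack([X_gps, X_annot]))):
        clf = RandomForestClassifier(max_depth=max_depth, random_state=0)
        preds = cross_val_predict(clf, X, y_location, cv=cv)
        scores[name] = f1_score(y_location, preds, average="macro")
    return scores
```

In the paper's setting this comparison is repeated per participant and the scores averaged; the sketch shows the per-participant step only.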
The overall macro F1 scores averaged across study participants, as well as a breakdown of the F1 scores for individual locations, are reported in Fig. 7. The plots show that GPS information can predict several locations reasonably well (red bars), like "Home", "Relative's home", and "Library", among others, on which the baseline Random Forest achieves a 40% F1 score. We conjecture this to be partially due to the fact that these locations are very specific (in our data, the home of most users is unique and often easily identified from even a few examples) and partially due to the abundance of annotations for these locations, cf. Fig. 4. GPS information, however, is clearly insufficient for locations like "Shop/Supermarket/etc.", "Theater/Museum/etc.", and "Gym", which are far more generic. Here the baseline Random Forest performs very poorly. This can be explained by two facts. First, these locations are composed of multiple objective locations (e.g. different shops, some of which were possibly never observed during training), and are therefore harder to predict based on GPS data alone. Second, the number of annotations for these locations is much lower.
Performance improves dramatically once WA and WO are supplied as inputs. In particular, the overall F1 score increases by about 30%. Moreover, while knowledge of either WO or WA always helps recognition performance, supplying both improves it even further, as expected. We also note that WA is generally more useful than WO. These observations are consistent with the results for the optimal classifier.
One question is whether these results are influenced by the performance of particularly easy-to-predict classes. We assessed this possibility by computing a variant of the macro F1 that considers the median (rather than the mean) performance over classes, and is therefore naturally insensitive to classes that perform exceptionally well or exceptionally badly. The results are as follows: the macro mean F1 for the four cases (sensors only, sensors with WO, sensors with WA, and sensors with WO and WA) is 0.19, 0.25, 0.42, and 0.47 respectively, whereas the macro median F1 is 0.09, 0.13, 0.43, and 0.47. The most significant difference between macro mean and median F1 appears when no activity information is present: the baseline drops by about 10% and the "with WO" Random Forest by 13%. However, the latter can be almost entirely explained by the former: adding social information contributes roughly +5% to both macro mean and median F1 (from 0.19 to 0.25 and from 0.09 to 0.13, respectively). Summarizing, this shows that the macro mean F1 overestimates the quality of the sensor-only baseline by about 10%. This probably occurs because on-the-way locations like "driving" and "walking" are very hard to predict from sensors only (they individually achieve less than 8% F1), meaning that the macro median F1 tends to treat the higher-performing classes as outliers and ignore them. Most importantly, the contribution of inter-modal information to predictive performance is confirmed even under this stricter metric.
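The difference between the two aggregation schemes can be made concrete in a few lines. The labels below are made up for illustration; the point is only that the median variant discounts classes with extreme per-class scores.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy predictions over four classes; class 3 is never predicted,
# so its per-class F1 is 0 and drags the mean down.
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 0, 1, 0, 2, 2, 1, 0]

per_class = f1_score(y_true, y_pred, average=None)  # one F1 per class
macro_mean = per_class.mean()                       # standard macro F1
macro_median = float(np.median(per_class))          # median variant

print(round(float(macro_mean), 2), round(macro_median, 2))
```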
An important finding of this experiment is that some locations that were completely unpredictable from GPS data alone are much easier to recognize if WA and WO annotations are supplied as inputs. The two most impressive examples are "Shop/Supermarket/etc." and "Theater/Museum/etc.", for which the correlation between location and activity boosts the F1 score from less than 5% to more than 70%. This very encouraging result offers further support for jointly leveraging different contextual modalities, as some locations that are essentially impossible to recognize suddenly become essentially trivial once rich contextual information is provided.

Subjectivity and predictability
Here, we investigate whether subjective annotations are more relevant than objective ones for determining the predictability of behavior.
In particular, we compared the reduction in entropy due to conditioning on subjective location (namely, the WE annotations) to that due to conditioning on objective location, interpreted here in terms of GPS coordinates and related information. As in the location recognition experiment, we defined objective location using longitude, latitude, and total distance travelled since the last questionnaire. Computing the conditional entropy for continuous variables (in our case, the GPS coordinates) is not statistically straightforward. To avoid these issues, we discretized the GPS information using a simple binning procedure. In particular, we allocated k = 3 equal-frequency bins for each of the three dimensions (longitude, latitude, distance travelled), for a total of 27 values for the objective data. This was done using the KBinsDiscretizer class provided by scikit-learn [58] with the "quantile" strategy, which ensures that all bins contain roughly the same number of points. The number of bins roughly matches the number of subjective values (i.e. locations), which is 22. Since the variance of the conditional entropy estimator depends strongly on the number of alternative values, keeping roughly the same number of values for both subjective and objective data prevents the estimator from having dramatically different variances in the two cases.
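The binning step, followed by a plug-in conditional entropy estimate, can be sketched as follows. The GPS columns are synthetic and the entropy estimator is a simple plug-in version written for illustration; only the use of KBinsDiscretizer with the "quantile" strategy mirrors the text directly.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
gps = rng.normal(size=(500, 3))      # lon, lat, distance travelled (synthetic)
y = rng.integers(0, 5, size=500)     # e.g. WA annotations (synthetic)

# k = 3 equal-frequency bins per dimension -> 27 discrete cells
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
cells = disc.fit_transform(gps)                         # bin index per dimension
cell_id = (cells * [9, 3, 1]).sum(axis=1).astype(int)   # cell index in 0..26

def cond_entropy(labels, x):
    """Plug-in estimate of H(labels | x) in bits."""
    h = 0.0
    for v in np.unique(x):
        mask = x == v
        p = np.bincount(labels[mask]) / mask.sum()
        p = p[p > 0]
        h += mask.mean() * -(p * np.log2(p)).sum()
    return h

h_cond = cond_entropy(y, cell_id)
print(round(h_cond, 2))
```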
A comparison of the conditional entropy of WA and WO obtained by conditioning on subjective (red) versus objective (blue) location is reported in Fig. 8. The two left bars in each plot refer to conditioning the target modality on location only, while the two right bars indicate conditioning on all other modalities. There is a very clear difference between self-reported locations (WE) and GPS data: while knowing the GPS coordinates and traveled distance of the study participant reduces entropy in all cases, the reduction is far more modest than that obtained by conditioning on subjective location. The impact on predictability is analogous: GPS information provides a substantial boost to predictability (cf. Table 2), from 45% to 70% for WA and from 67% to 81% for WO. This is compatible with the results obtained above for inter-modal correlations. The improvement is, however, always smaller than the one induced by subjective location: for WA, predictability is 70% when supplying objective location but goes up to 92% when supplying subjective annotations. For WO, the difference is less pronounced: 81% (objective location) against 90% (subjective). This is, again, likely due to the strong connection between activity and location. The situation is roughly unchanged if we condition the target modality on the rest of the context, namely location (either subjective or objective), time, and the remaining modality. These results show that subjectivity, besides being necessary for framing behavior from the subject's perspective, has a substantial effect on the predictability and regularity of behavior in practice.

Diversity: motivation
In the last experiment we studied the diversity of personal behavior. The motivation underlying this experiment is to provide some evidence of the intrinsic diversity, both objective and subjective, of the personal context of an individual. It is a widespread intuition that most of the time people behave similarly to each other. Indeed, everybody sleeps, eats, works, and socializes, and these activities take up most of our time. So, at a high level, everybody behaves the same during these high-frequency (subjective) activities. Our intuition is that individual differences manifest themselves in infrequent behaviors: for instance, while most people only go to the cinema in the evening, a cinephile has no issue going to a matinée.
A prerequisite to this argument is that rare behaviors occur often enough to be statistically meaningful. To determine whether this is the case, we checked whether the empirical distribution of context annotations is heavy-tailed. This was achieved by fitting three candidate distributions (a power law, a log-normal, and an exponential distribution) to the data.5 It is apparent from the plot shown in Fig. 9 that the log-normal distribution (with μ = -8.2, σ = 1.6) offers a much better fit to the behavior of individuals than the exponential model, which is not heavy-tailed. This supports the idea that individual behavior described using the four identified contextual modalities is heavy-tailed, as expected.
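A fit comparison of this kind can be sketched with scipy, here by comparing log-likelihoods of maximum-likelihood log-normal and exponential fits. The sample is synthetic (drawn from a log-normal with the parameters reported above); on real annotation frequencies the same comparison applies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic heavy-tailed sample with the reported parameters
sample = rng.lognormal(mean=-8.2, sigma=1.6, size=2000)

# Maximum-likelihood fits (location fixed at 0 for both families)
ln_params = stats.lognorm.fit(sample, floc=0)
exp_params = stats.expon.fit(sample, floc=0)

# Higher log-likelihood indicates the better-fitting model
ll_lognorm = stats.lognorm.logpdf(sample, *ln_params).sum()
ll_expon = stats.expon.logpdf(sample, *exp_params).sum()

print(ll_lognorm > ll_expon)
```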
Figure 9 Comparison of power-law and exponential distribution fits on the empirical distribution

Inspired by some studies on the uniqueness of mobility [60,61] and app usage [62] behaviors, we investigate whether annotations in the tail of the context distribution are indicative of personal identity, that is, whether it is easier to identify individuals using annotations from the tail or from the "head" of the distribution. For instance, in our university setting we expect common (head) annotations like morning, classroom, lesson, classmates to convey very little information about individual identity, as most university students attend lectures in the morning, and rarer (tail) annotations like morning, workplace, work, alone to be far more informative.

Diversity: experiment and results
We designed a classification task in which the goal was to predict the identity of individuals based on context annotations only. All records in our data set were annotated with the ID of the subject who generated them. The head and tail of the distribution were then defined using an arbitrary threshold τ ≥ 0: annotations that appear with frequency below τ were taken to fall in the tail, and the others in the head. Next, we trained two Support Vector Machine (SVM) classifiers [63] separately on the tail data and on the head data, and compared their performance. Both models received annotations for all modalities as inputs. As above, performance was measured in terms of F1 score (the higher the better) in a 10-fold cross-validation setup. Notice that the number of personal IDs is 156, which is fairly large and renders the classification task highly non-trivial. For reference, the expected F1 score of a random classifier is 1/156 (indicated in cyan in the plots).
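The head/tail split and the two-classifier comparison can be sketched as follows. Everything here is synthetic (random annotations and subject IDs, an illustrative τ, far fewer subjects than the 156 in the study), so the scores themselves are meaningless; the sketch only illustrates the split-and-evaluate protocol.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 600
X = rng.integers(0, 3, size=(n, 4))   # annotations for the four modalities (synthetic)
y = rng.integers(0, 5, size=n)        # subject IDs (synthetic)

# Empirical frequency of each distinct annotation tuple
_, inverse, counts = np.unique(X, axis=0, return_inverse=True,
                               return_counts=True)
freq = counts[inverse] / n

tau = 0.01                             # records with frequency below tau form the tail
parts = {"tail": freq < tau, "head": freq >= tau}

scores = {}
cv = KFold(n_splits=3, shuffle=True, random_state=0)
for name, mask in parts.items():
    if mask.sum() >= 30:               # skip nearly-empty parts
        scores[name] = cross_val_score(SVC(), X[mask], y[mask],
                                       cv=cv, scoring="f1_macro").mean()

print({k: round(v, 2) for k, v in scores.items()})
```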
The results can be viewed in Fig. 10. The top plot shows the F1 score of the two classifiers as the threshold τ is increased. Recall that a lower threshold entails that fewer annotations fall in the tail and more in the head. The threshold ranges from 0 (left of the plot), in which case no annotation falls in the tail, to the smallest value for which all data fall in the tail, which is ≈ 0.57 (right of the plot). Broadly speaking, the tail classifier always outperforms the head classifier by a large margin, while the head classifier never performs better than a classifier trained on both head and tail annotations (the green line in the figure). In order to better analyze the plot, we split it into three regions, highlighted by the purple lines (notice that the ticks on the x-axis are non-uniform). In the leftmost region, the tail classifier outperforms the head classifier as soon as there are enough annotations in the tail, and it stabilizes at around 40% F1 score for τ from about 0.00005 to 0.00012. Here the tail is maximally informative, presumably because it only contains rare and informative context annotations. As the threshold increases and less rare (i.e. more common) annotations also fall in the tail (middle region), the tail classifier drops off in performance but still outperforms the "all" and the head classifiers. The head classifier also performs worse and worse, as more annotations move from the head to the tail. In the rightmost region, the tail converges to the full data set and hence the tail classifier converges to the performance of the "all" classifier.
A breakdown of performance for different study participants is reported in Fig. 10 (bottom) for the two thresholds corresponding to the minimum (τ = 0.00001) and maximum (τ = 0.00007) of the F1 score, respectively. Individuals are sorted on the x-axis according to the F1 score of the tail classifier, for readability. In the left figure, when τ = 0.00001, the tail data set is extremely small and fewer than 20 users have annotations in it. This clearly explains why the performance of the tail classifier drops when the threshold is too small. On the other hand, for τ = 0.00007 (right figure) the overwhelming majority of individuals are more likely to be identified correctly by looking at their infrequent behaviors, with fewer than 10 exceptions. This provides evidence in support of the fact that the tail of the distribution conveys much more information about personal identity than the head. The "exceptional" participants themselves can also be easily explained. These individuals are hard to classify because their behavior is slightly more regular than that of the other volunteers, meaning that most of their annotations occur more frequently and are therefore more likely to fall in the head of the distribution. Indeed, we verified that this issue disappears once the threshold is increased slightly (data not shown). A proper solution would be to choose the threshold τ on a subject-by-subject basis. This is, however, orthogonal to our goals and beyond the scope of this paper.

Conclusion
In this work, we have studied the predictability of human behavior through the notion of personal context. Our study captures a rich, multi-faceted picture of individual behavior by looking at four orthogonal but interrelated dimensions-namely time, location, activity, and social ties-viewed from the subject's own perspective. An empirical analysis on a large data set of daily behaviors shows the benefit of this choice: the different contextual modalities and their subjective description are shown to provide important cues about the predictability of individual behavior. Motivated by this, we also applied our contextual modalities to study behavioral diversity. The obtained results highlight that individuals are more easily identified from rarer, rather than more frequent, subjective context annotations.
This work can be extended in several directions. First and foremost, while our results are promising, we plan to further validate them in more settings and in specific applications. To this end, we are currently working on collecting a much larger data set, with students from several universities in four different countries, which will serve as a basis for a thorough investigation of the results presented here.
This work also highlights an interesting conundrum. Our results suggest that subjective annotations are very useful for predicting certain contextual modalities. However, these subjective annotations, obtained by filling in questionnaires, carry some degree of error related to, for example: the list of alternatives offered to the respondent, e.g. the list of activities, places, or people; the memory effect when the respondent does not answer immediately; the social desirability effect, which may prevent the study participant from reporting certain (socially disapproved) activities; and unreported activities when the participant perceives the question as an intrusion into her/his privacy. Moreover, in practical applications, collecting self-reported annotations is not always an option. This means that in some settings and scenarios one has to compute predictions from sensor measurements only, which is likely to incur a substantial performance penalty. Going forward, one option is to replace the ground-truth self-reported annotations with predictions. This makes particular sense in a multi-task prediction pipeline in which all contextual modalities are predicted jointly from sensor measurements. This way, the predictor can leverage inter-modal correlations, which are key for inferring some locations and activities and for avoiding inconsistencies. Such a prediction pipeline would be fully operational even in the absence of subjective annotations, as long as an initial training set is available. The downside is that replacing annotations with predictions does introduce noise into the system. Finding a complete solution to this problem is an interesting avenue for future work.