Predicting subjective well-being in a high-risk sample of Russian mental health app users

Abstract

Despite recent achievements in predicting personality traits and some other human psychological features with digital traces, prediction of subjective well-being (SWB) appears to be a relatively new task with few solutions. The COVID-19 pandemic has added both a stronger need for rapid SWB screening and new opportunities for it, with online mental health applications gaining popularity and accumulating large and diverse user data. Nevertheless, few existing works have aimed at predicting SWB, and those that have done so use only Diener’s Satisfaction with Life Scale. None of them analyzes the scale developed by the World Health Organization, known as WHO-5 – a widely accepted tool for screening mental well-being and, specifically, for depression risk detection. Moreover, existing research is limited to English-speaking populations and tends to use text, network and app usage types of data separately. In the current work, we cover these gaps by predicting both mentioned SWB scales on a sample of Russian mental health app users who represent a population with high risk of mental health problems. In doing so, we employ a unique combination of phone application usage data with private messaging and networking digital traces from VKontakte, the most popular social media platform in Russia. As a result, we predict Diener’s SWB scale with state-of-the-art quality, introduce the first predictive models for WHO-5 with similar quality, and reach high accuracy in the prediction of clinically meaningful classes of the latter scale. Moreover, our feature analysis sheds light on the interrelated nature of the two studied scales: they are both characterized by negative sentiment expressed in text messages and by phone application usage in the morning hours, confirming some previous findings on subjective well-being manifestations.
At the same time, SWB measured by Diener’s scale is reflected mostly in lexical features referring to social and affective interactions, while mental well-being is characterized by objective features that reflect physiological functioning, circadian rhythms and somatic conditions, thus saliently demonstrating the underlying theoretical differences between the two scales.

1 Introduction

In recent years, evaluation, analysis and improvement of subjective well-being (SWB) have gained growing attention from both researchers and practitioners [1, 2]. Attention to SWB has naturally been coupled with the increasing research interest in depression – the leading cause of disability and subjective well-being loss worldwide [3, 4]. The COVID-19 pandemic, resulting in the shift to hybrid work and the decline in face-to-face communication, has put many individuals at additional mental health risks [5, 6]. Some of the most widely available instruments to mitigate such risks are online and mobile services that offer quick screening tests of subjective well-being and mental health states and automatically generate respective recommendations. More than 240 mental health apps are available in the App Store today, some of which make extensive use of machine learning for classifying and scoring their users in terms of their psychological or mental conditions [7–9]. Such apps attract consumers concerned with their psychological states, and these concerns are usually associated with higher risks for users’ SWB or mental health. As these individuals agree to donate parts of their digital traces, psychological apps become natural hubs accumulating data on individuals at risk. Such data, if available, provide ample opportunities for the development of open source algorithms for early automatic detection of threats to well-being in high-risk populations with their digital traces.

Subjective well-being is most commonly defined in accordance with Diener’s approach [10] as a person’s satisfaction with their life (which constitutes SWB’s cognitive component) and the prevalence of positive emotions over negative ones (affective balance, which constitutes SWB’s affective component). To date, about 100 assessment tools measuring about 200 facets of well-being have been proposed, thus complicating the selection of relevant metrics [1]. The two most widely used SWB measurement tools are Diener’s Satisfaction with Life Scale (SWLS) [10] and the scale introduced by the World Health Organization in 1998, known as the WHO-5 index [11]. The former aims to capture generalized long-term subjective well-being, while the original goal of the latter was to screen and rate depression. Later, Bech, one of the WHO-5 developers, also showed that this scale is equally good at detecting high degrees of psychological well-being, which he proposed to consider a component of mental health, along with the absence of depression symptoms [12].

Both SWLS and WHO-5 are short unidimensional 5-item scales with proven validity and reliability (α coefficients 0.79–0.89 for the former and 0.82–0.95 for the latter) [13–15]. Both have become common for well-being screening in a wide range of populations and among different nationalities [15–18]. The wide use and the proven quality of these metrics motivate their choice for our research in automatic SWB prediction; however, some more details on their distinctive features should be added.

SWLS, apart from being centered on pleasure and satisfaction, is also meant to be time- and dimension-independent. The first feature means that it is not tied to a specific time interval and measures satisfaction with our past, present and future. The second feature refers to the generalized character of such satisfaction, not being tied to any particular dimension of human life, such as health, relationships or finance. The choice of the dimensions to be taken into account and the weight assigned to them is left with the subject and is expected to be based on a blend of objective reality and the subject’s subjective experience of it. It is assumed that a person is able to adequately assess her well-being and has all the necessary and unbiased information for that [10].

SWLS is widely used by psychologists, public health professionals, and economists. According to the World Happiness Report, SWLS provides a more informative measure for international comparisons of well-being than some measures capturing the affective component only [19]. Importantly, SWLS is stable under unchanging conditions, but is sensitive to changes in life circumstances: thus, its growth is associated with a higher likelihood of marriage and childbirth and with a lower likelihood of job loss and relocating [20]. It is also predictive of physical and physiological outcomes, as judged from a 4-year follow-up period in the same study. It is these meaningful changes that have been found responsible for the drop of SWLS test-retest reliability from 0.84 in the window of a few weeks to 0.54 in the 4-year window [21]. These changes are clearly distinct from the short-term random mood fluctuations responsible for explaining 16% of variance in the short run. It can thus be said that SWLS captures both a stable and a transient component of human well-being.

In contrast to SWLS, the WHO-5 index aims at a brief assessment of emotional well-being over a 14-day period (thus containing no cognitive component and being highly time-sensitive). Its items represent positive affect whose absence corresponds to the depression symptoms (negative affect). This is an important advantage of WHO-5, as the subjects are not forced to confess to any unpleasant and potentially hard-to-admit negative emotions or states. As mentioned above, WHO-5 has been proven effective for the detection of both depression risk [22, 23] and high levels of well-being [12]. Being a short, sensitive, specific and non-invasive tool, it has an advantage over more detailed but heavier methods for preliminary depression and suicide risk assessment in settings without psychological/psychiatric expertise. WHO-5 has shown high clinimetric validity and the ability to accurately predict a wide range of mental health conditions, including depression; moreover, it has been recommended as an outcome measure balancing the wanted and unwanted effects of treatments [24]. That is why WHO-5 has been adopted in many research fields such as suicidology, geriatrics, youth and alcohol abuse studies, personality disorder research, and occupational psychology [15, 24].

Thus, WHO-5 and SWLS, being psychometrically sound screening tools with known outcomes, also measure complementary aspects of subjective well-being. Although measures of emotional affect and reported life satisfaction often correlate, substantial divergences have been found. For instance, almost half of the people who rated themselves as ‘completely satisfied’ also reported significant symptoms of anxiety and distress [17]. Therefore, quality of life in the current coronavirus crisis is usually measured with both scales [5, 6, 25–27]: while WHO-5 helps to assess the influence of different practices on SWB and the persistence of diminished well-being during and beyond COVID-19, SWLS shows how people feel and how their life perspective changes due to the pandemic. This complementarity indicates the importance of comparative research in prediction of both metrics.

This task is novel for SWB prediction with digital traces: despite the advances in detection of specific mental health problems and the attempts to predict some SWB metrics, no research so far has been dedicated to predicting WHO-5 and comparing it with SWLS in terms of digital behavior traces; moreover, most research is limited to English-speaking populations. The best models predicting SWLS with digital traces from social media, search engine and smartphone activity data demonstrate performance below 0.4 in terms of Pearson correlation – a well-known threshold for correlation between psychological characteristics and objective behavior [28, 29] (see also [30, 31] for an overview). None of the models combines language, social media and smartphone usage data.

The goal of this study is to predict individual WHO-5 and SWLS levels with a new combination of digital traces in a high-risk Russian-speaking population, to find out which features are the most predictive and what the overall predictive power of our models is. A high-risk population is defined as a population with a higher probability of having problematic levels of SWB, as compared to more general populations. We thus address a completely novel task of comparative prediction of two different aspects of subjective well-being, which should have different objective indicators and suggest different actions to be taken by the user. Additionally, we find that depression risk in a Russian-speaking population can be detected by a WHO-5 level below a certain threshold as successfully as in the populations for which WHO-5 was tested earlier, and this allows us to predict the threshold as well. To do so, we make use of a sample of 372 psychological application users who have explicitly consented to share their private messages, social media data and mobile device usage traces. We use extensive feature engineering combined with regression and classification modeling, the first type of models being aimed at SWB score prediction, and the second at depression risk identification based on theoretically justified thresholds. We also check our regression models against the newest neural network approaches, which, however, do not show sufficient quality on a dataset of our size.

The rest of the paper is structured as follows. In the next section we review the existing literature on prediction of SWB and related psychological and mental health phenomena with digital traces. Next, we describe our dataset, our numerous features and the approach to their engineering, as well as the models used. In the Results section we report our best models’ performance and the most useful features. In the Discussion section we interpret our results and indicate the most important limitations. We conclude with the perspectives for future research.

1.1 Subjective well-being prediction

Prediction of internal psychological and mental states from objective behavior patterns is a highly difficult task [29, 32]. Additionally, clinically diagnosed mental disorders (such as depression) and mental disorder risks assessed through threshold scores of screening tests (such as WHO-5) are different categories for prediction. While the former may be partially manifest, the latter, along with psychological traits and conditions, are latent constructs. This means that psychological theory does not expect latent constructs to fully correlate with any observable patterns, since such constructs are not thought of as reducible to observable behavior in principle. This may be one of the reasons why such correlation is seldom high, although this is a subject for further research. As both high SWB and the absence of mental disorder symptoms have been shown to be components of mental health [12, 33], prediction of SWB and prediction of mental disorder (or its risk) constitute two related tasks. However, due to the different nature of SWB and mental disorder as concepts, the former is usually evaluated with continuous predictive models, while the detection of the latter is most often formulated as a classification task.

1.1.1 Detection of mental disorders

A vast number of studies predict specific mental health conditions with digital traces, mostly with data from social media, such as Facebook and Twitter. The most widely analyzed conditions in such studies are depression and Post Traumatic Stress Disorder [34–38]. Other conditions include Bipolar Disorder, Anxiety and Social Anxiety Disorder, eating disorders, self-harm and suicide attempt [39–42]. Linguistic features used typically include word n-grams, sentiment, specific lexica (e.g., Linguistic Inquiry & Word Count dictionary, LIWC) and topic modelling, with other features related to social networks, emotions, cognitive styles, user activity and demographics [34–39, 42]. Model evaluation metrics include Area Under the Curve (AUC), Precision, Accuracy of classification, and Correlation for continuous measurements. The results for binary mental health problem identification are high, reaching an AUC of 0.7–0.89, Precision up to 0.85, and Accuracy of 0.69–0.72 [30].

Ground truth information in such studies is obtained from different sources, leading to different quality. Most studies use either self-reported survey data [34, 37] or self-declared mental illness [36, 39]. The latter is prone to errors and bias induced by specific data collection methods.

In a recent study, Eichstaedt et al. [38] effectively predict depression of Facebook users against medical records information. The authors use a 6-month history of Facebook statuses posted by 683 hospital patients, of whom 114 were diagnosed with depression (a rate similar to the general population), and classify depression vs. other medical diagnoses with an AUC = 0.72. Features of Facebook statuses include words and word bigrams, temporal characteristics of posting activity, metainformation on post length and frequency, topics and dictionary categories, with interpersonal, emotional and cognitive categories being among the best predictors.

The effects of smartphone usage on mental disorders, until very recently, have been mostly studied with self-reported data (see [43, 44] for an overview). Meanwhile, smartphone apps that collect usage data provide an unprecedented opportunity to access objective and precise information on smartphone application usage. Hung et al. [45] find that phone call duration and rhythm patterns are predictive of negative emotions, while Saeb et al. [46] predict depressive symptom severity with geographical location and phone usage frequency information. However, as feature engineering with phone app usage data requires considerable time and effort [47], the potential of such data for psychological research is yet to be discovered.

1.1.2 Prediction of SWB levels

There have been a few studies aimed at predicting subjective well-being levels, mostly with regression, which obtain modest results. Individual and relational well-being was predicted from social network data [28, 48] and from objective smartphone use data [49]. The reported results are close to the upper bound expected in this task: the meta-analytic correlation between digital traces and psychological well-being has been estimated as \(r = 0.37\) across nine studies, including prediction of subjective well-being, emotional distress and depression [28]. The only study that reaches a higher correlation of 0.66 in one of the models [49] does not specify the scales used for measuring SWB; however, interestingly, it finds that while some apps predictably have a negative effect on well-being, others affect it positively.

Diener’s SWLS, to our knowledge, has been predicted in only four studies that use digital traces in a cross-validated setting. In their pioneering study, Kosinski et al. [50] predicted SWLS with linear regression for 2340 Facebook users based on 58K ‘Likes’ – preferences of webpages indicated by the users. The Likes data dimensionality was reduced to the top 100 components in an SVD model based on a larger dataset (58K users). The obtained correlation reached \(r = 0.17\), whereas empirical test-retest correlation for SWLS was \(r = 0.44\).

Collins et al. [51] predicted SWLS with Random Forest Regression and various Facebook features, including demographics, networking data, photos, likes, ground truth Big Five traits of the users, of their significant others and friends, and predicted Big Five as a proxy. The best result for a sample of 1360 users with Big Five features as a proxy reached a Mean Absolute Error (MAE) of 0.162, whereas the model with social network features produced MAE = 0.173 for SWLS. Unfortunately, no other evaluation metrics were reported in this study. Schwartz et al. [52] applied Ridge Regression to predict SWLS of 2198 individuals using their Facebook statuses. Thousands of linguistic features were extracted from the status texts, including 2000 topics obtained with the Latent Dirichlet Allocation topic modeling algorithm, word uni- and bi-grams, LIWC and sentiment lexica. A message-user level cascaded aggregation model was additionally trained on a disjoint dataset, which improved regression results from Pearson \(r = 0.301\) to \(r = 0.333\). Facebook status data were also used by Chen et al. [53] to predict SWLS of 2612 users. Features included affect measured by sentiment word usage, 2K topics obtained with topic modeling and 66 LIWC categories. After feature selection with Elastic Net regression, a Random Forest model was tested for prediction on an unseen subset. The results reach a Root-Mean-Square Error (RMSE) of 1.30 (0.217 when rescaled to \([0;1]\)) and \(r = 0.36\).

A number of studies predict SWB with app usage data. Some of them rely on self-reported measures of app use [54], while others collect objective data [49, 55]. Correlations in David’s model range from 0.31 to 0.66; however, the research does not specify the scales used for measuring SWB. At the same time, interestingly, it finds that while some apps predictably have a negative effect on well-being, others affect it positively. Gao and colleagues [55] report correlations from 0.34 for male users to 0.66 for female users in the task of predicting SWLS; however, they do not report the full feature set and the contribution of each feature in their best models. Instead, they mention that the most predictive variables are communication apps, certain types of games and the frequency of photo taking. None of these studies mentions cross-validation.

Overall, although the results of subjective well-being prediction are promising, several gaps in the existing research can be identified. First, WHO-5, which is an effective screening tool for depression risk and subjective well-being, has never been studied in a predictive research design. Second, all the studies predicting SWLS are limited to English-speaking populations and respective linguistic features. Moreover, these works only address Facebook digital traces, including profile, texts and likes. Finally, only scarce feature interpretation is reported in the previous studies, and digital trace manifestations of different well-being dimensions have never been compared.

1.2 Our approach

In this study, we set out to predict two different concepts of subjective well-being: one combining affective balance and life satisfaction (measured by SWLS index and further referred to as satisfaction-related SWB) and the other conceptualized as a reflection of mental health (measured by WHO-5 index and further referred to as mental SWB). For predicting well-being values, our task is defined as regression, while for detecting depression risk, we formulate our goal as a binary and trinary classification task. For the latter, we identify the threshold values of WHO-5 by validating them against the scores of the same users on the scales of depression, anxiety and stress, so that the WHO-5 values predicting these scores with the highest sensitivity and specificity are chosen. We perform our prediction of SWB on the texts of private messages, social media and smartphone usage information and perform regression and classification experiments in a cross-validated Machine Learning design. The novelty of the current study lies in the following:

  1. We present the first study so far on predicting subjective well-being measured by WHO-5;

  2. We find a close association of WHO-5 thresholds with three scales of mental health, which is promising in terms of extending our approach to the simultaneous prediction of a range of mental health risks;

  3. We are the first to compare satisfaction-based and mental SWB, analyzing their intersections and differences in terms of predictive features;

  4. This is the first study to combine language, social media and phone app usage features in well-being research;

  5. To our knowledge, our study is the first to address subjective well-being prediction in a Russian-speaking population and respective data: the Russian social network VKontakte and texts in the Russian language;

  6. We use a dataset of psychological application users, allowing us to predict subjective well-being in real-world conditions for a sample with high mental risks, which has never been done before.

2 Materials and methods

2.1 Dataset

Our dataset was collected in collaboration with Humanteq social analytics company, using its DigitalFreud app (DF) – a Russian-language phone application for psychological self-assessment – promoted among Android-based smartphone users through Google Ads. Android was chosen as the basic operating system for data collection, as at the time of the app development and promotion its users constituted the majority (68–76%) [56] of Russian smartphone users, who in turn were the app’s target audience and who constituted 57–64% [57] of Russia’s population. Additionally, the app was available to Russian speakers from any country, and although users from countries other than Russia constituted the minority, none of the samples we further analyze is intended to be representative of Russia.

Data collection via a psychological app of this type was used to access a high-risk population (its high-risk status was confirmed in subsequent comparison of its mean SWB to those in other populations, presented further below). Users were invited to take as many free tests as they wanted (including personality traits, depression, anxiety, stress, cognitive, motivation and SWB tests) and to explicitly consent to the access to their VKontakte profile data and/or smartphone use data. Based on the test results, users were offered psychological feedback and analytics on the use of VKontakte and/or their smartphones. On average, DigitalFreud users chose to fill in 1.5 questionnaires and shared varying subsets of their data, which made the overall dataset quite sparse.

Privacy policy included a clause stating that the data could be used for research. The study was approved by the HSE Ethics Committee; nevertheless, the data were anonymized prior to the analysis. No personal information (i.e. allowing to identify the users) was included in the sample. In particular, all the user profile ids were encrypted.

The initial sample included 2050 accounts of DigitalFreud users who have completed at least one of the two questionnaires of our interest: SWLS [10] or WHO-5 [58]. The vast majority completed either of the tests only once; for those who did it more than once, the earliest score was taken into our dataset.
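The rule of keeping the earliest score for users with repeated attempts can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the tuple layout `(user_id, completed_at, score)` and the toy dates are assumptions made for the example.

```python
from datetime import datetime


def earliest_scores(attempts):
    """Keep the earliest questionnaire score for every user.

    `attempts` is an iterable of (user_id, completed_at, score) tuples.
    This layout is hypothetical and only serves the illustration.
    """
    earliest = {}
    for user_id, completed_at, score in attempts:
        # Store the attempt if the user is new or this attempt is earlier.
        if user_id not in earliest or completed_at < earliest[user_id][0]:
            earliest[user_id] = (completed_at, score)
    return {user_id: score for user_id, (_, score) in earliest.items()}


# Toy data: user "u1" completed the test twice, user "u2" once.
attempts = [
    ("u1", datetime(2020, 4, 1), 25),
    ("u1", datetime(2020, 3, 1), 18),
    ("u2", datetime(2020, 3, 5), 30),
]
selected = earliest_scores(attempts)
```

In this toy input the later attempt of `u1` is discarded regardless of the order in which the records arrive.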

The following digital traces data were available for the participants:

  • DigitalFreud profile data;

  • VKontakte user data;

  • Phone application data.

Due to data sparsity, our final sample used in prediction contains digital traces of 372 users. The procedure of data cleaning that produced this dataset is given in Appendix 1. The dataset is small because data on well-being combined with personal digital traces are highly difficult to obtain: this requires both considerable effort from users in completing the questionnaires and the trust that allows them to share sensitive digital traces. However, our dataset is uniquely tailored to the task of predicting SWB in a high-risk population of mental health app users.

Additionally, there is a heldout dataset, which consists of messages written by 572 users who lack other important features for prediction (demographics, phone app usage) but have text data. The heldout dataset is used for preliminary feature selection (see sections Words, Word clusters below). Before feature selection, texts were tokenized with happiestfuntokenizing (see Footnote 1) and lemmatized with pymorphy [59].

The phone app dataset consists of phone application usage data by 992 users who lack other important features for prediction. The phone app dataset was used for preliminary phone application categorization and feature engineering.

We also collected a sub-sample of users (\(N = 417\)), who have completed the WHO-5 and at least one of the following questionnaires evaluating different mental health risks (mental health dataset):

  1. Depression measured with the Patient Health Questionnaire (PHQ-9) [60];

  2. Anxiety measured with the General Anxiety Disorder scale (GAD) [61];

  3. Stress measured with the Perceived Stress Scale (PSS) [62, 63].

The mental health dataset was used in the WHO-5 classification task to select the cutoff thresholds of the classes to be predicted, so that these classes would be representative of a range of mental health conditions.

2.1.1 Self-reported well-being measures

Satisfaction-related well-being scale (SWLS)

The SWLS questionnaire was translated to Russian and validated by Ledovaya et al. [64].

The questionnaire contains 5 statements, each rated on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The resulting SWLS score ranges from 5 (low satisfaction) to 35 (high satisfaction). The scale has good internal consistency, with α coefficients ranging from 0.79 to 0.89. The test-retest coefficient, as already mentioned, ranges from 0.54 to 0.84 depending on the time lag between measurements (years or weeks, respectively) [21] and amounts to 0.78 in the Russian-language version [64]. In our sample, 1727 accounts have information about the SWLS score.

Mental well-being scale (WHO-5)

We use the official Russian-language version of the WHO-5 scale developed by WHO itself [58]. Each of the WHO-5 items is scored on a 6-point Likert scale ranging from 0 (at no time) to 5 (all of the time). The WHO-5 score ranges from 5 (absence of well-being) to 30 (maximal well-being). The scale has good internal consistency, with α coefficients ranging from 0.82 to 0.95 [13]. Test-retest coefficients are available for specific populations only and only in the short run, ranging from 0.81 to 0.83 [65, 66]. In our sample, 1791 accounts have information about the WHO-5 score.

Mental well-being classes

As mentioned earlier, WHO-5, unlike SWLS, is indicative of a range of mental health conditions [24] and was directly designed to detect one of them [11]. Decisions on mental health, be it screening test results or medical diagnoses, are usually binary and point either at the absence or the presence of a disease. For such tasks, scales need to be transformed into sets of discrete classes based on certain threshold values. Such validated values exist for the original English-language WHO-5 scale (0.28 for major depression and 0.5 for depression). They are recommended for all nations and languages, but in fact have never been tested for a Russian-speaking population. Meanwhile, it has been shown that cultural differences matter in scale construction [67] and that, specifically, they complicate both mean WHO-5 comparison and threshold comparison across countries [15]. Therefore, we validated several thresholds ourselves. For this, we analyzed the mental health dataset of 417 DigitalFreud users who have completed both WHO-5 and at least one of the three questionnaires – on depression, anxiety and stress – and found the values of the WHO-5 index that best predict the classes of these three scales. This approach was our choice for two reasons:

  • the data on clinically diagnosed depression are absent from our dataset;

  • the three mentioned scales were validated for the Russian language and thus have been used here as the best available benchmarks.

We tried out different WHO-5 thresholds to reach better sensitivity and specificity in representing the following conditions: PHQ/GAD ≥ 10 for depression and anxiety [68], and PSS ≥ 21 for stress [63]. Additionally, as from our earlier work [69] we know that classes derived from scale reduction might be better predicted in a trinary design in social science NLP tasks, we also experimented with three-class divisions.
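The threshold search described above can be sketched as a sweep over candidate cutoffs. This is a minimal illustration under stated assumptions, not the authors' procedure: a WHO-5 value strictly below the cutoff is treated as a positive screen, candidates are ranked by Youden's J (sensitivity + specificity − 1), and the scores are toy data. Only the reference rule PHQ ≥ 10 comes from the text.

```python
def sensitivity_specificity(who5_scores, has_condition, cutoff):
    """Screen as positive when normalized WHO-5 < cutoff (an assumption)
    and compare with reference labels (True = condition present)."""
    tp = fp = tn = fn = 0
    for score, positive in zip(who5_scores, has_condition):
        flagged = score < cutoff
        if flagged and positive:
            tp += 1
        elif flagged and not positive:
            fp += 1
        elif not flagged and positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def best_cutoff(who5_scores, has_condition, candidates):
    """Return the candidate with the highest Youden's J.

    The selection criterion is an assumption of this sketch; the source
    only states that sensitivity and specificity were both considered.
    """
    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(who5_scores, has_condition, cutoff)
        return sens + spec - 1

    return max(candidates, key=youden_j)


# Toy data: normalized WHO-5 scores and PHQ-9 totals for eight users.
who5 = [0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
phq9 = [15, 12, 11, 8, 10, 4, 3, 2]
depressed = [total >= 10 for total in phq9]  # reference rule from the text

chosen = best_cutoff(who5, depressed, [0.35, 0.5, 0.65])
```

The same sweep would be repeated for each reference scale (GAD ≥ 10, PSS ≥ 21) before settling on a cutoff that behaves acceptably across all of them.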

Eventually, our analysis resulted in the following cutoff values of the normalized WHO-5 scale:

  • Binary cutoff = 0.51 with classes containing 221 and 151 users in the low and high SWB classes, respectively;

  • Trinary cutoffs \(= [0.35; 0.59]\) with classes containing 111, 158 and 103 users in the low, medium and high SWB classes.
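Given these cutoffs, mapping a normalized WHO-5 score to a class label is a simple lookup. The sketch below uses the cutoff values reported above; assigning a score that exactly equals a cutoff to the higher class is an assumption of the example, since the text does not specify how boundary values were handled.

```python
from bisect import bisect_right

BINARY_CUTOFFS = [0.51]          # values reported in the text
TRINARY_CUTOFFS = [0.35, 0.59]   # values reported in the text

BINARY_LABELS = ["low", "high"]
TRINARY_LABELS = ["low", "medium", "high"]


def swb_class(normalized_who5, cutoffs, labels):
    """Return the class label for a WHO-5 score normalized to [0, 1].

    bisect_right counts the cutoffs that are <= the score, so a score
    equal to a cutoff lands in the higher class (an assumption).
    """
    return labels[bisect_right(cutoffs, normalized_who5)]


binary = [swb_class(s, BINARY_CUTOFFS, BINARY_LABELS) for s in (0.2, 0.51, 0.8)]
trinary = [swb_class(s, TRINARY_CUTOFFS, TRINARY_LABELS) for s in (0.2, 0.4, 0.8)]
```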

Table 1 illustrates sample statistics for each of the mental health conditions, and specificity and sensitivity in terms of the selected WHO-5 cutoff values.

Table 1 Specificity and sensitivity of the selected WHO-5 cutoff values in the mental health dataset

In our high-risk sample of mental health app users, the binary WHO-5 cutoff value of 0.51 yields high sensitivity across the analyzed mental health conditions, while preserving moderate specificity. The trinary cutoff values of 0.35 and 0.59 yield low and high mental well-being classes with very high specificity.

2.1.2 Digital traces

DigitalFreud profile

Account information about the DigitalFreud user includes encrypted DigitalFreud and VKontakte user ids, SWLS and WHO-5 scores, gender, birth year, education, employment and marital status, and date and time of the DigitalFreud app installation.

VKontakte user information

Humanteq chooses to match DigitalFreud data with VKontakte data since the latter is the most popular social networking site in Russia. We use the following data obtained with VKontakte application programming interface (API):

  1. User Profile data. Although VKontakte API provides access to potentially rich user information, in practice users seldom fill in their profiles, and the data is sparse. As a result, we only use gender, birthdate, and the number of friends and subscriptions in our analysis.

  2. Wall posts (text, date and time, information on reposting with the original post contents and encrypted user id, number of reposts, comments and likes) available for 1871 users.

  3. Directed private messages (text, date and time, encrypted author and addressee ids) available for 1044 users.

Phone application usage

Phone application usage was monitored for one week following the initial consent obtained from the user when she started using DigitalFreud, which was consistent both with the app’s terms of use and the policies of the Android platform. The collected information includes name and package of the application, start time and duration of the application usage in foreground in milliseconds. It is available for 992 users. In a few cases when the users quit the phone app data sharing before the end of the week, the recorded period was shorter.

2.2 Descriptive statistics

The main parameters of the descriptive statistics for our final dataset of 372 users are given in Tables 2 and 3. Our dataset is predictably skewed towards females (80%) and young people (mean age 23 ± 5 y.o.), against 53% of females and a mean age of 39 y.o. in the general Russian population [70]. However, as mentioned earlier, this sample is not theoretically intended to represent Russia. Consistent with Collins et al. [51], we normalize both well-being scores to the range \([0,1]\); to do so, we subtract 5 from both scores, then multiply SWLS values by 1/30, and WHO-5 values by 1/25. Additionally, the distribution of the SWB and demographic data in the final dataset is illustrated in Appendix 2, Figs. 1–4.
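The normalization described above can be written down as a short sketch. The raw ranges implied by the formula (5 to 35 for SWLS and 5 to 30 for WHO-5) are an inference from the stated offsets and multipliers, and the function names are hypothetical.

```python
def normalize_score(raw, minimum, span):
    """Shift a raw scale score by its minimum and rescale it to [0, 1]."""
    return (raw - minimum) / span

def normalize_swls(raw):
    # Subtract 5, then multiply by 1/30, as described in the text
    return normalize_score(raw, 5, 30)

def normalize_who5(raw):
    # Subtract 5, then multiply by 1/25, as described in the text
    return normalize_score(raw, 5, 25)

# The mean raw SWLS score of 18.3 reported for the sample maps to about 0.44
mean_swls_normalized = normalize_swls(18.3)
```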

Figure 1 Distribution of SWLS values

Table 2 Descriptive statistics for subjective well-being, age and gender in the final dataset
Table 3 Descriptive statistics for the textual and phone app usage features in the final dataset

SWLS and WHO-5 intercorrelate strongly with \(r = 0.568\), \(p < 10^{-32}\). The level of internal consistency of both scales is high (Cronbach’s \(\alpha > 0.82\)).

Both SWB scores in our final sample are consistently lower than in other studies made on other groups of Russians. Thus, WHO-5 score amounts to the average of 0.46 ± 0.187 in our dataset against 0.60 ± 0.191 obtained in a study of Russian Facebook users [71], the only available evaluation of WHO-5 for Russia. Likewise, while the mean SWLS score among our participants is 18.3, a study on a sample close to the general Russian population (mean age 41 y.o. with 54% of women) shows the score of 23.6 [72]. A younger group of Russian students (mean age 20 with 65% of women) which is more similar to our sample scores even higher: 24.4 [73]. The lower SWB levels in our dataset are explained by self-selection of specific individuals to the DigitalFreud app: it naturally attracts users interested in seeking psychological and mental health information and advice, i.e., potentially more likely to have problematic mental health conditions. This is in line with our research goal of studying high-risk populations, of which our sample is an obvious example exactly due to the lower SWB scores.

2.3 Feature engineering

For our task of SWLS and WHO-5 prediction, we construct features of three main types:

  • User metadata and overall activity: demographics, DigitalFreud & VKontakte profile statistics, and overall phone app usage statistics;

  • Textual, or linguistic features:

    • Words;

    • Sentiment scores;

    • RuLIWC;

    • Word clusters;

  • Phone app usage statistics by app category.

Overall, we constructed 660 features for SWLS and 651 for WHO-5. Most features were calculated as counts, ratios or counts by time period directly from the final dataset. However, word and word cluster features were obtained from the heldout dataset, which does not intersect with the final dataset. Of these features, only those that correlated with the target variables were selected for the main experiments. In the main experiments, the features were submitted to the regression or classification models, which were run on the final dataset divided into train, development and test subsets in a 10-fold cross-validation scenario. In this scenario, (1) multiple models were trained on the train set, (2) recursive feature elimination was performed on the development set based on the MAE of the models, and (3) final scores for each feature type and each model were computed on the test set. More details on the main experiment procedure are given in the Machine Learning Experiments section.

2.3.1 User metadata and overall activity features

There are 40 features describing demographics, overall phone application usage data and the data on the overall activity patterns based on DigitalFreud and VKontakte profiles (see Table 4). The activity-related data include three groups of features: (1) numbers and volumes of personal messages written during one month preceding test completion, (2) numbers of alters, or accounts that a user has a message history with, for every user in each of the 12 months preceding test completion, and (3) weighted differences between the last two months in terms of the message volume and the number of alters. In building phone app usage features, we follow previous research [74, 75] which identified three- and six-hour periods of online activity to be significant markers of mental illness. In our research, we break phone app usage into three-hour periods of activity. Some features have been excluded from the analysis due to data sparsity.

Table 4 User metadata and overall activity features

2.3.2 Linguistic features

Our extensive analysis of user texts has shown that VKontakte public wall posts are too sparse and include mostly web link content, which does not allow for effective prediction. As a result, we construct all the linguistic features based on private messages written by the users in VKontakte messenger, mostly during one year preceding the installation of DigitalFreud app.

Sentiment scores

We use six features representing the proportions of positive and of negative words in the messages created during one month or one year preceding test participation, or in the entire messaging history of a user. Each feature represents the proportion, or l1-normalized frequency, of positive or negative sentiment words written in one of the three time periods (which results in \(2\times 3=6\) features). The sentiment words were identified with a closed-vocabulary approach based on the Russian sentiment lexicon RuSentiLex [76].
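A minimal sketch of one such sentiment feature is given below. It assumes that the proportion is taken over all tokens written in the period and that the tokens are already lemmatized; the mini-lexicon and the token list are hypothetical stand-ins and are not drawn from RuSentiLex.

```python
from collections import Counter

def sentiment_proportions(tokens, positive, negative):
    """Share of positive and of negative lexicon words among all tokens."""
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    total = sum(counts.values())
    pos = sum(n for word, n in counts.items() if word in positive)
    neg = sum(n for word, n in counts.items() if word in negative)
    return pos / total, neg / total

# Hypothetical mini-lexicon and lemmatized tokens for one time period
POSITIVE = {"good", "happy"}
NEGATIVE = {"bad", "sad"}
tokens = ["i", "feel", "sad", "but", "the", "day", "was", "good", "good", "ok"]

pos_share, neg_share = sentiment_proportions(tokens, POSITIVE, NEGATIVE)
```

Calling the function once per time period (last month, last year, entire history) would give the \(2\times 3=6\) features.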

Words

We adopt the open-vocabulary approach to word features predictive of well-being [77]. Given the small size of our final dataset (372 observations), using all the frequent words as features (12K words with frequency ≥ 200) would inevitably result in overfitting. To overcome this and to select a reasonable number of interpretable features, we use the heldout dataset as follows:

  • First, a sub-sample of users who have filled both well-being questionnaires was selected from the heldout dataset (396 users);

  • Next, we selected 12.5K words occurring more than 200 times in the joint one-year long message collection of all users and calculated their TfIDF scores using 396 individual message collections as 396 texts for such calculation;

  • We filtered out words with \(p > 0.01\) in the ANOVA tests relating these words to SWLS and WHO-5 values in the heldout dataset, which has resulted in the selection of 165 words for SWLS and 224 words for WHO-5 (see Appendix 3 for the full list). Words belonging to either of these sets (353 words) are used as features for prediction.

RuLIWC

For obtaining closed-vocabulary features, we used RuLIWC dictionary – a translation of the most prominent categories of the Linguistic Inquiry and Word Count (LIWC, [78]) performed by Panicheva & Litvinova [79]. RuLIWC consists of eight word categories: Bio, Cognitive, Social, Time, Percept and subcategories of the latter: Feel, Hear, See, with 563–2624 words in each category and 20–303 words in each subcategory. For this research, RuLIWC feature values have been computed as the sums of all the words’ TfIDF values for every user. All the words regardless of their (in)frequency were accounted for.
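As a small illustration of the computation, a category feature for one user can be obtained by summing that user's TfIDF weights over the words of the category. The category word lists and weights below are hypothetical placeholders, not RuLIWC content.

```python
def category_scores(tfidf_by_word, categories):
    """Sum a user's TfIDF weights over the words of each dictionary category."""
    return {
        name: sum(tfidf_by_word.get(word, 0.0) for word in words)
        for name, words in categories.items()
    }

# Hypothetical category word lists and TfIDF weights for one user
CATEGORIES = {"Bio": {"sleep", "eat"}, "Social": {"friend", "talk"}}
user_tfidf = {"sleep": 0.4, "friend": 0.25, "talk": 0.25, "work": 0.1}

scores = category_scores(user_tfidf, CATEGORIES)
```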

Word clusters

Content features were computed by clustering words with a word2vec semantic model [80] based on the heldout dataset. The word2vec model we used had been trained on the web-based Taiga corpus containing over 5 billion words [81] by Kutuzov & Kuzmenko [82], with skipgram algorithm, vector dimensionality = 300, and window size = 2. For clustering, we used 7128 words present in the model vocabulary with frequency ≥ 200 in the heldout dataset. Next, we performed KMeans clustering with cosine distance and 300 clusters. As KMeans algorithm is stochastic and may give very different results in different runs, we used the following procedure to obtain reproducible cluster solutions:

  • We employed cluster regularization, where the regularization parameter was the sum of p-values of the cluster occurrence correlation with SWLS or WHO-5 (see Note 2); the regularization weights were \([0; 50; 100; 500]\);

  • For every weight value, ten random cluster solutions were obtained;

  • Based on these solutions, consensus cluster solutions were constructed (see Note 3) with the following thresholds: \([0.25, 0.45, 0.65, 0.75, 0.85]\);

  • This resulted in five consensus cluster solutions for every weight value, thus the overall number of solutions totaling to 20.

  • In each solution, clusters were additionally augmented with infrequent words in the dataset, every infrequent word being ascribed to the closest cluster. Thus each of 20 solutions was supplemented by a paired solution with augmented clusters.
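The idea of building a consensus solution from several random KMeans runs can be sketched as follows. This is a simplified stand-in, not the procedure used in the study: the regularization step is omitted, cosine distance is approximated by running Euclidean KMeans on unit-length vectors, and the consensus is formed by linking words that co-occur in a cluster in at least a given share of runs and taking connected components. All names and the toy vectors are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def consensus_clusters(vectors, n_clusters, n_runs=10, threshold=0.75):
    """Consensus labels from several KMeans runs via a co-association matrix."""
    unit = normalize(vectors)  # unit-length rows, so Euclidean distance tracks cosine distance
    n = unit.shape[0]
    co_assoc = np.zeros((n, n))
    for seed in range(n_runs):
        labels = KMeans(n_clusters=n_clusters, n_init=1, random_state=seed).fit_predict(unit)
        co_assoc += labels[:, None] == labels[None, :]
    co_assoc /= n_runs
    # Link two words if they shared a cluster in at least `threshold` of the runs
    adjacency = csr_matrix((co_assoc >= threshold).astype(int))
    _, consensus = connected_components(adjacency, directed=False)
    return consensus

# Toy "word vectors": three tight groups around orthogonal directions
rng = np.random.default_rng(0)
vectors = np.vstack([center + 0.05 * rng.normal(size=(5, 3)) for center in np.eye(3)])

consensus = consensus_clusters(vectors, n_clusters=3)
```

Varying `threshold` over several values yields several consensus solutions from the same set of runs, in the spirit of the thresholds listed above.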

The clustering results were evaluated on the heldout dataset as follows:

  • For every cluster solution, only the clusters that correlated with \(p < 0.05\) with SWLS or WHO-5 were used as features;

  • Each cluster feature was computed as the sum of the respective words’ TfIDF values;

  • The resulting features were used for RandomForest regression predicting SWLS and WHO-5 on the heldout dataset, with 10-fold train/test cross-validation and recursive feature elimination;

  • The best cluster features were chosen by Mean Absolute Error (MAE) of the regression models trained on the heldout dataset; later they were used for prediction on the final dataset.

The main parameters of the resulting feature sets are described in Table 5.

Table 5 Best word cluster features

2.3.3 Phone app categories and usage features

The phone app categories and usage features are based on the 1-week phone app usage history shared by the participants. App categories, or types, were obtained from the phone app dataset by using 53 app categories generated automatically from 28K app descriptions and by manually uniting them into larger groups as described in [47, 49]. As a result, we identified the following nine app categories: Game, Education+Productivity, Tools, Entertainment, Personalization, Health+Medical, Social+Communication+Dating, Photography, covering 21.5K apps, with the remaining 6.5K apps assigned to Other. The main app usage features were calculated as the total time devoted to a certain app category (e.g. Game, Photography or Other) in each of eight three-hour time slots of a day, averaged over all days of a given user (\(9*8 = 72\) features), as well as the overall time spent on this category in the entire app usage history of an individual (9 features). Next, we constructed several normalized versions of each feature. Namely, we normalized them by the total app usage time in this category, and by the total app usage logged in the current three-hour period. This resulted in \(9 + 72*3=225\) features. The phone app category features are exemplified in Table 6.
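A pandas sketch of the slot-based features for a single user is shown below. It rests on several assumptions not stated in the text: each usage record is assigned to a slot by its start time (sessions crossing a slot boundary are not split), the average is taken over the days that have at least one record, and the column names and toy log are hypothetical.

```python
import pandas as pd

def slot_usage_features(log):
    """Average daily foreground time per app category and three-hour slot.

    `log` needs the columns: start (datetime), duration_ms, category.
    """
    log = log.copy()
    log["day"] = log["start"].dt.date
    log["slot"] = log["start"].dt.hour // 3  # 0..7, i.e. 0-3h, 3-6h, ..., 21-24h
    n_days = log["day"].nunique()
    per_slot = log.groupby(["category", "slot"])["duration_ms"].sum() / n_days
    # Normalized variants: share within the category and share within the slot
    by_category = per_slot / per_slot.groupby(level="category").transform("sum")
    by_slot = per_slot / per_slot.groupby(level="slot").transform("sum")
    return pd.DataFrame(
        {"mean_ms": per_slot, "share_of_category": by_category, "share_of_slot": by_slot}
    )

# Toy usage log for one user over two days
log = pd.DataFrame(
    {
        "start": pd.to_datetime(
            ["2021-03-01 09:30", "2021-03-01 10:00", "2021-03-01 23:10", "2021-03-02 09:05"]
        ),
        "duration_ms": [60000, 120000, 30000, 180000],
        "category": ["Game", "Tools", "Game", "Game"],
    }
)

features = slot_usage_features(log)
```

Slot 3 here corresponds to the 9–12 h window, so `features.loc[("Game", 3)]` holds the three variants of the morning Game feature.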

Table 6 Phone app category features

2.4 Machine learning experiments

We performed specific experiments for each of our two subtasks: prediction of satisfaction-related and mental well-being scales and prediction of the classes in the latter. As we aimed at interpretable results, our main experiments were based on classical regressions. Simultaneously, to make sure that we obtain the best possible prediction quality with the available contemporary methods, we also carried out extensive experiments employing deep learning approaches (described in Appendix 4). However, they yielded inferior results. The two main possible reasons for that are the following: (1) our data are hard to obtain, and the obtained data are sparse and loosely intersect between users, which reduces the sample significantly; (2) our message data is hierarchically organized, with numerous alters with whom every participant communicates and numerous messages sent to every alter, while the number of alters and messages also varies widely between the participants/alters (see Table 3 above).

Our experiment on prediction of SWLS and WHO-5 scales was performed using a 10-fold cross-validation design with train, development and test sets (298/37/37 users, 80/10/10%). The non-overlapping train, development and test sets were constructed as follows:

  1. The sample was shuffled and sorted by the well-being values;

  2. The sorted sample was divided into 10 bins containing 37 users each so that \(\mathrm{bin}_{i}\) consisted of users with \(\mathrm{index} = i + K*10\), where K varied in the range \([0; 36]\). Thus every bin was equally distributed in terms of the SWB values.

  3. For the ith cross-validation fold, \(\mathrm{bin}_{i}\) was used as the test set, \(\mathrm{bin}_{i+1}\) – as the dev set, and the remaining users belonged to the training set.
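The binning scheme can be sketched in NumPy as follows, assuming that the stride between consecutive members of a bin equals the number of bins (10), which is what yields 37 users per bin for 372 users. Two further assumptions are made: the users left over after filling the bins go to the training set, and the dev bin wraps around to the first bin for the last fold. Function names are hypothetical.

```python
import numpy as np

def stratified_bins(scores, n_bins=10, seed=0):
    """Shuffle, sort by score, and deal users into bins like cards."""
    scores = np.asarray(scores)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(scores))                     # shuffle, so ties are ordered randomly
    order = order[np.argsort(scores[order], kind="stable")]  # then sort by well-being value
    bin_size = len(scores) // n_bins                         # 37 for 372 users
    usable = order[: bin_size * n_bins]
    bins = [usable[i::n_bins] for i in range(n_bins)]        # bin i: positions i, i+10, i+20, ...
    leftover = order[bin_size * n_bins:]                     # assumed to join the training set
    return bins, leftover

def fold_indices(bins, leftover, i):
    """Train/dev/test user indices for cross-validation fold i."""
    n_bins = len(bins)
    dev_bin = (i + 1) % n_bins                               # wrap-around is an assumption
    train_parts = [b for j, b in enumerate(bins) if j not in (i, dev_bin)]
    train = np.concatenate(train_parts + [leftover])
    return train, bins[dev_bin], bins[i]

# Toy scores for 372 users
toy_scores = np.random.default_rng(1).random(372)
bins, leftover = stratified_bins(toy_scores)
train, dev, test = fold_indices(bins, leftover, i=9)
```

With 372 users this gives 298 training, 37 development and 37 test users per fold, matching the split sizes reported above.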

Our evaluation metrics for regression include Mean Absolute Error (MAE), Pearson r and R2-score. Hyperparameter values were chosen inside the cross-validation loop based on MAE on the development set. Recursive Feature Elimination (RFE) was performed on the development set to identify the informative features in each cross-validation fold. RFE was adopted because earlier experiments had shown that it increases model performance. Additionally, RFE allows us to select a small number of informative features, improving model interpretability. The selected best hyperparameters and features were used to evaluate the quality of prediction on the test set inside the cross-validation loop. In the end, the evaluation metrics were averaged across all 10 folds.
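One possible reading of this procedure for a single fold is sketched below with scikit-learn: RFE ranks features on the training set, the number of retained features is chosen by MAE on the development set, and the three metrics are computed on the test set. The candidate feature counts, the ElasticNet penalty, the synthetic data and the function names are illustrative assumptions, not the study's settings.

```python
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, r2_score

def fit_with_rfe(X_train, y_train, X_dev, y_dev, feature_counts, alpha=0.1):
    """Pick the RFE feature count that gives the lowest MAE on the dev set."""
    best_mae, best_model = None, None
    for k in feature_counts:
        selector = RFE(ElasticNet(alpha=alpha), n_features_to_select=k)
        selector.fit(X_train, y_train)
        dev_mae = mean_absolute_error(y_dev, selector.predict(X_dev))
        if best_mae is None or dev_mae < best_mae:
            best_mae, best_model = dev_mae, selector
    return best_model

def evaluate(model, X_test, y_test):
    """MAE, Pearson r and R2 on the held-out test part of a fold."""
    pred = model.predict(X_test)
    return {
        "mae": mean_absolute_error(y_test, pred),
        "pearson_r": pearsonr(y_test, pred)[0],
        "r2": r2_score(y_test, pred),
    }

# Synthetic data standing in for one 298/37/37 fold
X, y = make_regression(n_samples=372, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
train, dev, test = slice(0, 298), slice(298, 335), slice(335, 372)

model = fit_with_rfe(X[train], y[train], X[dev], y[dev], feature_counts=[5, 10, 20])
fold_scores = evaluate(model, X[test], y[test])
```

Repeating this for each of the 10 folds and averaging the resulting dictionaries would give the cross-validated metrics; `model.support_` marks the features kept in the fold.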

Predictions of SWLS and WHO-5 scores were performed with seven regression models, including Linear Regression with various regularization techniques, Decision Tree, and two ensemble methods (see Appendix 5). WHO-5 classification was performed with three classification models based on our preliminary experiments (Appendix 6).

Classification of individual WHO-5 levels was performed in a binary mode with two classes (low vs. high well-being) and in a trinary mode with three classes (low vs. medium vs. extremely high). The models and hyperparameter values are described in Appendix 6. We report F1-macro and F1-weighted metrics over all the classes, as well as the F1 metric for the lowest and the highest classes separately. We additionally report True Positive and False Positive Rates for the low well-being class, as these measures are typically used for screening tests of various mental health conditions (cf. [38]).
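The reported classification metrics can be computed as in the following sketch. The label encoding (0 = low, 1 = medium, 2 = high), the helper name and the toy predictions are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

def low_class_rates(y_true, y_pred, low_label=0):
    """True and false positive rates when 'low well-being' is the positive class."""
    is_low = np.asarray(y_true) == low_label
    flagged = np.asarray(y_pred) == low_label
    tpr = np.sum(is_low & flagged) / np.sum(is_low)
    fpr = np.sum(~is_low & flagged) / np.sum(~is_low)
    return tpr, fpr

# Toy trinary labels: 0 = low, 1 = medium, 2 = high
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 1]

tpr, fpr = low_class_rates(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")
f1_weighted = f1_score(y_true, y_pred, average="weighted")
f1_low, f1_mid, f1_high = f1_score(y_true, y_pred, average=None)
```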

All the calculations were performed in Python with the pandas, scipy, and scikit-learn libraries.

3 Results

3.1 Prediction of well-being scale values

The continuous modeling results for the SWLS and WHO-5 well-being values are presented in Tables 7 and 8, respectively.

Table 7 SWLS value prediction results
Table 8 WHO-5 value prediction results

The results for every individual feature set, and for the best feature sets in terms of every evaluation metric are included; the best results are highlighted in bold. The full results for all the feature set combinations are presented in Appendices 7, 8.

Overall, the best feature set is words written by the users in messages, and the best model is ElasticNet.

3.2 Prediction of WHO-5 classes

The main classification results for the WHO-5 well-being are presented in Table 9. The full WHO-5 classification results are presented in Appendix 9.

Table 9 Best WHO-5 classification results

3.3 Significant features

The features in the best performing continuous models of satisfaction-related well-being (SWLS) and mental well-being (WHO-5) scales are illustrated in Tables 10 and 11. Only the features which were selected by RFE in at least five out of ten cross-validation folds are included; the features significant in both SWLS and WHO-5 regression are highlighted in bold. All the significant features are listed in Appendices 10, 11.

Table 10 Predictive features in SWLS scale. Slang, misspellings and unconventional word forms are shown with an asterisk (*). Errors in lemmatization are enclosed in brackets
Table 11 Predictive features in WHO-5 scale

4 Discussion

In this paper, we have introduced a novel task of predicting mental well-being measured by WHO-5 index, as compared to traditionally studied satisfaction-related SWLS, with digital traces, and performed it in both continuous modeling and classification designs. In the latter, we have shown that the selected WHO-5 thresholds are representative of a range of three mental well-being-related conditions (depression, anxiety and stress) with high sensitivity and specificity. Furthermore, the results obtained in mental well-being classification are highly promising (0.792 True Positive Rate and 0.404 False Positive Rate) in the binary task with our highly sensitive threshold. This threshold is very close to the one recommended by WHO for moderate depression screening (0.51 against 0.50). The classification result itself is similar to the performance of the best existing models that predict other mental conditions with digital traces [30, 38]. Likewise, our results of SWLS and WHO-5 scale prediction, with Pearson \(r = 0.402\) and 0.367, respectively, improve the state-of-the-art metrics reported previously in similar tasks with cross-validation designs [51, 53]. Since, as mentioned earlier, prediction of internal states with observable behaviors has its limitations [29, 30], the obtained correlation may be considered high. As a result, we obtain a model which is highly sensitive and sufficiently specific for identifying low levels of subjective well-being requiring intervention in a high-risk population of mental health application users. Our model is unique not only in its accurate prediction of WHO-5 classes that have a proven ability of depression risk detection, but also in its potential to develop into a tool for broader screening for mental health risks, not limited to specific conditions reported in previous studies (see [28, 30, 48] for an overview).

We have performed a unique comparison of regression models predicting both SWLS and WHO-5 indices on the same sample. Our best models for both indices show similar performance in terms of correlation and R2 metrics, but WHO-5 is predicted better in terms of MAE across all feature combinations; however, this is likely an outcome of the different distributions of SWLS and WHO-5 in our sample (see Figs. 1, 2, Table 1 above).

Figure 2 Distribution of WHO-5 values

Figure 3 Distribution of Age values

Our design also allows us to compare the features predictive of life satisfaction-related SWB and mental SWB. Although our experiments have revealed only two highly predictive features that are common for both SWLS and WHO-5, they are highly interpretable in terms of psychological theory. These two features are (1) phone app usage time between 9 AM and 12 PM normalized by total app usage time, and (2) negative sentiment expressed in private messages in the last month, which have positive and negative coefficients, respectively, in both SWLS and WHO-5 tasks. Both of these findings confirm previous results obtained in various populations: participants affected by depression and other low SWB conditions have been found less likely than average individuals to participate in online activities in the morning hours around 9–10 AM [74, 75], while their circadian rhythms have often been disrupted [7]. Such disruption is what usually accompanies insomnia or hypersomnia, a symptom of the major depressive disorder listed in DSM-5 [83], the Diagnostic and Statistical Manual of Mental Disorders developed by the American Psychiatric Association.

Negative sentiment has been shown to correlate negatively with life satisfaction [34, 53, 84] and subjective well-being [71]. Negative sentiment in written or oral speech may also sometimes, although not always, be a manifestation of depressed mood, another symptom of depressive disorder according to DSM-5.

Thus, these two highly predictive features intersecting in both SWLS and WHO-5 prediction models can indicate different degrees of SWB: from simple dissatisfaction with life, circumstances or personal achievements (relevant for SWLS), to a deterioration in mental or physical condition and serious symptoms of the depressive spectrum (relevant for WHO-5). They can be recommended for use across various SWB-prediction tasks.

Predictors unique for satisfaction-related well-being are much more dominated by verbal features related to affect-laden psychological and social content. They are often obscene lexemes, but also represent both negative and positive sentiment polarities (quit_VERB, spend_VERB, fine_UNKN, explanation_NOUN, bully_VERB, spoil_VERB, gasp_VERB, nice_COMPARATIVE). Association of positive lexica with SWB is consistent with Weismayer [85], who also finds negative relation of SWB with lexica expressing anger and fear. Some of our predictive words are likely to express these emotions (e.g. bully [rude], burn, lie [rude], gasp). Also, these lexica fit well with some of the ontologies developed for depression detection [45]. Prevalence of lexical features among SWLS predictors suggests that this index, indeed, captures subjective perception of well-being rather than symptoms of mental disorders, such as depression.

In contrast, in mental well-being level prediction, phone app usage features take a clear lead, especially those related to the ratio of nighttime app usage (3–6 AM). Additionally, lexica related to biological processes are also a distinctive marker of low WHO-5 levels. All this aligns well with the primary goal of WHO-5 to reveal depression and its proven ability to differentiate between problematic mental health states and high levels of mental health-related well-being. Specifically, app usage rhythms and biological lexica are likely to be manifestations of such depression symptoms as increase or decrease in either weight or appetite, insomnia or hypersomnia, and fatigue or loss of energy [86]. At the same time, they can be markers of a poor physical condition, which is also detected by WHO-5 [18].

Finally, the significance of negative sentiment in the long periods of messaging (1 year and longer) for WHO-5 levels suggests that mental SWB measured by this index might in fact have a more stable behavioral pattern than SWLS. However, there is also a possibility that the stable component of SWLS is underrepresented in our features or subjects. Simultaneously, it may be that not only SWLS (as shown in [21]), but also WHO-5 contains both stable and transient components that may be explained by different factors. While the temporal stability of SWB may be expected to be related to constant individual features, such as presence of a chronic disease, SWB volatility, on the contrary, should be explained by short-term mood fluctuations and long-term meaningful changes in life, such as those listed in the introduction. Individual predictors of SWB stability and volatility may differ for SWLS and WHO-5, and it may happen that in our sample the feature set is skewed in favor of WHO-5 stability factors. In any case, our analysis of the overlapping and the differing predictors for WHO-5 and SWLS shows that satisfaction-related SWB and mental SWB share some of their transient factors rather than stable ones. These preliminary observations of the temporal dimension of SWB set a promising direction for future research.

5 Conclusions

The growing interest in tracking human mental states and in the development of mindfulness leads to the growth of applications that screen or even diagnose mental conditions and offer solutions for their correction, including those based on objective data. Our research has shown that it is possible to create machine learning models based on interpretable traits and predict various aspects of subjective well-being at the state-of-the-art level.

In doing so, we have performed the first study on predicting subjective well-being measured by WHO-5. We have demonstrated that certain WHO-5 level thresholds are indicative of a range of mental health conditions prevalent in a sample characterized by high risk of mental health problems. We have obtained promising results in classification of mental SWB into classes constructed based on these thresholds. This approach has allowed us to identify individuals affected by low subjective well-being with high recall and reasonable false positive rates, based on their digital traces.

Our study is also the first to compare prediction performance and predictive features of mental SWB and satisfaction-related SWB. We show that several predictors are shared by well-being measured by both WHO-5 and SWLS, and these digital traces are bluntly indicative of overall (un)well-being. At the same time, digital traces distinguishing between WHO-5 and SWLS are closely related to the conceptual difference between these two indices: while SWLS is characterized by expressions denoting affect-laden psychological and social content, WHO-5 levels are manifested in objective features reflecting physiological functioning and somatic conditions, i.e., lexica related to biological processes and circadian rhythm-related ratios of phone app usage.

To our knowledge, this is the first approach to subjective well-being prediction in a Russian-speaking population, and the first to combine language, social network and phone app usage features in well-being research. By leveraging phone app usage logs, profile and message data from the Russian social network VKontakte, we have been able to improve prediction of satisfaction-related SWB (SWLS) and propose a first predictive model for mental SWB (WHO-5). At the same time, as our sample has been very small and limited to a high-risk population, the study needs replication on larger samples representative of wider social and psychological groups. The major obstacle to this is that VKontakte private message data are no longer available for any type of download, while other social media are even more restrictive. Development of public policies and regulations encouraging private data-collecting companies to share portions of their data for public good purposes is highly recommended.

Availability of data and materials

The data that support the findings of this study belong to the Humanteq company and were collected under specific terms of use. When onboarding in the DigitalFreud app, users agreed to a privacy policy that explicitly prohibited data transfer to third parties, largely because the amount of data for each user does not allow completely anonymizing the dataset, which contains sensitive information. Therefore restrictions apply to the availability of these data, which is why they are not publicly available. A fraction of the data can however be obtained from the authors upon reasonable request and with permission of the Humanteq company. The code for data analysis is available at https://github.com/hse-scila/bewell.

Notes

  1. https://github.com/dlatk/happierfuntokenizing.

  2. https://arxiv.org/abs/1804.10742, the code https://github.com/Kipok/clr_prediction was modified and applied.

  3. https://naeglelab.github.io/OpenEnsembles/_modules/finishing.html#majority_vote

Abbreviations

AUC: Area Under the Curve

DF: DigitalFreud

DSM-5: Diagnostic and Statistical Manual of mental disorders, fifth edition

GAD: General Anxiety Disorder scale

PSS: Perceived Stress Scale

LIWC: Linguistic Inquiry & Word Count dictionary

MAE: Mean Absolute Error

NLP: Natural Language Processing

PHQ-9: Patient Health Questionnaire

RFE: Recursive Feature Elimination

RMSE: Root-Mean-Square Error

RuLIWC: Russian Linguistic Inquiry & Word Count

RuSentiLex: Russian Sentiment Lexicon

SWB: Subjective Well-Being

SWLS: Diener’s Satisfaction with Life Scale

TFIDF: Term Frequency Inverse Document Frequency

WHO-5: World Health Organization-5 Well-Being Index

References

  1. Linton M-J, Dieppe P, Medina-Lara A (2016) Review of 99 self-report measures for assessing well-being in adults: exploring dimensions of well-being and developments over time. BMJ Open 6(7):010641

  2. Goodday SM, Geddes JR, Friend SH (2021) Disrupting the power balance between doctors and patients in the digital era. Lancet Digit Health 3(3):142–143

  3. Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJ (2006) Global and regional burden of disease and risk factors, 2001: systematic analysis of population health data. Lancet 367(9524):1747–1757

  4. Barzilay R, Moore TM, Greenberg DM, DiDomenico GE, Brown LA, White LK, Gur RC, Gur RE (2020) Resilience, Covid-19-related stress, anxiety and depression during the pandemic in a large population enriched for healthcare providers. Transl Psychiatry 10(1):1–8

  5. Wilke J, Hollander K, Mohr L, Edouard P, Fossati C, González-Gross M, Sánchez Ramírez C, Laiño F, Tan B, Pillay JD et al. (2021) Drastic reductions in mental well-being observed globally during the Covid-19 pandemic: results from the asap survey. Front Med 8:246

  6. Pieh C, Budimir S, Delgadillo J, Barkham M, Fontaine JR, Probst T (2021) Mental health during Covid-19 lockdown in the United Kingdom. Psychosom Med 83(4):328–337

  7. Rohani DA, Faurholt-Jepsen M, Kessing LV, Bardram JE (2018) Correlations between objective behavioral features collected from mobile and wearable devices and depressive mood symptoms in patients with affective disorders: systematic review. JMIR mHealth uHealth 6(8):165

  8. Devakumar A, Modh J, Saket B, Baumer EP, De Choudhury M (2021) A review on strategies for data collection, reflection, and communication in eating disorder apps. In: Proceedings of the 2021 CHI conference on human factors in computing systems, pp 1–19

  9. Huang Y-N, Zhao S, Rivera ML, Hong JI, Kraut RE (2021) Predicting well-being using short ecological momentary audio recordings. In: Extended abstracts of the 2021 CHI conference on human factors in computing systems, pp 1–7

  10. Diener E, Emmons RA, Larsen RJ, Griffin S (1985) The satisfaction with life scale. J Pers Assess 49(1):71–75

  11. World Health Organization et al (1998) Wellbeing measures in primary health care: the depcare project: report on a who meeting, Stockholm, Sweden, pp 12–13

  12. Bech P, Olsen LR, Kjoller M, Rasmussen NK (2003) Measuring well-being rather than the absence of distress symptoms: a comparison of the sf-36 mental health subscale and the who-five well-being scale. Int J Methods Psychiatr Res 12(2):85–91

  13. McDowell I (2010) Measures of self-perceived well-being. J Psychosom Res 69(1):69–79

  14. Diener E, Inglehart R, Tay L (2013) Theory and validity of life satisfaction scales. Soc Indic Res 112(3):497–527

  15. Sischka PE, Costa AP, Steffgen G, Schmidt AF (2020) The who-5 well-being index–validation based on item response theory and the analysis of measurement invariance across 35 countries. J Affective Disorders Reports 1:100020

  16. Downs A, Boucher LA, Campbell DG, Polyakov A (2017) Using the who-5 well-being index to identify college students at risk for mental health problems. J Coll Stud Dev 58(1):113–117

  17. Kusier AO, Folker AP (2020) The well-being index who-5: hedonistic foundation and practical limitations. Med Humanit 46(3):333–339

  18. Kusier AO, Folker AP (2021) The satisfaction with life scale: philosophical foundation and practical limitations. Health Care Anal 29(1):21–38

  19. Helliwell JF, Layard R, Sachs J, De Neve J-E (2020) World happiness report 2020. Sustainable Development Solutions Network, New York

  20. Luhmann M, Lucas RE, Eid M, Diener E (2013) The prospective effect of life satisfaction on life events. Soc Psychol Pers Sci 4(1):39–45

  21. Pavot W, Diener E (2009) Review of the satisfaction with life scale. In: Diener E (ed) Assessing well-being. Springer, Dordrecht, pp 101–117. https://doi.org/10.1007/978-90-481-2354-4_5

  22. Blom EH, Bech P, Högberg G, Larsson JO, Serlachius E (2012) Screening for depressed mood in an adolescent psychiatric context by brief self-assessment scales–testing psychometric validity of who-5 and bdi-6 indices by latent trait analyses. Health Qual Life Outcomes 10(1):1–6

  23. Krieger T, Zimmermann J, Huffziger S, Ubl B, Diener C, Kuehner C, Holtforth MG (2014) Measuring depression with a well-being index: further evidence for the validity of the who well-being index (who-5) as a measure of the severity of depression. J Affect Disord 156:240–244

  24. Topp CW, Østergaard SD, Søndergaard S, Bech P (2015) The who-5 well-being index: a systematic review of the literature. Psychother Psychosom 84(3):167–176

  25. Chouchou F, Augustini M, Caderby T, Caron N, Turpin NA, Dalleau G (2021) The importance of sleep and physical activity on well-being during Covid-19 lockdown: Reunion Island as a case study. Sleep Med 77:297–301

  26. Brindal E, Ryan JC, Kakoschke N, Golley S, Zajac IT, Wiggins B (2021) Individual differences and changes in lifestyle behaviours predict decreased subjective well-being during Covid-19 restrictions in an Australian sample. J Public Health

  27. Gierc M, Riazi NA, Fagan MJ, Di Sebastiano KM, Kandola M, Priebe CS, Weatherson KA, Wunderlich KB, Faulkner G (2021) Strange days: adult physical activity and mental health in the first two months of the Covid-19 pandemic. Front. Public Health 9:325

  28. Settanni M, Azucar D, Marengo D (2018) Predicting individual characteristics from digital traces on social media: a meta-analysis. Cyberpsychol Behav Soc Netw 21(4):217–228

  29. Meyer GJ, Finn SE, Eyde LD, Kay GG, Moreland KL, Dies RR, Eisman EJ, Kubiszyn TW, Reed GM (2001) Psychological testing and psychological assessment: a review of evidence and issues. Am Psychol 56(2):128–165

  30. Guntuku SC, Yaden DB, Kern ML, Ungar LH, Eichstaedt JC (2017) Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci 18:43–49

  31. Novikov P, Mararitsa L, Nozdrachev V (2021) Inferred vs traditional personality assessment: are we predicting the same thing? arXiv preprint. 2103.09632

  32. Guntuku SC, Lin W, Carpenter J, Ng WK, Ungar LH, Preoţiuc-Pietro D (2017) Studying personality through the content of posted and liked images on Twitter. In: Proceedings of the 2017 ACM on web science conference, pp 223–227

  33. Bech P (2012) Subjective positive well-being. World Psychiatry 11(2):105–106

  34. De Choudhury M, Gamon M, Counts S, Horvitz E (2013) Predicting depression via social media. In: Seventh international AAAI conference on weblogs and social media

  35. Coppersmith G, Dredze M, Harman C, Hollingshead K, Mitchell M (2015) Clpsych 2015 shared task: depression and ptsd on Twitter. In: Proceedings of the 2nd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, pp 31–39

  36. Preoţiuc-Pietro D, Eichstaedt J, Park G, Sap M, Smith L, Tobolsky V, Schwartz HA, Ungar L (2015) The role of personality, age, and gender in tweeting about mental illness. In: Proceedings of the 2nd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, pp 21–30

  37. Tsugawa S, Kikuchi Y, Kishino F, Nakajima K, Itoh Y, Ohsaki H (2015) Recognizing depression from Twitter activity. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, pp 3187–3196

  38. Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preoţiuc-Pietro D, Asch DA, Schwartz HA (2018) Facebook language predicts depression in medical records. Proc Natl Acad Sci 115(44):11203–11208

  39. Coppersmith G, Dredze M, Harman C (2014) Quantifying mental health signals in Twitter. In: Proceedings of the workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, pp 51–60

  40. Coppersmith G, Ngo K, Leary R, Wood A (2016) Exploratory analysis of social media prior to a suicide attempt. In: Proceedings of the third workshop on computational linguistics and clinical psychology, pp 106–117

  41. Benton A, Mitchell M, Hovy D (2017) Multitask learning for mental health conditions with limited social media data. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics: volume 1, long papers, pp 152–162

  42. Uban A-S, Chulvi B, Rosso P (2021) An emotion and cognitive based analysis of mental health disorders from social media data. Future Gener Comput Syst 124:480–494

  43. Lee Y-K, Chang C-T, Lin Y, Cheng Z-H (2014) The dark side of smartphone usage: psychological traits, compulsive behavior and technostress. Comput Hum Behav 31:373–383

  44. Sheldon P, Rauschnabel P, Honeycutt JM (2019) The dark side of social media: psychological, managerial, and societal perspectives. Academic Press, San Diego

  45. Hung GC-L, Yang P-C, Chang C-C, Chiang J-H, Chen Y-Y (2016) Predicting negative emotions based on mobile phone usage patterns: an exploratory study. JMIR Res Protoc 5(3):160

  46. Saeb S, Zhang M, Karr CJ, Schueller SM, Corden ME, Kording KP, Mohr DC (2015) Mobile phone sensor correlates of depressive symptom severity in daily-life behavior: an exploratory study. J Med Internet Res 17(7):175

  47. Stachl C, Au Q, Schoedel R, Gosling SD, Harari GM, Buschek D, Völkel ST, Schuwerk T, Oldemeier M, Ullmann T et al. (2020) Predicting personality from patterns of behavior collected with smartphones. Proc Natl Acad Sci 117(30):17680–17687

  48. Luhmann M (2017) Using big data to study subjective well-being. Curr Opin Behav Sci 18:28–33

  49. David ME, Roberts JA, Christenson B (2018) Too much of a good thing: investigating the association between actual smartphone use and individual well-being. Int J Hum-Comput Interact 34(3):265–275

  50. Kosinski M, Stillwell D, Graepel T (2013) Private traits and attributes are predictable from digital records of human behavior. Proc Natl Acad Sci 110(15):5802–5805

  51. Collins S, Sun Y, Kosinski M, Stillwell D, Markuzon N (2015) Are you satisfied with life?: predicting satisfaction with life from Facebook. In: International conference on social computing, behavioral-cultural modeling, and prediction. Springer, Berlin, pp 24–33

  52. Schwartz HA, Sap M, Kern ML, Eichstaedt JC, Kapelner A, Agrawal M, Blanco E, Dziurzynski L, Park G, Stillwell D et al. (2016) Predicting individual well-being through the language of social media. In: Biocomputing 2016: proceedings of the Pacific symposium. World Scientific, Singapore, pp 516–527

  53. Chen L, Gong T, Kosinski M, Stillwell D, Davidson RL (2017) Building a profile of subjective well-being for social media users. PLoS ONE 12(11):0187278

  54. Linnhoff S, Smith KT (2017) An examination of mobile app usage and the user’s life satisfaction. J Strat Mark 25(7):581–617

  55. Gao Y, Li H, Zhu T (2014) Predicting subjective well-being by smartphone usage behaviors. In: HEALTHINF, pp 317–322

  56. StatCounter Global Stats (2018) Mobile operating system market share. Russia. https://gs.statcounter.com/os-market-share/mobile/russian-federation/2018

  57. Statista (2021) Number of smartphone users in Russia from 2015 to 2025. https://www.statista.com/statistics/467166/forecast-of-smartphone-users-in-russia/

  58. Region Hovedstadens Psykiatriske Hospital (2021) Индекс общего (хорошего) самочувствия/ВОЗ (вариант 1999 г). https://www.psykiatri-regionh.dk/who-5/Documents/WHO5_Russian.pdf

  59. Korobov M (2015) Morphological analyzer and generator for Russian and Ukrainian languages. In: International conference on analysis of images, social networks and texts. Springer, Berlin, pp 320–332

  60. Kroenke K, Spitzer RL, Williams JB (2001) The phq-9: validity of a brief depression severity measure. J Gen Intern Med 16(9):606–613

  61. Spitzer RL, Kroenke K, Williams JB, Löwe B (2006) A brief measure for assessing generalized anxiety disorder: the gad-7. Arch Intern Med 166(10):1092–1097

  62. Cohen S, Kamarck T, Mermelstein R et al. (1994) Perceived stress scale. Meas Stress: Guide Health Soc Sci 10(2):1–2

  63. Ababkov VA, Barisnikov K, Vorontzova-Wenger OV, Gorbunov IA, Kapranova SV, Pologaeva EA, Stuklov KA (2016) Validation of the Russian version of the questionnaire “Scale of perceived stress-10”. Vestn Saint-Petersburg Univ Psychol Educ 16(2):6–15

  64. Ledovaya YA, Bogolyubova ON, Tikhonov RV (2015) Stress, well-being and the Dark Triad. Psikhologicheskie Issled 8(43):5

  65. Bonnín CM, Yatham LN, Michalak EE, Martínez-Arán A, Dhanoa T, Torres I, Santos-Pascual C, Valls E, Carvalho AF, Sánchez-Moreno J, Valentí M, Grande I, Hidalgo-Mazzei D, Vieta E, Reinares M (2018) Psychometric properties of the well-being index (who-5) Spanish version in a sample of euthymic patients with bipolar disorder. J Affect Disord 228:153–159. https://doi.org/10.1016/j.jad.2017.12.006

  66. Schougaard L, de Thurah A, Bech P, Hjollund N, Christiansen D (2018) Test-retest reliability and measurement error of the Danish who-5 well-being index in outpatients with epilepsy. Health Qual Life Outcomes 16(1):175. https://doi.org/10.1186/s12955-018-1001-0

  67. Brailovskaia J, Schönfeld P, Zhang XC, Bieda A, Kochetkov Y, Margraf J (2018) A cross-cultural study in Germany, Russia, and China: are resilient and social supported students protected against depression, anxiety, and stress? Psychol Rep 121(2):265–281. https://doi.org/10.1177/0033294117727745. PMID: 28836915

  68. Spitzer R, Williams J, Kroenke K (1990) Instruction manual: instructions for patient health questionnaire (phq) and gad-7 measures. PHQ and GAD-7 instructions

  69. Pronoza E, Panicheva P, Koltsova O, Rosso P (2021) Detecting ethnicity-targeted hate speech in Russian social media texts. Inf Process Manag 58(6):102674

  70. Rosstat (2017) The Demographic yearbook of Russia. 2017: statistical handbook. Rosstat, Moscow. (In Russ.)

  71. Bogolyubova O, Panicheva P, Ledovaya Y, Tikhonov R, Yaminov B (2020) The language of positive mental health: findings from a sample of Russian Facebook users. SAGE Open 10(2):2158244020924370

  72. Brailovskaia J, Schönfeld P, Kochetkov Y, Margraf J (2019) What does migration mean to us? USA and Russia: relationship between migration, resilience, social support, happiness, life satisfaction, depression, anxiety and stress. Curr Psychol 38(2):421–431

  73. Bieda A, Hirschfeld G, Schönfeld P, Brailovskaia J, Zhang XC, Margraf J (2017) Universal happiness? Cross-cultural measurement invariance of scales assessing positive mental health. Psychol Assess 29(4):408–421

  74. Birnbaum ML, Wen H, Van Meter A, Ernala SK, Rizvi AF, Arenare E, Estrin D, De Choudhury M, Kane JM (2020) Identifying emerging mental illness utilizing search engine activity: a feasibility study. PLoS ONE 15(10):0240820

  75. Ten Thij M, Bathina K, Rutter LA, Lorenzo-Luaces L, van de Leemput IA, Scheffer M, Bollen J (2020) Depression alters the circadian pattern of online activity. Sci Rep 10(1):1–10

  76. Loukachevitch N, Levchik A (2016) Creating a general Russian sentiment lexicon. In: Proceedings of the tenth international conference on language resources and evaluation (LREC’16), pp 1171–1176

  77. Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, Shah A, Kosinski M, Stillwell D, Seligman ME et al. (2013) Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS ONE 8(9):73791

  78. Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. Technical report, The University of Texas at Austin

  79. Panicheva P, Litvinova T (2020) Matching liwc with Russian thesauri: an exploratory study. In: Conference on artificial intelligence and natural language. Springer, Berlin, pp 181–195

  80. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint. 1301.3781

  81. Shavrina T, Shapovalova O (2017) To the methodology of corpus construction for machine learning: «taiga» syntax tree corpus and parser. In: Proceedings of the corpora, pp 78–84

  82. Kutuzov A, Kuzmenko E (2016) Webvectors: a toolkit for building web interfaces for vector semantic models. In: International conference on analysis of images, social networks and texts. Springer, Berlin, pp 155–161

  83. American Psychiatric Association et al. (2013) Diagnostic and statistical manual of mental disorders: DSM-5. Am. Psychiat. Assoc., Washington

  84. Wang N, Kosinski M, Stillwell D, Rust J (2014) Can well-being be measured using Facebook status updates? Validation of Facebook’s Gross national happiness index. Soc Indic Res 115(1):483–491

  85. Weismayer C (2021) Investigating the affective part of subjective well-being (swb) by means of sentiment analysis. Int J Soc Res Methodol 24(6):697–712

  86. Fried EI, Nesse RM (2015) Depression is not a consistent syndrome: an investigation of unique symptom patterns in the STAR* D study. J Affect Disord 172:96–102

Acknowledgements

This work is the result of a collaboration with Paolo Rosso in the framework of his internship at the National Research University Higher School of Economics, held online due to COVID-19.

This research was supported in part through computational resources of HPC facilities at NRU HSE.

Funding

The study was implemented in the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE University).

Author information

Contributions

PP: participated in hypothesis formulation, engineered all features, ran the main predictive models and prepared the initial draft. LM: collected all the data, formulated the initial research problem and edited the manuscript from the psychological point of view. SS: ran all experiments with neural networks and prepared the Appendix. OK: participated in hypothesis formulation, curated dataset formation and feature engineering, ran several exploratory models, provided major editing of the final draft. PR: provided the overall research design, suggested methods and approaches, participated in editing of the final draft. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Olessia Koltsova.

Ethics declarations

Ethics approval and consent to participate

The study has been approved by the Higher School of Economics Committee on Interuniversity Surveys and Ethical Assessment of Empirical Research.

Competing interests

The authors declare that they have no competing interests.

Appendices

Appendix 1: Data preprocessing

The dataset was cleaned:

  • First, birth year and gender were identified from DF and VK profile data. We removed data where age or gender were not available, or contradicted between DF and VK;

  • We selected only users with non-zero phone app usage data, a non-zero number of VK friends, and at least 100 characters written in messages in the month immediately prior to DF installation. This left us with 446 user accounts;

  • We removed duplicates of VK and DF id from the data, giving priority to the data profiles which included more information filled in, and to profiles which were characterized by a later DF install time.

  • The resulting final dataset contained 372 unique users with SWLS, WHO-5 and digital traces information.
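
The cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the record schema (`df_birth_year`, `vk_gender`, `app_usage_events`, `filled_fields`, `df_install_time` and so on) is hypothetical, and "not available" is read here as "missing in both DF and VK".

```python
def resolve(df_value, vk_value):
    """Return the agreed value, or None if it is missing in both
    sources or the two sources contradict each other."""
    if df_value is not None and vk_value is not None:
        return df_value if df_value == vk_value else None
    return df_value if df_value is not None else vk_value


def clean_accounts(accounts):
    """Filter and deduplicate account dicts (hypothetical schema)."""
    kept = []
    for acc in accounts:
        # Step 1: birth year and gender must be known and consistent.
        if resolve(acc["df_birth_year"], acc["vk_birth_year"]) is None:
            continue
        if resolve(acc["df_gender"], acc["vk_gender"]) is None:
            continue
        # Step 2: non-zero app usage, non-zero friends, >= 100 characters
        # written in messages in the month before DF installation.
        if acc["app_usage_events"] == 0 or acc["vk_friends"] == 0:
            continue
        if acc["message_chars"] < 100:
            continue
        kept.append(acc)

    # Step 3: drop duplicate VK ids, then duplicate DF ids, preferring
    # profiles with more filled-in fields and then a later install time.
    def priority(acc):
        return (acc["filled_fields"], acc["df_install_time"])

    for key in ("vk_id", "df_id"):
        best = {}
        for acc in kept:
            current = best.get(acc[key])
            if current is None or priority(acc) > priority(current):
                best[acc[key]] = acc
        kept = list(best.values())
    return kept
```

Ranking by filled-in fields first and install time second is one possible reading of the stated priority; the text does not say which criterion takes precedence.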

Appendix 2: Distribution of the subjective well-being and demographic data in the final dataset

See Figs. 14.

Figure 4 Distribution of Gender values

2.1 Demographic data description

The total sparse data sample includes information about 1960 users; 17.6% of them (344) did not provide information about their country. Of the remaining 1616 users, 84.4% are from Russia, 5% from Belarus, 2.4% from Ukraine, 1.9% from Kazakhstan and 1.1% from the USA. The sample also includes users from the following countries, each accounting for no more than 1%: Japan, South Korea, Moldova, Germany, Great Britain, Canada, Finland, Kyrgyzstan, Italy, Argentina, Norway, Israel, Cyprus, China, Vatican, Honduras, India, Serbia, Latvia, Liechtenstein, Iceland, Uzbekistan, Hungary, Georgia, Denmark, France, Ivory Coast, Cook Islands, Estonia, Australia, Romania, Netherlands, American Samoa, Albania, Gambia. City information is absent in 30% of cases (578); the distribution of the ten most common cities among the remaining users (1382) is given in Table 12.

Table 12 Distribution of the most common cities identified in the overall data sample

The final dataset contains 372 users; 15% of them (57) did not provide information about their country. Of the remaining 315 users, 92% are from Russia, 2.5% from Belarus and 1.3% from Ukraine. There are also users from the following countries, each accounting for less than 1%: Kazakhstan, USA, Latvia, China, Finland, Norway, Japan, Hungary, South Korea, Canada. City information is absent in 26% of cases (98); the distribution of the ten most common cities among the remaining users (274) is given in Table 13.

Table 13 Distribution of the most common cities identified in the final dataset

Appendix 3: Word features

See Table 14.

Table 14 Total list of words used as features for the SWLS and WHO-5 prediction

Appendix 4: Preliminary deep learning experiments

4.1 RuBERT

First, we performed experiments with RuBERT models [Kuratov & Arkhipov 2019] on post and message data. The difficulty in applying BERT-like models to our textual data is that BERT model input is limited to a maximum of 512 sub-tokens, whereas posts and messages in VKontakte can be much longer and have no small character limit (unlike, for example, Twitter). This raises two issues that have to be solved before RuBERT can be applied to our data:

  • input sequences should be truncated to 512 sub-tokens maximum;

  • input sequences by the same user should be aggregated.

Solving these issues is not a trivial task for VKontakte posts and messages for the following reasons:

  • Posts and messages vary in length and can be much longer than 512 sub-tokens;

  • The numbers of posts, messages and message alters vary widely across users;

  • The rhythm of posting/messaging varies widely across users: a user who is active during one month may have written no posts or messages in the previous 6 months.

Aggregating post or message information involves pooling the outputs of the individual RuBERT runs, which essentially means averaging over a user's posts or messages, so a lot of information is lost. For these reasons, we performed most of our RuBERT-based experiments with posts, which are fewer in number and therefore easier to aggregate. We used data from 902 users with at least 10 posts. Each post was truncated and fed into one of the RuBERT models [Kuratov & Arkhipov 2019], followed by a variety of additional layers. Regression was always performed by the final Dense layer. The experiment hyperparameters included the following:

  • Using RuBERT as an embedding layer or fine-tuning it for the regression task;

  • The models included: RuBERT, Conversational RuBERT, Sentence RuBERT;

  • We included all users (902), and those having at least 50 messages (222);

  • We used the train/dev/test 5-fold cross validation;

  • We included up to 64 posts by each user truncated to 128 sub-tokens each;

  • We also aggregated the latest posts by each user and truncated the result to 512 sub-tokens;

  • We used the full RuBERT output or the last ‘class’ token;
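
The two input-preparation variants and the pooling step listed above can be illustrated with a minimal sketch. It works on plain lists of sub-token ids and plain lists of floats rather than on an actual RuBERT model. The function names are hypothetical, and taking the latest posts first in both variants is an assumption, since the text does not say which 64 posts were kept or in what order posts were concatenated.

```python
def select_and_truncate(posts, max_posts=64, max_tokens=128):
    """Variant 1: keep up to `max_posts` posts per user, each cut to
    `max_tokens` sub-tokens. `posts` is ordered oldest to newest."""
    latest = posts[-max_posts:]
    return [tokens[:max_tokens] for tokens in latest]


def concat_latest(posts, max_tokens=512):
    """Variant 2: concatenate posts newest-first and truncate the
    result to the 512 sub-token input limit."""
    merged = []
    for tokens in reversed(posts):
        merged.extend(tokens)
        if len(merged) >= max_tokens:
            break
    return merged[:max_tokens]


def mean_pool(vectors):
    """Average per-post embedding vectors into one user-level vector.
    This is the pooling step where per-post detail is lost."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

With variant 1 each post yields its own embedding and `mean_pool` (or a recurrent layer) combines them; with variant 2 the user is represented by a single input sequence and no pooling across posts is needed.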

The layers after the RuBERT models were:

  • Dense;

  • LSTM+Dense;

  • LSTM+Dense+Dense;

  • LSTM+LSTM+Dense;

In LSTM layers, the number of units was varied over \([8; 16; 64; 100]\); dropout rate \(= [0., 0.1, 0.3, 0.5]\), optimizers = [RMSprop, Adagrad], learning rate \(= [0.0001, 0.001, 0.005, 0.01, 0.05]\), activation = [linear, relu, sigmoid], batch size = 128, epochs = 100, metrics = [mse], early stopping on validation MSE with patience = 10. Unfortunately, the results of these experiments were highly unstable, with MSE values failing to improve on the dummy baseline (standard deviation of the sample) and Pearson's r reaching only 0.1.
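
To make the comparison with the dummy baseline concrete, the sketch below computes the error of a predictor that always outputs the sample mean. It assumes that this is the kind of dummy intended; the scores are made-up values, not data from the study. For such a predictor the MSE equals the population variance of the scores, so its square root equals their standard deviation.

```python
import math
from statistics import mean, pvariance


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation; NaN when either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Illustrative scores only (not taken from the study).
scores = [10.0, 14.0, 18.0, 22.0, 26.0]
dummy_predictions = [mean(scores)] * len(scores)

baseline_mse = mse(scores, dummy_predictions)   # equals pvariance(scores)
baseline_rmse = math.sqrt(baseline_mse)         # equals the population std
```

A constant prediction has zero variance, so its Pearson correlation with the true scores is undefined; any model that only matches this baseline error carries no usable signal.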

4.1.1 Sentiment analysis with RuSentiment BERT

As mentioned before, Chen et al. [2016] used sentiment analysis to predict SWLS; we also performed experiments with user messages to assess sentiment. The idea is that the distribution over sentiment classes can be used as features for predicting subjective well-being levels. There are many different approaches to classifying messages by sentiment. One of them is to use word dictionaries with sentiment labels. However, this has two important disadvantages: the sentiment of a word can change with the context of its use, and it is not clear which label to assign to messages containing many words of different sentiment (especially if they are distributed evenly within the message). These disadvantages led us to another common approach to sentiment classification: a pre-trained neural network. We found an open-source model with the BERT architecture [Devlin J. et al., 2018] that was trained to determine the sentiment of VKontakte posts. More precisely, this model is the result of fine-tuning multilingual BERT with a linear head on top, using the RuSentiment dataset [Rogers A. et al., 2018], on a five-class classification task (“neutral”, “negative”, “positive”, “speech act”, “skip”).

Using a held-out dataset, we subsampled users who had provided access to their messages. We created a dataset of around 400 users containing the messages they wrote in the three months before they obtained a WHO score. Exploratory data analysis showed that 10 per cent of users had fewer than 30 messages, so we excluded them. The resulting dataset has 354 samples, with each user having 4,719 messages on average (median: 2,415). We normalized the frequency of each sentiment class by the total number of messages of the corresponding user.
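
The feature construction described above can be sketched as follows. The input format (a mapping from a user id to the list of predicted labels for that user's messages) and the function name are assumptions made for illustration; the five class names and the 30-message cut-off come from the text.

```python
from collections import Counter

# The five RuSentiment classes named in the text.
CLASSES = ("neutral", "negative", "positive", "speech act", "skip")


def sentiment_features(labels_by_user, min_messages=30):
    """Turn per-message sentiment labels into per-user class shares.

    Users with fewer than `min_messages` messages are dropped; for the
    rest, each class count is divided by the user's message total.
    """
    features = {}
    for user, labels in labels_by_user.items():
        if len(labels) < min_messages:
            continue
        counts = Counter(labels)
        features[user] = {c: counts[c] / len(labels) for c in CLASSES}
    return features
```

Because the five shares of a user sum to one, they are linearly dependent, which is worth keeping in mind when they are fed together into a linear regression.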

First, we checked the correlation between the sentiment classes and the WHO score. Table 15 shows that there is no strong correlation.

Table 15 Correlation between sentiment class and WHO score

We also constructed a pipeline with a regression model on top of this frequency distribution, using different feature combinations, but the models did not show promising results (Table 16).

Table 16 Results for linear regression model with sentiment class frequency features. Mean absolute error and Pearson correlation

We assume that these results can be explained as follows. The domain of VKontakte message texts may differ from the domain of VKontakte post texts: a post can be interpreted as a complete (finite) utterance, whereas a message should be interpreted within its dialogue context. An isolated message may not contain enough information to classify its sentiment. The absence of dialogue boundaries (where a user starts and finishes a dialogue session within a long thread) does not allow us to reconstruct the context of a message, which could possibly help achieve a more accurate sentiment classification.

Appendix 5: Models and hyperparameters used for SWLS and WHO-5 regression

See Table 17.

Table 17 Models and hyperparameters used for SWLS and WHO-5 regression

Appendix 6: Models and hyperparameters used for WHO-5 classification

See Table 18.

Table 18 Models and hyperparameters used for WHO-5 classification

Appendix 7: SWLS regression results for all feature sets

See Table 19.

Table 19 SWLS regression results for all feature sets

Appendix 8: WHO-5 regression results for all feature sets

See Table 20.

Table 20 WHO-5 regression results for all feature sets

Appendix 9: WHO-5 classification results

See Table 21.

Table 21 WHO-5 classification results

Appendix 10: Features significant in SWLS regression

See Table 22.

Table 22 Features significant in SWLS regression

Appendix 11: Features significant in WHO-5 regression

See Table 23.

Table 23 Features significant in WHO-5 regression

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Panicheva, P., Mararitsa, L., Sorokin, S. et al. Predicting subjective well-being in a high-risk sample of Russian mental health app users. EPJ Data Sci. 11, 21 (2022). https://doi.org/10.1140/epjds/s13688-022-00333-x

  • DOI: https://doi.org/10.1140/epjds/s13688-022-00333-x

Keywords