Predicting subjective well-being in a high-risk sample of Russian mental health app users

Panicheva, Polina; Mararitsa, Larisa; Sorokin, Semen; Koltsova, Olessia; Rosso, Paolo

doi:10.1140/epjds/s13688-022-00333-x

Regular article
Open access
Published: 04 April 2022

Polina Panicheva¹,
Larisa Mararitsa¹,²,
Semen Sorokin¹,
Olessia Koltsova¹ (ORCID: orcid.org/0000-0002-2669-3154) &
Paolo Rosso³

EPJ Data Science volume 11, Article number: 21 (2022)

Abstract

Despite recent achievements in predicting personality traits and some other human psychological features with digital traces, prediction of subjective well-being (SWB) appears to be a relatively new task with few solutions. The COVID-19 pandemic has added both a stronger need for rapid SWB screening and new opportunities for it, with online mental health applications gaining popularity and accumulating large and diverse user data. Nevertheless, few existing works so far have aimed at predicting SWB, and those that have done so measure it only in terms of Diener’s Satisfaction with Life Scale. None of them analyzes the scale developed by the World Health Organization, known as WHO-5 – a widely accepted tool for screening mental well-being and, specifically, for depression risk detection. Moreover, existing research is limited to English-speaking populations and tends to use text, network and app usage types of data separately. In the current work, we cover these gaps by predicting both mentioned SWB scales on a sample of Russian mental health app users who represent a population with high risk of mental health problems. In doing so, we employ a unique combination of phone application usage data with private messaging and networking digital traces from VKontakte, the most popular social media platform in Russia. As a result, we predict Diener’s SWB scale with state-of-the-art quality, introduce the first predictive models for WHO-5 with similar quality, and reach high accuracy in the prediction of clinically meaningful classes of the latter scale. Moreover, our feature analysis sheds light on the interrelated nature of the two studied scales: they are both characterized by negative sentiment expressed in text messages and by phone application usage in the morning hours, confirming some previous findings on subjective well-being manifestations.
At the same time, SWB measured by Diener’s scale is reflected mostly in lexical features referring to social and affective interactions, while mental well-being is characterized by objective features that reflect physiological functioning, circadian rhythms and somatic conditions, thus saliently demonstrating the underlying theoretical differences between the two scales.

1 Introduction

In recent years, the evaluation, analysis and improvement of subjective well-being (SWB) have gained growing attention from both researchers and practitioners [1, 2]. Attention to SWB has naturally been coupled with the increasing research interest in depression – the leading cause of disability and subjective well-being loss worldwide [3, 4]. The COVID-19 pandemic, resulting in the shift to hybrid work and the decline in face-to-face communication, has put many individuals at additional mental health risk [5, 6]. Some of the most widely available instruments to mitigate such risks are online and mobile services that offer quick screening tests of subjective well-being and mental health states and automatically generate respective recommendations. More than 240 mental health apps are available in the App Store today, some of which extensively use machine learning for classifying and scoring their users in terms of their psychological or mental conditions [7–9]. Such apps attract consumers concerned with their psychological states, while these concerns are usually associated with higher risks for users’ SWB or mental health. As these individuals agree to donate parts of their digital traces, psychological apps become natural hubs accumulating data on individuals at risk. Such data, if available, provide ample opportunities for the development of open source algorithms for early automatic detection of threats to well-being in high-risk populations with their digital traces.

Subjective well-being is most commonly defined in accordance with Diener’s approach [10] as a person’s satisfaction with their life (which constitutes SWB’s cognitive component) and the prevalence of positive emotions over negative ones (affective balance, which constitutes SWB’s affective component). To date, about 100 assessment tools measuring about 200 facets of well-being have been proposed, thus complicating the selection of relevant metrics [1]. The two most widely used SWB measurement tools are Diener’s Satisfaction with Life Scale (SWLS) [10] and the scale introduced by the World Health Organization in 1998, known as the WHO-5 index [11]. The former aims to capture generalized long-term subjective well-being, while the original goal of the latter was to screen and rate depression. Later, Bech, one of the WHO-5 developers, also showed that this scale is equally good at detecting high degrees of psychological well-being, which he proposed to consider a component of mental health, along with the absence of depression symptoms [12].

Both SWLS and WHO-5 are short unidimensional 5-item scales with proven validity and reliability (α coefficients 0.79–0.89 for the former and 0.82–0.95 for the latter) [13–15]. Both have become common for well-being screening in a wide range of populations and among different nationalities [15–18]. The wide use and proven quality of these metrics motivate their choice for our research in automatic SWB prediction; however, some more details on their distinctive features should be added.

SWLS, apart from being centered on pleasure and satisfaction, is also meant to be time- and dimension-independent. The first feature means that it is not tied to a specific time interval and measures satisfaction with our past, present and future. The second feature refers to the generalized character of such satisfaction, not being tied to any particular dimension of human life, such as health, relationships or finance. The choice of the dimensions to be taken into account and the weight assigned to them is left with the subject and is expected to be based on a blend of objective reality and the subject’s subjective experience of it. It is assumed that a person is able to adequately assess her well-being and has all the necessary and unbiased information for that [10].

SWLS is widely used by psychologists, public health professionals, and economists. According to the World Happiness Report, SWLS provides a more informative measure for international comparisons of well-being than some measures capturing the affective component only [19]. Importantly, SWLS is stable under unchanging conditions, but is sensitive to changes in life circumstances: thus, its growth is associated with a higher likelihood of marriage and childbirth and with a lower likelihood of job loss and relocating [20]. It is also predictive of physical and physiological outcomes, as judged from a 4-year follow-up period in the same study. It is these meaningful changes that have been found responsible for the drop of SWLS test-retest reliability from 0.84 in the window of a few weeks to 0.54 in the 4-year window [21]. These changes are clearly distinct from the short-term random mood fluctuations responsible for explaining 16% of variance in the short run. It can thus be said that SWLS captures both a stable and a transient component, both of which are present in human well-being.

In contrast to SWLS, the WHO-5 index aims at a brief assessment of emotional well-being over a 14-day period (thus containing no cognitive component and being highly time-sensitive). Its items represent positive affect whose absence corresponds to the depression symptoms (negative affect). This is an important advantage of WHO-5, as the subjects are not forced to confess to the presence of any unpleasant and potentially hard-to-admit negative emotions or states. As mentioned above, WHO-5 has been proven effective for the detection of both depression risk [22, 23] and high levels of well-being [12]. Being a short, sensitive, specific and non-invasive tool, it has an advantage over more detailed but heavier methods for preliminary depression and suicide risk assessment in settings without psychological/psychiatric expertise. WHO-5 has shown high clinimetric validity and the ability to accurately predict a wide range of mental health conditions, including depression; moreover, it has been recommended as an outcome measure balancing the wanted and unwanted effects of treatments [24]. That is why WHO-5 has been adopted in many research fields such as suicidology, geriatrics, youth and alcohol abuse studies, personality disorder research, and occupational psychology [15, 24].

Thus, WHO-5 and SWLS, being psychometrically sound screening tools with known outcomes, also measure complementary aspects of subjective well-being. Although measures of emotional affect and reported life satisfaction often correlate, substantial divergences have been found. For instance, almost half of the people who rated themselves as ‘completely satisfied’ also reported significant symptoms of anxiety and distress [17]. Therefore, quality of life in the current coronavirus crisis is usually measured with both scales [5, 6, 25–27]: while WHO-5 helps to assess influence of different practices on SWB and the persistence of diminished well-being beyond and during COVID-19, SWLS shows how people feel and how their life perspective changes due to the pandemic. This complementarity indicates the importance of comparative research in prediction of both metrics.

This task is novel for SWB prediction with digital traces: despite the advances in detection of specific mental health problems and the attempts to predict some SWB metrics, no research so far has been dedicated to predicting WHO-5 and its comparison with SWLS in terms of digital behavior traces; moreover, most research is limited to English-speaking populations. Best models predicting SWLS with digital traces from social media, search engine and smartphone activity data demonstrate performance below 0.4 in terms of Pearson correlation – a well-known threshold for correlation between psychological characteristics and objective behavior [28, 29] (see also [30, 31] for an overview). None of the models combines language, social media and smartphone usage data.

The goal of this study is to predict individual WHO-5 and SWLS levels with a new combination of digital traces in a high-risk Russian-speaking population, to find out which features are the most predictive and what the overall predictive power of our models is. A high-risk population is defined as a population with a higher probability of having problematic levels of SWB, as compared to more general populations. We thus address a completely novel task of comparative prediction of two different aspects of subjective well-being, which should have different objective indicators and suggest different actions to be taken by the user. Additionally, we find that depression risk in a Russian-speaking population can be detected by the level of WHO-5 below a certain threshold as successfully as in the populations for which WHO-5 was tested earlier, and this allows us to predict the threshold as well. To do so, we make use of a sample of 372 psychological application users who have explicitly consented to share their private messages, social media data and mobile device usage traces. We use extensive feature engineering combined with regression and classification modeling, the first type of models being aimed at SWB score prediction, and the second at depression risk identification based on theoretically justified thresholds. We also check our regression models against the newest neural network approaches, which, however, do not show sufficient quality on a dataset of our size.

The rest of the paper is structured as follows. In the next section we review the existing literature in prediction of SWB and related psychological and mental health phenomena with digital traces. Next, we describe our dataset, our numerous features and the approach to their engineering, as well as the models used. In the Results section we report our best models’ performance and the most useful features. In the Discussion section we interpret our results and indicate the most important limitations. We conclude with the perspectives for future research.

1.1 Subjective well-being prediction

Prediction of internal psychological and mental states from objective behavior patterns is a highly difficult task [29, 32]. Additionally, clinically diagnosed mental disorders (such as depression) and mental disorder risks assessed through threshold scores of screening tests (such as WHO-5) are different categories for prediction. While the former may be partially manifest, the latter, along with psychological traits and conditions, are latent constructs. This means that psychological theory does not expect them to fully correlate with any observable patterns since the former are not thought of as reducible to the latter in principle. This may be one of the reasons why such correlation is seldom high, although this is a subject for further research. As both high SWB and the absence of mental disorder symptoms have been shown to be components of mental health [12, 33], prediction of both SWB and mental disorder (or its risk) constitutes two related tasks. However, due to the different nature of SWB and mental disorder as concepts, the former is usually evaluated with continuous predictive models, while the detection of the latter is most often formulated as a classification task.

1.1.1 Detection of mental disorders

A vast number of studies predict specific mental health conditions with digital traces, mostly with data from social media, such as Facebook and Twitter. The conditions most widely analyzed in such studies are depression and Post Traumatic Stress Disorder [34–38]. Other conditions include Bipolar Disorder, Anxiety and Social Anxiety Disorder, eating disorders, self-harm and suicide attempt [39–42]. Linguistic features used typically include word n-grams, sentiment, specific lexica (e.g., Linguistic Inquiry & Word Count dictionary, LIWC) and topic modelling, with other features related to social networks, emotions, cognitive styles, user activity and demographics [34–39, 42]. Model evaluation metrics include Area Under the Curve (AUC), Precision, Accuracy of classification, and Correlation for continuous measurements. The results for binary mental health problem identification are high, reaching an AUC of 0.7–0.89, Precision up to 0.85, and Accuracy of 0.69–0.72 [30].

Ground truth information in such studies is obtained from different sources, leading to different quality. Most studies use either self-reported survey data [34, 37] or self-declared mental illness [36, 39]. The latter is prone to errors and bias induced by specific data collection methods.

In a recent study, Eichstaedt et al. [38] effectively predict depression of Facebook users against medical records information. The authors use a 6-month history of Facebook statuses posted by 683 hospital patients, of whom 114 were diagnosed with depression (a rate similar to the general population), and classify depression vs. other medical diagnoses with an AUC = 0.72. Features of Facebook statuses include words and word bigrams, temporal characteristics of posting activity, metainformation on post length and frequency, topics and dictionary categories, with interpersonal, emotional and cognitive categories being among the best predictors.

The effects of smartphone usage on mental disorders, until very recently, have been mostly studied with self-reported data (see [43, 44] for an overview). Meanwhile, smartphone apps that collect usage data provide an unprecedented opportunity to access objective and precise information on smartphone application usage. Hung et al. [45] find that phone call duration and rhythm patterns are predictive of negative emotions, while Saeb et al. [46] predict depressive symptom severity with geographical location and phone usage frequency information. However, as feature engineering with phone app usage data requires considerable time and effort [47], the potential of such data for psychological research is yet to be discovered.

1.1.2 Prediction of SWB levels

There have been a few studies aimed at predicting subjective well-being levels, mostly with regression, which obtain modest results. Individual and relational well-being was predicted from social network data [28, 48] and from objective smartphone use data [49]. The reported results are close to the upper bound expected in this task: the meta-analytic correlation between digital traces and psychological well-being has been estimated as \(r = 0.37\) across nine studies, including prediction of subjective well-being, emotional distress and depression [28]. The only study that reaches a higher correlation of 0.66 in one of the models [49] does not specify the scales used for measuring SWB; however, interestingly, it finds that while some apps predictably have a negative effect on well-being, others affect it positively.

Diener’s SWLS, to our knowledge, has been predicted in only four studies that use digital traces in a cross-validated setting. In their pioneering study, Kosinski et al. [50] predicted SWLS with linear regression for 2340 Facebook users based on 58K ‘Likes’ – preferences of webpages indicated by the users. The Likes data dimensionality was reduced to the top 100 components in an SVD model based on a larger dataset (58K users). The obtained correlation reached \(r = 0.17\), whereas the empirical test-retest correlation for SWLS was \(r = 0.44\).

Collins et al. [51] predicted SWLS with Random Forest Regression and various Facebook features, including demographics, networking data, photos, likes, ground truth Big Five traits of the users, of their significant others and friends, and predicted Big Five as a proxy. The best result for a sample of 1360 users with Big Five features as a proxy reached a Mean Absolute Error (MAE) of 0.162, whereas the model with social network features produced MAE = 0.173 for SWLS. Unfortunately, no other evaluation metrics were reported in this study. Schwartz et al. [52] applied Ridge Regression to predict SWLS of 2198 individuals using their Facebook statuses. Thousands of linguistic features were extracted from the status texts, including 2000 topics obtained with the Latent Dirichlet Allocation topic modeling algorithm, word uni- and bi-grams, LIWC and sentiment lexica. A message-user level cascaded aggregation model was additionally trained on a disjoint dataset, which improved regression results from Pearson \(r = 0.301\) to \(r = 0.333\). Facebook status data were also used by Chen et al. [53] to predict SWLS of 2612 users. Features included affect measured by sentiment word usage, 2K topics obtained with topic modeling and 66 LIWC categories. After feature selection with Elastic Net regression, a Random Forest model was tested for prediction on an unseen subset. The results reach a Root-Mean-Square Error (RMSE) of 1.30 (0.217 when rescaled to \([0;1]\)) and \(r = 0.36\).
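The metrics quoted above can be illustrated with a short sketch. It assumes that the RMSE of 1.30 was measured on a scale running from 1 to 7 (a range of 6), which is not stated here but is consistent with the reported rescaled value of 0.217. The function names are illustrative and do not come from any of the cited studies.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rescale_error(error, scale_min, scale_max):
    """Express an error given in scale units as a fraction of the scale range."""
    return error / (scale_max - scale_min)


# Assumption: the original scale runs from 1 to 7, so its range is 6.
print(round(rescale_error(1.30, 1, 7), 3))  # 0.217
```

Rescaling by the scale range is what makes errors from studies with different score ranges comparable, whereas the correlation coefficient is already scale-free.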

A number of studies predict SWB with app usage data. Some of them rely on self-reported measures of app use [54], while others collect objective data [49, 55]. Correlations in David’s model range from 0.31 to 0.66; however, the research does not specify the scales used for measuring SWB. At the same time, interestingly, it finds that while some apps predictably have a negative effect on well-being, others affect it positively. Gao and colleagues [55] report correlations from 0.34 for male users to 0.66 for female users in the task of predicting SWLS; however, they do not report the full feature set and the contribution of each feature in their best models. Instead, they mention that the most predictive variables are communication apps, certain types of games and the frequency of photo taking. None of these studies mentions cross-validation.

Overall, although the results of subjective well-being prediction are promising, several gaps in the existing research can be identified. First, WHO-5, which is an effective screening tool for depression risk and subjective well-being, has never been studied in a predictive research design. Second, all the studies predicting SWLS are limited to English-speaking populations and respective linguistic features. Moreover, these works only address Facebook digital traces, including profile, texts and likes. Finally, only scarce feature interpretation is reported in the previous studies, and digital trace manifestations of different well-being dimensions have never been compared.

1.2 Our approach

In this study, we set out to predict two different concepts of subjective well-being: one combining affective balance and life satisfaction (measured by SWLS index and further referred to as satisfaction-related SWB) and the other conceptualized as a reflection of mental health (measured by WHO-5 index and further referred to as mental SWB). For predicting well-being values, our task is defined as regression, while for detecting depression risk, we formulate our goal as a binary and trinary classification task. For the latter, we identify the threshold values of WHO-5 by validating them against the scores of the same users on the scales of depression, anxiety and stress, so that the WHO-5 values predicting these scores with the highest sensitivity and specificity are chosen. We perform our prediction of SWB on the texts of private messages, social media and smartphone usage information and perform regression and classification experiments in a cross-validated Machine Learning design. The novelty of the current study lies in the following:

1. We present the first study so far on predicting subjective well-being measured by WHO-5;
2. We find a close association of WHO-5 thresholds with three scales of mental health, which is promising in terms of extending our approach to the task of simultaneous prediction of a range of mental health risks;
3. We are the first to compare satisfaction-based and mental SWB, analyzing their intersections and differences in terms of predictive features;
4. This is the first study to combine language, social media and phone app usage features in well-being research;
5. To our knowledge, our study is the first to address subjective well-being prediction in a Russian-speaking population and respective data: the Russian social network VKontakte and texts in the Russian language;
6. We use a dataset of psychological application users, allowing us to predict subjective well-being in real-world conditions for a sample with high mental risks, which has never been done before.

2 Materials and methods

2.1 Dataset

Our dataset was collected in collaboration with the Humanteq social analytics company, using its DigitalFreud app (DF) – a Russian-language phone application for psychological self-assessment – promoted among Android-based smartphone users through Google Ads. Android was chosen as the base operating system for data collection, as at the time of the app development and promotion its users constituted the majority (68–76%) [56] of Russian smartphone users, who in turn were the app’s target audience and who constituted 57–64% [57] of Russia’s population. Additionally, the app was available to Russian speakers from any country, and although users from countries other than Russia constituted a minority, none of the samples we further analyze is intended to be representative of Russia.

Data collection via a psychological app of such type was used to access a high-risk population (its high-risk status was confirmed in subsequent comparison of its mean SWB to those in other populations, presented further below). Users were offered to take as many free tests as they wanted (including personality traits, depression, anxiety, stress, cognitive, motivation and SWB tests) and to explicitly consent to the access to their VKontakte profile data and/or smartphone use data. Based on the test results, users were offered psychological feedback and analytics on the use of VKontakte and/or their smartphones. On average, DigitalFreud users chose to fill in 1.5 questionnaires and shared varying subsets of their data, which made the overall dataset quite sparse.

Privacy policy included a clause stating that the data could be used for research. The study was approved by the HSE Ethics Committee; nevertheless, the data were anonymized prior to the analysis. No personal information (i.e. allowing to identify the users) was included in the sample. In particular, all the user profile ids were encrypted.

The initial sample included 2050 accounts of DigitalFreud users who have completed at least one of the two questionnaires of our interest: SWLS [10] or WHO-5 [58]. The vast majority completed either of the tests only once; for those who did it more than once, the earliest score was taken into our dataset.

The following digital traces data were available for the participants:

DigitalFreud profile data;
VKontakte user data;
Phone application data.

Due to data sparsity, our final sample used in prediction contains digital traces by 372 users. The procedure of data cleaning that produced this dataset is given in Appendix 1. Thus the dataset is small because the data on well-being combined with personal digital traces is highly difficult to obtain, as it requires both considerable effort from a user on completing the questionnaires, and trust allowing them to share sensitive digital traces. However, our dataset is uniquely tailored to the task of predicting SWB in a high-risk population of mental health app users.

Additionally, there is a heldout dataset, which consists of messages written by 572 users who lack other important features for prediction (demographics, phone app usage) but have text data. The heldout dataset is used for preliminary feature selection (see sections Words, Word clusters below). Before feature selection, texts were tokenized with happiestfuntokenizing (Footnote 1) and lemmatized with pymorphy [59].

The phone app dataset consists of phone application usage data by 992 users who lack other important features for prediction. The phone app dataset was used for preliminary phone application categorization and feature engineering.

We also collected a sub-sample of users (\(N = 417\)), who have completed the WHO-5 and at least one of the following questionnaires evaluating different mental health risks (mental health dataset):

1. Depression measured with the Patient Health Questionnaire (PHQ-9) [60];
2. Anxiety measured with the General Anxiety Disorder scale (GAD) [61];
3. Stress measured with the Perceived Stress Scale (PSS) [62, 63].

The mental health dataset was used in the WHO-5 classification task to select cutoff thresholds of the classes to be predicted, so that the chosen thresholds would be representative of a range of mental health conditions.

2.1.1 Self-reported well-being measures

Satisfaction-related well-being scale (SWLS)

The SWLS questionnaire was translated to Russian and validated by Ledovaya et al. [64].

The questionnaire contains 5 statements, each rated on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The resulting SWLS score ranges from 5 (low satisfaction) to 35 (high satisfaction). The scale has good internal consistency: α coefficients range from 0.79 to 0.89. The test-retest coefficient, as already mentioned, ranges from 0.54 to 0.84 depending on the time lag between measurements (years or weeks, respectively) [21] and amounts to 0.78 in the Russian language version [64]. In our sample, 1727 accounts have information about the SWLS score.
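As a minimal sketch of the scoring described above, the total can be computed as the plain sum of the five item responses, which yields the stated range of 5 to 35. The function name and the validation rules are illustrative assumptions, not part of the DigitalFreud app.

```python
def swls_score(responses):
    """Sum five SWLS item responses, each an integer from 1 to 7."""
    if len(responses) != 5:
        raise ValueError("SWLS has exactly 5 items")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("each response must be an integer from 1 to 7")
    return sum(responses)


print(swls_score([4, 5, 3, 6, 2]))  # 20
```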

Mental well-being scale (WHO-5)

We use the official Russian-language version of the WHO-5 scale developed by WHO itself [58]. Each of the WHO-5 items is scored on a 6-point Likert scale ranging from 0 (at no time) to 5 (all of the time). The WHO-5 score ranges from 5 (absence of well-being) to 30 (maximal well-being). The scale has good internal consistency: α coefficients range from 0.82 to 0.95 [13]. Test-retest coefficients are available for specific populations only and only in the short run, ranging from 0.81 to 0.83 [65, 66]. In our sample, 1791 accounts have information about the WHO-5 score.

Mental well-being classes

As mentioned earlier, WHO-5, unlike SWLS, is indicative of a range of mental health conditions [24] and was directly designed to detect one of them [11]. Decisions on mental health, be it screening test results or medical diagnoses, are usually binary and point either at the absence or the presence of a disease. For such tasks, scales need to be transformed into sets of discrete classes based on certain threshold values. Such validated values exist for the original English-language WHO-5 scale (0.28 for major depression and 0.5 for depression). They are recommended for all nations and languages, but in fact have never been tested for the Russian-language population. Meanwhile, it has been shown that cultural differences matter in scale construction [67] and that, specifically, they complicate both mean WHO-5 comparison and threshold comparison across countries [15]. Therefore, we validated several thresholds ourselves. For this, we analyzed the mental health dataset of 417 DigitalFreud users who have completed both WHO-5 and at least one of the three questionnaires – on depression, anxiety and stress – and found the values of the WHO-5 index best predictive of the classes of these three scales. This approach was our choice for two reasons:

the data on clinically diagnosed depression are absent from our dataset;
the three mentioned scales were validated for the Russian language and thus have been used here as the best available benchmarks.

We tried out different WHO-5 thresholds to reach better sensitivity and specificity in representing the following conditions: PHQ/GAD ≥ 10 for depression and anxiety [68], and PSS ≥ 21 for stress [63]. Additionally, as from our earlier work [69] we know that classes derived from scale reduction might be better predicted in a trinary design in social science NLP tasks, we also experimented with three-class divisions.
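The way a candidate cutoff can be scored against a reference condition is sketched below. The toy scores are invented for illustration only, and the selection rule (maximizing sensitivity + specificity − 1, known as Youden's J) is an assumption, since the exact criterion is not spelled out here. A user is flagged when the normalized WHO-5 score falls below the cutoff, and the reference condition is PHQ ≥ 10.

```python
def sensitivity_specificity(who5, at_risk, cutoff):
    """Sensitivity and specificity of the rule 'WHO-5 below cutoff' vs. a reference."""
    tp = fn = tn = fp = 0
    for score, positive in zip(who5, at_risk):
        flagged = score < cutoff
        if positive and flagged:
            tp += 1
        elif positive:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def pick_cutoff(who5, at_risk, candidates):
    """Return the candidate with the highest Youden's J (assumed criterion)."""
    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(who5, at_risk, cutoff)
        return sens + spec - 1
    return max(candidates, key=youden_j)


# Invented toy data: normalized WHO-5 scores and PHQ-9 totals for 8 users.
who5 = [0.20, 0.32, 0.44, 0.48, 0.56, 0.64, 0.72, 0.88]
phq9 = [17, 14, 11, 6, 12, 4, 3, 1]
at_risk = [total >= 10 for total in phq9]  # reference condition PHQ >= 10

print(sensitivity_specificity(who5, at_risk, 0.51))  # (0.75, 0.75)
print(pick_cutoff(who5, at_risk, [0.35, 0.51, 0.59]))  # 0.59
```

On real data the same sweep would be repeated for each reference scale (PHQ, GAD, PSS), and a single cutoff would be chosen as a compromise across them.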

Eventually, our analysis resulted in the following cutoff values of the normalized WHO-5 scale:

Binary cutoff = 0.51 with classes containing 221 and 151 users in the low and high SWB classes, respectively;
Trinary cutoffs \(= [0.35; 0.59]\) with classes containing 111, 158 and 103 users in the low, medium and high SWB classes.
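Applying these cutoffs can be sketched as follows. Two points are assumptions rather than facts from the study: the normalization is taken to be min-max scaling with the scale bounds passed in explicitly (5 and 30 are the bounds quoted for the WHO-5 score), and a score exactly equal to a cutoff is assigned to the higher class.

```python
def normalize(score, scale_min, scale_max):
    """Min-max scale a raw score to the [0, 1] interval."""
    return (score - scale_min) / (scale_max - scale_min)


def binary_class(norm_score, cutoff=0.51):
    """Low vs. high SWB; scores equal to the cutoff count as high (assumption)."""
    return "low" if norm_score < cutoff else "high"


def trinary_class(norm_score, cutoffs=(0.35, 0.59)):
    """Low, medium or high SWB using two cutoffs on the normalized scale."""
    low_cut, high_cut = cutoffs
    if norm_score < low_cut:
        return "low"
    if norm_score < high_cut:
        return "medium"
    return "high"


norm = normalize(15, 5, 30)
print(norm, binary_class(norm), trinary_class(norm))  # 0.4 low medium
```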

Table 1 illustrates sample statistics for each of the mental health conditions, and specificity and sensitivity in terms of the selected WHO-5 cutoff values.

Table 1 Specificity and sensitivity of the selected WHO-5 cutoff values in the mental health dataset
