A profile-based sentiment-aware approach for depression detection in social media

Depression is a severe mental health problem. Due to its relevance, the development of computational tools for its detection has attracted increasing attention in recent years. In this context, several research works have addressed the problem using word-based approaches (e.g., a bag of words). This type of representation has been shown to be useful, indicating that words act as linguistic markers of depression. However, we believe that, in addition to the words themselves, their contexts implicitly contain valuable information that can be inferred and exploited to enhance the detection of signs of depression. Specifically, we explore the use of users' characteristics and the sentiments expressed in the messages as context insights. The main idea is that the discriminative value of words depends on the characteristics of the person who is writing and on the polarity of the messages where they occur. Hence, this paper introduces a new approach based on specializing the classification framework to profiles of users (e.g., men or women) and on considering the sentiments expressed in the messages through a new text representation that captures their polarity (e.g., positive or negative). The proposed approach was evaluated on benchmark datasets from social media; the results are encouraging, since they outperform those of state-of-the-art methods that are computationally more expensive.


Introduction
Depression is a common mental health problem that severely impacts our society. More than 264 million people of all ages suffer from depression worldwide, which seriously affects their quality of life and physical health. Unfortunately, in severe cases, depression can even result in suicide [1]. The relevance of this health problem has motivated the development of computational tools for the automatic detection and monitoring of people suffering from this mental disorder [2]. Several studies have shown that people who suffer from this disorder alter their written language and communicate differently in both style and content [3]; for example, high self-focused attention and high usage of negative words can reveal signs of depression [4]. The link between language and the psychological state of people has led to the exploration of data from social networks for the automatic detection of depression, aiming to take advantage of the large amount of information generated by people through these media, in which they tend to express their thematic interests, experiences, desires, concerns, etc. Accordingly, recent evaluation forums have focused on detecting people suffering from depression through the analysis of their social media posts [5][6][7].
Traditionally, the automatic detection of depression has been tackled as a supervised text classification problem. Recent works have addressed it using neural networks fed by word embeddings [8,9], but have also considered a diversity of text representations such as histograms of word categories (e.g., LIWC), broad and fine-grained emotions [10,11], latent topics extracted with LDA [12,13], and traditional word-based representations [14]. The latter have been shown to be useful and competitive for this task, indicating that words act as linguistic markers of depression. Despite their good results, we argue that these representations can be enriched with additional context information, which, although usually not indicated explicitly, can be inferred from the posts. In particular, the proposed approach exploits demographic characteristics of the users (inferred from the texts) and the polarity of the messages on the basis of two key ideas: (i) people belonging to the same group (e.g., males or females) tend to manifest and express their depression similarly to each other but differently from other groups, and (ii) words are important markers of depression, but their correct interpretation depends on the polarity, positive or negative, of their usage context. Some previous works have pointed out differences in the prevalence of mental disorders with respect to users' traits, such as gender [15,16], age [15], and occupation [17]. For example, depression is more common in women, especially around puberty [18]. Furthermore, different groups of people are expected to manifest their depression differently, because each group has its own interests and way of communicating; take for instance the case of women and men, or young and mature persons. Building on this idea, we propose an approach that specializes classifiers according to the users' traits.
On the other hand, recent studies indicate that people's psychological well-being is associated with the type of relationship perceived between positive and negative affect, as independent or bipolar opposites [19,20]. Inspired by these ideas, we consider that to distinguish depression, it is essential to analyze the contexts of use of words, both positive and negative, in a dual way. For example, depressed and non-depressed users can talk about the same topic (e.g., about their partners), but the polarity of their contexts can be very different, negative (e.g., a separation) or positive (e.g., a nice experience), respectively. Hence, we propose using a dual text representation which allows encapsulating both polarities to provide the classifiers with the possibility of relating them and finding discriminative patterns.
In general, the proposed approach jointly uses the words, the polarities of their posts, and the profile traits of the users to distinguish those suffering from depression. In other words, it aims to discover word-sentiment patterns associated with different groups of people suffering from depression, such as the negative usage of the word "calories" in the case of depressed women, or the high occurrence of sexuality issues (for example, the word "virgin") in negative posts in the case of young depressed users. In summary, the main contributions of this work are threefold: (i) a new approach to the detection of depression in social media, which applies specialized classifiers for different groups of users and takes into account the polarities of their posts; (ii) a new dual bag-of-words representation, which captures and distinguishes the occurrences of the words in both positive and negative contexts; and (iii) an in-depth analysis of the role of profiling and polarity information in the depression detection task.
The rest of this paper is organized as follows. Section 2 provides a review of the related work on depression detection on social media. Section 3 introduces the proposed approach. Section 4 explains the proposed text representation. Section 5 presents the experimental settings. Section 6 shows the experiments and results. Section 7 presents an analysis and discussion of the obtained results. Finally, our conclusions and future work directions are drawn in the last section.

Related work

Depression detection on social media
Social networks are increasingly used to share daily activities; moreover, they are also used to connect with others as a form of social support on health issues [21]. In this scenario, computational approaches have leveraged the information from social media to study depression as well as to detect users suffering from it. For example, some studies use representations based on BoW [22][23][24] to perform the detection at user and post levels. These kinds of representations make it easy to measure and compare the utility of word n-grams to identify depression in posts. Representations based on topics have also been explored. For example, [25] analyzed online health forums to identify changes in the language and topics with which depressive users are mostly associated. Some other works based on topics have explored the use of resources and techniques such as LIWC and LDA [26][27][28]. In this regard, the combination of manually and automatically generated topics has shown good performance [13]. Recently, due to the relevance of the problem, some evaluation forums such as eRisk have motivated the development of computational approaches to the early detection of social media users suffering from depression [5,6]. Different methods have been evaluated through the editions of this forum, from new methods based on word representations [29] to complex deep learning architectures [30], reaching results around F1 = 0.65 over the positive class, which suggests that distinguishing depressed from non-depressed social media users remains a major challenge.
As can be observed, various approaches and methods have been proposed to predict depression from a computational perspective on social media. The idea behind all of them is to help in clinical care. In this regard, Chancellor and De Choudhury [31] recently studied some issues of construct validity that could inhibit reproducibility and extension into practical and clinical domains, for example, issues of reporting practices. Accordingly, the authors provided some recommendations to address these challenges.

The role of profiling traits for depression detection
From a psychological perspective, the role of demographic factors in mental illness has been studied [32,33]. Particularly, several works have analyzed the relationship between patients' profile attributes (e.g., age, gender, and personality traits) and the manifestation of their depression [34]. Most of them have found clear differences among distinct groups of people, particularly between men and women [35][36][37]. For example, differences in depression by gender have been explored by considering social roles, norms of culture, family environment, and biological factors [3]. Some works have also explored differences in the relationship between social media use and depressive symptoms in the child and adolescent population, finding a significant correlation between usage patterns and depressive symptoms in young people [38].
From a computational perspective, some works approaching the automatic detection of depression in social media have used the age and gender of users as classification features. For example, in [33] the detection of users suffering from depression was carried out considering only profile attributes. The results reported using age and gender attributes were 15% better than random guessing, suggesting that they provide relevant information for the task at hand. In [22], these two demographic features were used in conjunction with word n-grams, part-of-speech (POS) tags, emoticons, sentiment polarities, and LIWC categories. Although the results obtained were not conclusive regarding their relevance to the task, because they were used in combination with many other features, the authors suggested that they contribute to the good performance of their model. Similar to these two approaches, we also take advantage of the profile information of users; however, rather than using it as extra features, we infer profiles in order to specialize the classification process according to them.

Emotions in depression detection
Emotional information has been mainly studied by means of discrete emotions (e.g., sadness and anger), but also by its polarity on a dichotomous scale of positive and negative values, characterizing statements as mutually exclusive positive or negative expressions [39,40]. From a psychological perspective, emotions are aspects that help to diagnose depression [41]; in particular, negative affect has been associated with depression [42,43]. Recently, the severity of depressive symptoms was associated with a more inverse relationship between positive and negative affect (i.e., high bipolarity), mainly because individuals with depressive symptoms have difficulty in regulating emotions, leading to a reduction of emotional complexity [19,20]. In this regard, but in a slightly different direction, [44] studied the relationship between distorted thinking and depression, concluding that individuals with depression tend to exhibit higher levels of cognitive distortions, such as dichotomous reasoning, catastrophizing, and disqualifying the positive.
From a computational perspective, some studies have examined the use of representations based on emotions. For example, [10] considered the frequency of occurrence of the main emotions in the users' posts to identify those suffering from depression, showing their advantage over the exclusive use of linguistic characteristics. Then, [11] went a step further by using a bag of sub-emotions (BoSE) representation, in which posts are represented by a frequency histogram of fine-grained instead of broad emotions, allowing to capture more specific topics expressed by depressed users. These two recent contributions show that the emotional tone of the information is relevant for the detection of depression. Following this idea, but different from these works, here we explore in a dual way the positive and negative valences of the sentiments for establishing the discriminative value of words for revealing depression traits.

Specializing classifiers according to users' profiles
As previously stated, the task of depression detection in social media has been addressed as a supervised, binary, text classification problem, whose goal is to learn a classifier that categorizes the users, described by their post histories, into one of two possible categories: a user suffering from depression or a non-depressed user.
Inspired by the idea that people belonging to the same group manifest and express their depression similarly, we propose to build specialized classifiers for the different groups of users, defining these groups according to some of their traits; for example, independent classifiers for male and female users, or for young and senior users. The proposed approach is depicted in Fig. 1. In it, each user is described by a single document, which contains all her/his posts, and each classifier $c_i$ is specially trained to predict depression exclusively on the group of users corresponding to profile $u_i$. Accordingly, each unlabeled document (i.e., new user) will be evaluated only by the classifier specialized in its respective profile.
It is worth mentioning that the proposed approach is general, and it does not depend on the particular traits used to separate the users, nor on how these are determined, which could be manually or automatically. It can also be used in combination with any document representation. For the experiments, we consider gender and age traits automatically inferred from the texts, and a novel dual word-sentiment representation. For details refer to Sects. 5.5 and 4, respectively.
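The routing logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`train_specialized`, `predict_specialized`) and the `toy_trainer` learner are hypothetical and stand in for whatever classifier and document representation are actually used; the profiles are assumed to be given.

```python
from collections import defaultdict


def train_specialized(docs, labels, profiles, trainer):
    """Train one classifier per profile, using only the users of that profile."""
    grouped = defaultdict(lambda: ([], []))
    for doc, label, profile in zip(docs, labels, profiles):
        grouped[profile][0].append(doc)
        grouped[profile][1].append(label)
    return {profile: trainer(d, y) for profile, (d, y) in grouped.items()}


def predict_specialized(classifiers, doc, profile):
    """Evaluate a new user only with the classifier of his/her own profile."""
    return classifiers[profile](doc)


def toy_trainer(docs, labels):
    """Hypothetical stand-in for a real learner: it memorizes the words that
    occur only in documents of the positive class (label 1)."""
    pos_words = {w for d, y in zip(docs, labels) if y == 1 for w in d.split()}
    neg_words = {w for d, y in zip(docs, labels) if y == 0 for w in d.split()}
    markers = pos_words - neg_words
    return lambda doc: int(any(w in markers for w in doc.split()))


# Toy data (invented for illustration only).
docs = ["sad tired family", "happy family trip", "calories sad", "calories gym happy"]
labels = [1, 0, 1, 0]
profiles = ["male", "male", "female", "female"]
classifiers = train_specialized(docs, labels, profiles, toy_trainer)

# The same text can receive different labels depending on the profile.
print(predict_specialized(classifiers, "tired today", "male"))    # 1
print(predict_specialized(classifiers, "tired today", "female"))  # 0
```

Because the trainer is passed in as a function, the same skeleton works with any learner and any representation, which mirrors the generality claimed for the approach.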

Bag of polarities: a dual word-sentiment representation for depression detection
Term-based representations, such as the bag of words (BoW), are commonly used in text classification tasks, showing satisfactory results in most of them. However, for complex tasks, which require finer discrimination between classes, their main drawback is that they do not capture information about the contexts of the words. To address this limitation, different types of contextualized word embeddings (e.g., ELMo, Flair, and BERT) have been recently proposed [45,46]. Their idea is to add syntactic and semantic information to the words' representations, aiming to dynamically capture their meaning. They have achieved outstanding results in several text processing tasks, but at the cost of reduced interpretability of the results. Being aware of the relevance of the interpretability of results and the explicability of methods in mental health applications, we decided to continue working with term-based representations, but extending them with context information. As previously mentioned, we consider that the value of the words as linguistic markers of depression largely depends on the polarity of their contexts of use. For example, when users mention words related to their family or work, it is essential to know the polarity of their contexts (positive or negative) before considering them as relevant signs for establishing that a user suffers from depression. Accordingly, we propose a BoW-based dual representation that captures both positive and negative uses of all words, thus providing the classifiers with the possibility of exploring the relationship between both kinds of mentions. Figure 2 illustrates this representation, which we named Bag of Polarities (BoP).

Figure 2 The bag of polarities representation; in it each document from the collection is represented by a dual vector that captures the occurrence of words in both positive and negative contexts
As shown, its construction starts from the identification and separation of positive and negative posts, indicated in blue and red respectively. Then, considering the vocabulary of the posts of all users, a BoW-type representation is built, which maintains information about the occurrences in positive and negative contexts of each word, making the size of BoP double that of a traditional BoW. That is, each word is mapped to two different components of the representation space, see for example the case of the word "mother", which occurs in the blue as well as red sections of the representation, accounting for its positive and negative occurrences, respectively.
More formally, the BoP representation can be defined as follows. Let $D = \{d_1, \ldots, d_{|D|}\}$ denote the set of documents (i.e., social media users in our case), and $V = \{w_1, \ldots, w_{|V|}\}$ its vocabulary; each document $d_i$ is represented as a vector $\mathbf{d}_i = [\mathbf{d}_i^{+}, \mathbf{d}_i^{-}]$, which results from the concatenation of its vectors of positive and negative contexts, defined as:

$$\mathbf{d}_i^{+} = \langle v_{i,1}^{+}, \ldots, v_{i,|V|}^{+} \rangle, \qquad \mathbf{d}_i^{-} = \langle v_{i,1}^{-}, \ldots, v_{i,|V|}^{-} \rangle$$

where the $v_{i,j}$ values indicate the proportion of the occurrences of the word $w_j$ in the positive and negative posts of document $d_i$. That is, if $P_i$, $P_i^{+}$, and $P_i^{-}$ represent the set of posts in $d_i$ and its respective subsets of positive and negative posts,

$$v_{i,j}^{+} = \frac{f(w_j, P_i^{+})}{f(w_j, P_i)}, \qquad v_{i,j}^{-} = \frac{f(w_j, P_i^{-})}{f(w_j, P_i)}$$

where $f(w, P)$ indicates the frequency of occurrence of a word $w$ in the set of posts $P$.
As explained above, the BoP representation considers as extra information the polarity of the posts; similar to the approach presented in the previous section, it does not depend on how this information is determined, it could be inferred from the texts using any existing method for that purpose [47][48][49]. In our experiments, we used the procedure detailed in Sect. 5.4.
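As a concrete illustration, the following minimal sketch builds a BoP vector for one user. It assumes that "proportion" means the share of a word's occurrences that fall in the user's positive (or negative) posts, and that posts already carry a polarity label; the function name `bag_of_polarities` and the labels `"pos"`/`"neg"` are hypothetical choices made for this example.

```python
from collections import Counter


def bag_of_polarities(posts, vocabulary):
    """Build the dual vector of one user.

    posts: list of (text, polarity) pairs, polarity being "pos" or "neg".
    Returns 2 * |V| values: the positive half followed by the negative half.
    """
    total, pos, neg = Counter(), Counter(), Counter()
    for text, polarity in posts:
        tokens = text.lower().split()
        total.update(tokens)
        if polarity == "pos":
            pos.update(tokens)
        elif polarity == "neg":
            neg.update(tokens)
    # Share of each word's occurrences found in positive / negative posts.
    d_pos = [pos[w] / total[w] if total[w] else 0.0 for w in vocabulary]
    d_neg = [neg[w] / total[w] if total[w] else 0.0 for w in vocabulary]
    return d_pos + d_neg


# Toy user history (invented for illustration only).
posts = [
    ("my mother called", "pos"),
    ("mother is angry", "neg"),
    ("mother and work", "neg"),
]
vector = bag_of_polarities(posts, ["mother", "work"])
print(vector)  # positive entries for "mother", "work", then negative entries
```

Here "mother" contributes one third of its occurrences to the positive half and two thirds to the negative half, so a classifier can relate both components of the same word.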

Datasets
For evaluating the proposed approach, we used two benchmark datasets in English: a collection from Reddit users released in the context of the eRisk 2018 task [5] (hereafter denoted as Reddit), and a collection of Twitter users described in [50] (hereafter denoted as Twitter). Both collections were gathered by the respective authors using their own set of APIs. Table 1 summarizes their main statistics. The range of dates from the first to the last submission of users is around 500 and 30 days for Reddit and Twitter, respectively. Both collections include users labeled as depressed and control (or non-depressed), and their construction followed a similar approach: according to their authors, users were labeled as depressed if they explicitly mentioned (self-declaration) that they were "diagnosed with depression", whereas non-depressed users correspond to users that never used the word "depressed" in their posts.

Text representations
All texts from the two collections were tokenized into unigrams and lower-cased, and stop words and special characters were removed. To model the users' content, we considered the following two text representations:
• Bag of Words (BoW): a standard BoW using unigrams and tf-idf weights. This representation acts as the baseline method in the experiments.
• Bag of Polarities (BoP): the dual word-sentiment representation introduced in Sect. 4.
Both representations were built using the words from the training partition with the highest χ² values; 6000 and 10,000 words for Reddit and Twitter, respectively.
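To illustrate the vocabulary selection step, the sketch below ranks words with the textbook χ² statistic computed on a 2×2 table of word presence versus class. This is an assumption made for the example: the exact χ² variant and tooling used in the experiments are not specified here, and the names `chi_square` and `select_top_k` are hypothetical.

```python
def chi_square(docs, labels, term):
    """Chi-square of a term from its 2x2 presence/class contingency table.

    docs: list of token sets (one per user); labels: 1 = depressed, 0 = control.
    """
    a = b = c = d = 0
    for tokens, y in zip(docs, labels):
        present = term in tokens
        if present and y == 1:
            a += 1  # term present, positive class
        elif present:
            b += 1  # term present, negative class
        elif y == 1:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, negative class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_top_k(docs, labels, k):
    """Keep the k words with the highest chi-square values."""
    vocab = sorted({t for tokens in docs for t in tokens})
    scores = {t: chi_square(docs, labels, t) for t in vocab}
    return sorted(vocab, key=lambda t: -scores[t])[:k]


# Toy training partition (invented for illustration only).
train_docs = [{"the", "sad"}, {"the", "sad", "tired"}, {"the", "happy"}, {"the", "trip"}]
train_labels = [1, 1, 0, 0]
print(select_top_k(train_docs, train_labels, 1))  # ['sad']
```

A word such as "the", which appears in every document, gets a score of zero and is never selected, whereas "sad" separates the two classes perfectly in this toy sample.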

Classification and evaluation
During training we considered different classifiers, such as SVM, Random Forest, and a Bagging of Decision Trees; we decided to use the latter as the base classifier in all the experiments because it showed the best results. The hyperparameters of the model were automatically tuned using a 5-fold cross-validation over the training set and sklearn's GridSearch algorithm. The search focused on: (i) the number of trees (10, 20, 30, 40, 50) and (ii) their depth (3, 5, 6, 9, 10). The best values found were 20 and 6, respectively. For evaluation purposes, in the case of Reddit, we ran the bagging algorithm five times and reported the average outcome on the test partition, whereas, in the case of Twitter, we applied a 5-fold cross-validation. In both cases, we report the F1 score over the positive class (i.e., depressed users), as it is the evaluation measure used in most previous works considering these two datasets.
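For reference, the search space and the evaluation measure described above can be written out explicitly. The sketch below only enumerates the grid and computes F1 over the positive class from scratch; it does not reproduce the actual training, and the names `grid` and `f1_positive` are hypothetical.

```python
from itertools import product

# Hyperparameter grid as described in the text: 5 x 5 = 25 configurations.
n_trees = [10, 20, 30, 40, 50]
max_depths = [3, 5, 6, 9, 10]
grid = list(product(n_trees, max_depths))


def f1_positive(y_true, y_pred, positive=1):
    """F1 score computed only over the positive (depressed) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy predictions (invented): 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(len(grid), round(f1_positive(y_true, y_pred), 3))  # 25 0.667
```

Reporting F1 over the positive class means that correctly classified control users do not inflate the score, which matters when depressed users are the minority.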

Assignment of posts' polarities
The process to determine the posts' polarities is based on SentiWordNet [51], which is a resource that has three scores between -1 and 1 specified for each WordNet synset, indicating how positive, negative, and objective its words are. In particular, for a given post, we estimate its polarity by averaging the scores corresponding to the synsets of each of its words. If the average is negative, then we assign the post a negative polarity, while if it is positive, we assign it a positive polarity. Table 2 shows the distributions of positive and negative posts in both collections, which on average correspond to 69% positive and 31% negative.
It is important to make three points: first, words missing in SentiWordNet were ignored; second, for words belonging to multiple synsets, we took the score of the synset corresponding to their most frequent meaning and that matches its part-of-speech label (to maintain the text connotation); third, posts having an average score equal to zero (i.e., neutral polarity) were not taken into account.
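The decision rule above can be sketched as follows. The sketch assumes that each word has already been reduced to a single signed score (that is, sense selection by frequency and part of speech has been done beforehand); the tiny `toy_lexicon` and its values are invented for illustration and are not SentiWordNet entries.

```python
def post_polarity(tokens, lexicon):
    """Return "pos", "neg", or None (discarded) for a tokenized post."""
    # Words missing from the lexicon are ignored.
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return None
    average = sum(scores) / len(scores)
    if average > 0:
        return "pos"
    if average < 0:
        return "neg"
    return None  # neutral posts are not taken into account


# Hypothetical word scores in [-1, 1].
toy_lexicon = {"nice": 0.5, "awful": -0.5, "day": 0.0}

print(post_polarity(["nice", "day"], toy_lexicon))    # pos
print(post_polarity(["awful", "day"], toy_lexicon))   # neg
print(post_polarity(["nice", "awful"], toy_lexicon))  # None
```

Returning `None` for both neutral and fully out-of-lexicon posts keeps the caller simple: any post without a usable polarity is skipped when the dual representation is built.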

Inference of users' traits
As described in Sect. 3, we propose to specialize the classifiers according to different users' traits, particularly, gender and age, which allow us to differentiate between male and female users as well as between young and senior users. Given that the used collections do not contain information about the users' traits, we needed to apply a process to automatically infer them from their texts. We used two lexicons especially designed for this purpose [52], which include words weighted by their orientation; in the first lexicon positive and negative scores are associated with female and male respectively, whereas in the second lexicon positive scores indicate words commonly used by senior users, while negative weights are words more associated with young users.
The process to infer the users' traits is as follows: for a given document (i.e., a user), and using the two lexicons independently, we calculate the sum of the scores of the words weighted by their relative frequency in the document. For the age lexicon, the weighted sum directly indicates the user's age; we labeled all users under the age of 25 as young, and the rest as senior. For the gender lexicon, the sign of the result indicates the gender of the user, positive for females and negative for males. Table 2 shows the distributions of both traits in the two collections. They indicate that the numbers of male and female users, as well as of young and senior users, are similar in the Twitter collection; however, they differ strongly in Reddit, where men and senior users are more abundant than women and young users, respectively.
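A minimal sketch of this inference step is given below. The lexicon entries are invented, and the `age_intercept` parameter is an assumption added so that the weighted sum lands on an age scale; the text above does not state whether such a constant is used. Treating a score of exactly zero as male is likewise an arbitrary choice for this example.

```python
from collections import Counter


def weighted_score(tokens, lexicon, intercept=0.0):
    """Sum of lexicon weights, each weighted by the word's relative frequency."""
    if not tokens:
        return intercept
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(
        lexicon[w] * c / total for w, c in counts.items() if w in lexicon
    )


def infer_traits(tokens, gender_lex, age_lex, age_intercept=0.0, age_threshold=25):
    """Return (gender, age_group) for a user's tokenized post history."""
    # Positive gender scores point to female users, negative ones to male users.
    gender = "female" if weighted_score(tokens, gender_lex) > 0 else "male"
    age = weighted_score(tokens, age_lex, age_intercept)
    age_group = "young" if age < age_threshold else "senior"
    return gender, age_group


# Hypothetical lexicons and intercept (not the real resources).
gender_lex = {"husband": 2.0, "beer": -1.0}
age_lex = {"school": -40.0, "mortgage": 60.0}

user_a = ["my", "husband", "mortgage", "beer"]
user_b = ["school", "beer", "beer", "game"]
print(infer_traits(user_a, gender_lex, age_lex, age_intercept=23.0))  # ('female', 'senior')
print(infer_traits(user_b, gender_lex, age_lex, age_intercept=23.0))  # ('male', 'young')
```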

Overall performance evaluation
The main goal of the experiments carried out was to evaluate the suitability of the proposed approach for the depression detection task, considering data from two different social media. Indirectly, our goal was to determine the relevance of using specialized classifiers for the different kinds of users as well as of taking into account the polarity of the context of occurrence of the words. Table 3 shows the results of the different configurations we considered. It includes results of classifiers that were specialized according to the age and gender of the users, referred to as age-based classifiers and gender-based classifiers, respectively. It also shows the results for the standard approach that uses one single classifier for all users (named single classifier). For each of these configurations, it presents results with the traditional BoW representation, as well as with the proposed BoP representation, which provides sentiment awareness.
From the results of Table 3, the following can be highlighted: First, the proposed approach, which jointly uses the words, the polarities of their posts, and the profile traits of the users to distinguish those suffering from depression, clearly outperformed the baseline approach that corresponds to the use of one single classifier for all users with a BoW representation. This is especially evident in the Reddit collection, where the best result of the proposed approach shows a difference of around 8% with respect to the baseline; for Twitter the difference was around 4%. Second, neither of the two components of the proposed approach (namely, specialized classifiers per users' traits and the BoP representation) is by itself good enough to obtain significantly better results than the baseline; they have to be used in combination.
This observation is supported by the small differences between the results of the single classifier using BoW and BoP, as well as by the differences between the trait-based classifiers and the single classifier when using BoW as representation. In general, these results corroborate our initial intuition that the words' discriminative value for the detection of depression depends on both, the characteristics of the person who is writing, and the polarity of the messages where they occur.
Third, the age-based classifier with the BoP representation shows a good performance (better than the baseline); however, the best results were achieved when considering the gender-based classifier, which reached outstanding results in both collections, 0.71 and 0.89 for Reddit and Twitter, respectively. Despite this difference, we believe our results are not entirely conclusive regarding the advantage of using classifiers by gender instead of by age, since they depend on the method used to infer these attributes and, on the other hand, age-based classifiers could be improved when considering finer age groups.
To investigate whether the differences in the results between the approaches were statistically significant, we performed a statistical analysis. According to the t-test, and using a significance threshold of p = 0.05, the results from the trait-based classifiers with the BoP representation are significantly better than the results from the baseline approach, and also than the results from these classifiers with the BoW representation. To deepen the comparison, we applied the Non-Parametric Bayesian Hierarchical test [53] over the results of the gender-based classifiers using BoP and BoW. Figure 3 shows the results of this analysis; they indicate a greater probability that the gender-based classifiers obtained better results using BoP than BoW.
Table 4 compares the best results of the proposed approach against the best results previously reported for the two used collections. These results correspond to the following works: [30], which achieved the best performance in the eRisk 2018 shared task, and employed user-level linguistic metadata, a bag of words representation, neural word embeddings from GloVe, and a convolutional neural network; [54], in which the task was modeled as a one-class classification problem in order to deal with the uncertainty regarding negative instances; [50] and [56], in which text and image features were used, as well as the posting behavior of users; [58], which used word embeddings and a combination of recurrent and convolutional neural networks; [57], which proposed a multimodal approach, using text and images, and applying a complex deep neural network architecture; [11], which proposed to model the users by frequency histograms of fine-grained emotions; and [55], which extended the previous work by including the learning of the fine-grained emotions into an end-to-end architecture.
In general, the results from Table 4 indicate that the proposed approach, despite its simplicity (it is based on a word-based representation), performs better than most state-of-the-art methods based on complex deep learning models. For example, in the Reddit collection, it had a gain of 7% with respect to the best previous result, whereas, in the Twitter corpus, it performed better than the works using only textual features, but showed a slightly smaller F1 result (3% less) than a multimodal approach using both text and image information.

Table 4 Comparison of the performance of our gender-based classifiers with the BoP representation against state-of-the-art results. The results marked with an "*" correspond to approaches using both text and image features. Columns: Dataset, Approach, F1 (positive class), Macro-F1
Dataset: Twitter
Gender-based classifiers using BoP: 0.87 (± 0.01)
MDL [50]: 0.85*
DFC+FC [56]: 78.5 (± 1.2)*
GRU+VGG-NET+COMMA [57]: 0.90*
BiGRU-CNN [58]: 0.85

Assessing the robustness of the proposed approach to different classifiers
The previous sections showed results of the proposed approach when it was used in conjunction with a Bagging of Decision Trees as classifier, highlighting its relevance for the depression detection task. The goal of this section is to study the robustness of the approach when other machine learning algorithms are used to build the classifier. More precisely, we aim to assess the ability of the approach to achieve a similar performance when used in conjunction with other classifiers.
For this experiment, we consider two classifiers commonly used in text classification tasks, namely, a Support Vector Machine (SVM) and a Random Forest (RF). Table 5 shows the results obtained with both classifiers in the Reddit and Twitter collections. Analyzing the results under the "single classifier" setting, we can observe that BoP is better than BoW regardless of the classifier used, therefore confirming the relevance of taking into account the polarities of the posts for the depression detection task. On the other hand, the trait-based classifiers achieved better results than the single classifiers in all cases, reaffirming the adequacy of specializing the classification process according to the different kinds of users. In general, these experiments show that the performance of the proposed approach is robust to the selection of the classifier, and that it mostly depends on the modeling of the words, the polarities of their posts, and the profile traits of the users. Nonetheless, as shown in Table 3, the best results were obtained when using the Bagging of Decision Trees as classifier.

Users' traits, posts' polarities, and the discriminative value of words
The previous experiments confirmed that words alone are not enough to fully distinguish between social media users who suffer from depression and those who do not. Furthermore, they highlighted the relevance of considering some characteristics of the person who is writing, and the polarity of the messages where they occur, to improve the detection of those suffering from depression. In order to understand the influence of these two aspects on the discriminative value of words, we analyzed the information gain (IG) of the features from the BoP representation, for our two trait-based classifiers. Interestingly, we observed that some words present an important difference in their IG values when used in contexts of different polarity. Figure 4 exemplifies some of these words, indicating the relative importance of their IG values in positive (blue bars) and negative (red bars) contexts, and for the gender and age-based classifiers. For example, for discriminating between male depressed and non-depressed users in Reddit, terms related to relationships (e.g., friendship and dating) were highly relevant when they occurred in positive contexts, whereas words such as family and sexual were more informative when occurring in negative contexts. In the case of female users, words such as attractive, sex, and family were more important when occurring in positive than in negative contexts, and words such as boyfriend were good discriminators when they occurred in negative contexts. It was interesting to notice that some words were relevant for both gender-based classifiers, but with a presence in distinct contexts, that was the case of family, which was highly discriminative in negative posts for males and in positive posts for females. Table 6 shows some example posts with this word. 
Also in Reddit, words like dating and boyfriend in positive posts, and virgin, weight and relationship in negative posts, were relevant to distinguish between depressed and non-depressed young users. Interestingly, the word relationships was also relevant for the case of senior users, but when it occurred in positive posts. Table 6 also contrasts some example posts from young and senior users containing this word.
In the case of Twitter, things were not very different. Posts from depressed and non-depressed users were distinguished by interests and concerns usually associated with their gender and age group. For example, we found a high relevance of the word calories when used in negative contexts to identify users, mainly young women, who suffer from depression. Similarly, the presence of the word drunk in negative contexts showed high relevance to detect senior male users who suffer from depression.
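As an illustration of the kind of analysis described in this section, the sketch below computes the information gain of a binary feature with respect to the class label, using the standard definition IG(Y; X) = H(Y) - sum_v P(X = v) H(Y | X = v). The toy data and the feature names `family_neg` and `family_pos` are hypothetical, and treating a feature as present or absent per user is a simplifying assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_present, labels):
    """IG of a binary feature: H(Y) - sum_v P(X=v) * H(Y | X=v)."""
    total = len(labels)
    conditional = 0.0
    for value in (True, False):
        subset = [y for x, y in zip(feature_present, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional

# Toy example: four users, two depressed ("dep") and two control ("ctl").
labels = ["dep", "dep", "ctl", "ctl"]
family_neg = [True, True, False, False]   # occurs only for depressed users
family_pos = [True, False, True, False]   # occurs equally in both classes

ig_neg = information_gain(family_neg, labels)
ig_pos = information_gain(family_pos, labels)
print(ig_neg, ig_pos)
```

In this toy case the same word is fully informative in negative contexts and uninformative in positive ones, which mirrors the polarity-dependent differences in IG discussed above.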

Error analysis: on the effect of the users' history length
As previously mentioned, the detection of depression is commonly handled as a text classification problem, and, therefore, the length of the users' post histories can have some impact on the performance. We carried out an error analysis in the detection of depressed users according to this variable. Figure 5 presents the results from this analysis, considering four length ranges and indicating the percentages of correct and incorrect predictions. From Fig. 5, it can be observed that in both collections, the higher the number of posts by the users, the lower the prediction errors. For example, the error percentage is only around 17% and 10% for the largest post histories, while it is around 37.5% and 23% for the smallest, in Reddit and Twitter respectively. These results clearly indicate that the more evidence is available on a user, the greater the confidence in the decision of whether or not the user suffers from depression.
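An analysis of this kind can be sketched as follows: users are grouped by the number of posts in their history, and the percentage of incorrect predictions is computed per group. The range boundaries and the toy data are hypothetical, since the actual length ranges are not restated here, and the function name is an assumption for illustration.

```python
def error_rate_by_bin(users, edges):
    """Percentage of wrong predictions per history-length range.

    users: iterable of (num_posts, true_label, predicted_label)
    edges: ascending inclusive upper bounds; the last range is open-ended.
    Returns one percentage per range (None for an empty range).
    """
    totals = [0] * (len(edges) + 1)
    errors = [0] * (len(edges) + 1)
    for num_posts, truth, predicted in users:
        # Index of the first range whose upper bound covers num_posts
        idx = next((i for i, e in enumerate(edges) if num_posts <= e),
                   len(edges))
        totals[idx] += 1
        errors[idx] += int(truth != predicted)
    return [100.0 * e / t if t else None for e, t in zip(errors, totals)]

# Toy data: (number of posts, true label, predicted label), 1 = depressed
users = [
    (5, 1, 0), (8, 1, 1),
    (50, 1, 1), (60, 1, 1),
    (500, 1, 0), (700, 1, 1), (900, 1, 1), (950, 1, 1),
]
rates = error_rate_by_bin(users, edges=[10, 100])  # hypothetical boundaries
print(rates)
```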

Error analysis: on the effect of the temporal expansion of users' histories
We also analyzed the error percentages with respect to the temporal expansion of the users' post histories, defined as the number of days between the first and last post from a user. For this analysis, we considered four and three time intervals for the Reddit and Twitter collections, respectively. Figure 6 shows the results of this analysis. In the Reddit collection, the error percentage increased as the temporal expansion increased. In particular, it exceeded 20% when the users were observed for very long periods (more than two years). This result may be due to the fact that during a long period users could receive treatment or even come out of their depression. On the other hand, in the case of Twitter, the error percentage decreased when users were observed for more time. Although these results may seem contradictory, they are actually complementary. It is important to highlight the difference in the size of the observation intervals: in the Twitter collection, no user shows a temporal expansion greater than 30 days. Hence, as an integrated conclusion, we can say that a reduction in the detection errors is expected when more user information is analyzed; however, if users are observed for long periods, there is a greater chance for depressive traits to fade or blur as a result of various factors such as a successful treatment.
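The temporal expansion defined above can be computed directly from the post timestamps. The following is a minimal sketch, assuming each post carries a calendar date; the function name and the sample dates are illustrative only.

```python
from datetime import date

def temporal_expansion(post_dates):
    """Days between a user's earliest and latest post (0 for a single post)."""
    return (max(post_dates) - min(post_dates)).days

# Toy history: posts need not be sorted chronologically
history = [date(2020, 1, 1), date(2020, 1, 31), date(2020, 1, 15)]
span = temporal_expansion(history)
print(span)
```

The resulting values can then be grouped into time intervals and paired with prediction outcomes in the same way as the history lengths.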

Conclusions and future work
In this work we proposed a novel profile-based sentiment-aware approach for the detection of depression in social media. The main idea behind this approach is to leverage implicit information of the words' contexts to enhance the detection of signs of depression, under the assumption that the words' discriminative value for this task depends on the characteristics of the user who is writing and on the polarity of the messages where they occur.
Throughout the paper, we presented an experimental study that evaluates the proposed approach in two social media collections, one from Reddit and another from Twitter, which differ considerably in the number of users, the length of their posts, and the temporal expansion of their histories. In general, the results achieved were encouraging, since they outperformed state-of-the-art results from computationally more expensive methods, indicating the suitability of the approach for the task at hand. In particular, we obtained the following conclusions: (i) specializing the classifiers on the users' gender is more useful than specializing them on their age, which could indicate that the differences in the expression of depressed people are more noticeable between men and women than between young people and adults; however, as we previously pointed out, this observation could change when considering finer age groups; (ii) the discriminative value of words as depression markers greatly depends on the polarity of their contexts, positive or negative, but at the same time its adequate interpretation varies according to the type of user; therefore, trait-based classifiers and the dual word-sentiment representation need to be used in combination; (iii) factors such as the length and temporal expansion of the users' post histories had an important influence on the performance of the approach; in particular, we observed better results when more user information was analyzed, but when users were observed for very long periods the performance tended to decrease.
As future work, we plan to combine the proposed approach with different text representations, such as those based on word embeddings or LIWC. We also plan to explore other author profiling methods to infer the gender and age of users, as well as to use finer age groups. Likewise, we intend to apply attention models under a trait-based approach to automatically learn relevant word-sequence patterns of depression. Finally, the good results achieved also motivate us to carry out a similar study in other related tasks, such as the detection of aggressive comments in social media, under the premise that different types of users tend to show different types of aggressiveness, and that the posts' polarity is a key issue for the correct interpretation of bad words and other idiomatic expressions.