Extroverts Tweet Differently from Introverts in Weibo

As dominant factors driving human actions, personalities can be excellent indicators for predicting the offline and online behavior of different individuals. However, because of the great expense and inevitable subjectivity of questionnaires and surveys, it is challenging for conventional studies to explore the connection between personality and behavior and to gain insights from large numbers of individuals. Considering the increasingly important role of online social media in daily communication, we argue that the footprints of massive numbers of individuals, like tweets in Weibo, can serve as a proxy to infer personality and to further understand its role in shaping online human behavior. In this study, a map from self-reports of personalities to online profiles of 293 active users in Weibo is established to train a competent machine learning model, which then successfully identifies over 7,000 users as extroverts or introverts. Systematic comparisons from the perspectives of tempo-spatial patterns, online activities, emotion expressions and attitudes to virtual honor surprisingly disclose that extroverts indeed behave differently from introverts in Weibo. Our findings provide solid evidence to justify the methodology of employing machine learning to objectively study the personalities of massive numbers of individuals and shed light on applications of probing personalities and corresponding behaviors solely through online profiles.


Introduction
Online social media have become an essential component of everyday life and even reflect all aspects of human behavior. Millions of users

Literature Review and Theoretical Background
Several well-studied models have been established for personality traits, among which the Big-Five model is the most popular [23,24]. In this model, human personality is depicted along five dimensions, namely openness, neuroticism, extraversion, agreeableness and conscientiousness, and the personality type can be identified through an individual's behavior over time and across circumstances. The Internet, one of the most pervasive circumstances today, has profoundly changed human behavior and experience. With its explosive development, many research efforts have been devoted to investigating the relation between personality and Internet usage. For example, the findings of Amiel et al. demonstrate distinctive patterns of Internet use and usage motives for those of different personality types, with extroverts making more goal-oriented use of Internet services [25]. Focusing on online social media, a vital component of the Internet, extraversion and openness to experience are found to be positively related to social media adoption [26].
In the meantime, it has also been pointed out that users' psychological traits can be inferred from their digital fingerprints in online social media [27,28]. Golbeck et al. proposed to bridge the gap between personality study and social media and demonstrated that social media (Facebook and Twitter) profiles can reflect personality traits [29,30]. They suggested that the number of parentheses used is negatively correlated with extraversion; however, no explanation beyond the correlation is provided, and probing the correlations over a larger data set remains necessary. Quercia et al. employed the numbers of followees, followers and tweets to learn personality and suggested that both popular users and influentials are extroverts with stable emotions [31]. Besides, patterns in the language use of online social media, like words, phrases and topics, also offer a way to reveal personalities [32]. For example, using dimensionality reduction on the Facebook Likes of participants, Kosinski et al. proposed a model to predict individual psycho-demographic profiles [19]. As for social media in China, Weibo and RenRen have become ideal platforms for conducting personality research [33,34]. Considering the recent finding that computer algorithms outperform humans in personality judgment [20], online social media indeed offer unprecedented opportunities for inferring personality and understanding human behavior.
Each bipolar dimension (like extraversion) in the Big-Five model summarizes several facets, which in turn subsume many more specific traits (extraversion vs. introversion). In this paper, we focus on extraversion, which is an indispensable dimension of personality traits. Many previous studies have sought to reveal the connection between extraversion and online behaviors, and they can be roughly reviewed from the following perspectives.
Social interactions Highly extroverted individuals tend to have broad social communications with others [33]. For instance, extraversion is generally positively related to the number of Facebook friends [35,36]. Gosling et al. also found particularly strong consensus about Facebook profile-based personality assessment for extroverts [37]. However, Ross et al. [6] showed that extroverts are not necessarily associated with more Facebook friends, which is contrary to the later results of Bachrach et al. [35] and Hamburger et al. [36]. Through posting tweets, extroverts more actively share their lives and feelings with other people, and personality traits might shape language styles in social media. In English, extroverts are more likely to mention social words such as 'party' and 'love you', whereas introverts are more likely to mention words related to solitary activities such as 'computer' and 'Internet' [32]. In Chinese, extraversion is positively correlated with the use of personal pronouns, indicating that extroverts tend to be more concerned about others [38].
Buying intention Extraversion, as a personality trait, is one of the main factors driving online behaviors including buying, and hence exploring the relationship between extraversion and shopping is a valuable topic. DeSarbo and Edwards found that socially isolated individuals tend to engage in compulsive buying in an effort to relieve the feelings of loneliness caused by a lack of interaction with others [39]. However, the results of subsequent studies on the relationship between compulsive buying and extraversion are inconsistent [40,41].
Emotion expression In psychology, it is widely believed that extraversion is associated with higher positive affect, namely that extroverts experience more positive emotions [42,43]. Extroverts are also more likely to utilize the supplementary entertainment services provided by social media, which bring them more happiness [44]. Qiu et al., however, suggested that highly extroverted participants do use social media to relieve their existential anxiety [45]. Thus, it is necessary to investigate the relation between extraversion and various emotions rather than only positive affect.
However, most existing studies built their conclusions on self-reports from very small samples, and the lack of data or objectivity leads to inconsistent or even conflicting results. Moreover, a comprehensive understanding of how extroverts and introverts behave differently in online social media is still missing. Hence, in this study, we employ machine learning models to identify a large group of samples and then investigate the behavioral differences from diverse aspects, aiming to offer solid evidence and comprehensive views.

Dataset and participant population
The Big-Five model is the most accepted and commonly used model for depicting human personalities [23,24], and quite a few measuring instruments have been developed to assess the Big-Five personality traits. In this study, a web page with a 60-question version of the Big-Five Personality Inventory [21] is built to collect self-reported scores on different personality traits. We target Weibo users for voluntary participant recruitment, and invitations were sent both online and offline from December 1, 2014 to March 31, 2015. All participants are manually checked and only those with valid Weibo accounts (identified by the Weibo ID, a unique identifier for each user) are considered. Finally, a total of 293 valid participants (144 men and 149 women) are selected for the following study, with ages ranging from 19 to 25. It is worth noting that, according to the official report of Weibo in 2015, users aged between 17 and 33 make up around 80% of its population, indicating that our refined samples of self-reports can sufficiently represent most users in Weibo.
We focus on the extraversion dimension of the Big-Five personality traits in this study, which measures a person's tendency to seek stimulation in the external world and the company of others, and to express positive emotions [23]. People who score high in extraversion (called extroverts) are generally outgoing, energetic and friendly. In contrast, introverts are more likely to be solitary and to seek environments characterized by lower levels of external stimulation.
The distribution of extraversion scores from the 293 valid samples (Weibo users) is shown in Fig. 1. The scores follow a typical Gaussian distribution with mean µ = 39.03 and standard deviation σ = 7.55. It can be seen in Fig. 1 that scores near the mean are more probable than either high or low scores, implying that a significant fraction of samples report neutral scores on extraversion; they can be intuitively categorized as having no significantly distinct personality, i.e., neither extroverts nor introverts. Because of this, it is reasonable to divide samples into three groups: extroverts (with high scores, labeled as 1), neutrals (with scores around the mean, labeled as 0) and introverts (with low scores, labeled as -1). Specifically, extroverts are samples with scores above 42.81 (µ + σ/2), introverts are samples with scores below 35.25 (µ − σ/2) and neutrals are samples whose scores range from 35.25 to 42.81. The thresholds (µ ± σ/2) are set to balance the sizes of the three categories, aiming to avoid bias in the machine learning models. By labeling the 293 valid samples into three categories, we can obtain a training set for establishing and evaluating machine learning models that do not need the help of self-reports. The training set includes 45 extroverts (1), 44 introverts (-1) and 56 neutrals (0). The training data is generally balanced across the three classification labels, especially for extroverts and introverts, which helps avoid bias in the machine learning model.
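As an illustration only, the µ ± σ/2 labeling rule described above could be sketched in Python as follows. The function name `label_by_extraversion` and the fallback to the population standard deviation are assumptions of this sketch and are not details reported in the study.

```python
from statistics import mean, pstdev


def label_by_extraversion(scores, mu=None, sigma=None):
    """Label each score as 1 (extrovert), 0 (neutral) or -1 (introvert).

    Thresholds are mu + sigma/2 and mu - sigma/2. If mu or sigma is not
    given, it is estimated from the scores (population standard deviation,
    which is an assumption of this sketch).
    """
    mu = mean(scores) if mu is None else mu
    sigma = pstdev(scores) if sigma is None else sigma
    upper = mu + sigma / 2
    lower = mu - sigma / 2
    labels = []
    for score in scores:
        if score > upper:
            labels.append(1)
        elif score < lower:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Toy scores, labeled with the mean and standard deviation reported in the text.
example_labels = label_by_extraversion([30, 39, 45], mu=39.03, sigma=7.55)
```

With the reported µ = 39.03 and σ = 7.55, the two thresholds evaluate to roughly 42.81 and 35.25, matching the values quoted in the text.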

Extraversion classifier
As reviewed in the previous section, many aspects of online profiles have been found to be connected with users' personalities. Hence, to establish a competent classifier that convincingly identifies the three categories of extraversion without the help of self-reports, we extract as many features as we can from the digital and textual records. These features are roughly grouped into basic, interactive and linguistic ones, and the details of each kind are introduced as follows.
Linguistic features Previous efforts on extraversion have demonstrated that language styles in social media can be effective indicators for inferring personality traits. Because of this, we collect 261 terms, in both Chinese and English, that could describe personality traits, to linguistically model the tweets posted by users of different groups. After preprocessing the text, all tweets posted by a user are combined into one document that represents the user's language style, and the documents of all users compose the corpus. Then the classic TF-IDF scores are employed to evaluate the 261 terms, and the top 84 terms [46] are selected to extract linguistic features. Specifically, for each of the 84 selected terms, the feature value is 1 if the term occurs in a document (corresponding to a user) and 0 otherwise. This method, called bag-of-words, is widely used in natural language processing [47]. Meanwhile, we also consider the average length of the tweets posted by the user.
It is worth noting that in our dataset of online profiles, the extracted features differ significantly in scale. In order to train unbiased machine learning models, feature standardization is a necessary step. We therefore transform each feature into the range between zero and one. The transformation is given by
\[ X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}, \]
where $X_i$ is the i-th item in the feature set $X$, $X_{\max}$ is the maximal value of $X$, $X_{\min}$ is the minimal value of $X$, and $X_i'$ is the rescaled value.
In summary, we extract 130 features in total for each Weibo user, including 13 basic features, 32 interactive features and 85 linguistic features, which serve as the input of the machine learning models.
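The linguistic feature extraction described above might look roughly like the following sketch. Several details are assumptions made here for illustration, since the text does not specify them: tweets are taken to be already tokenized into lists of tokens, candidate terms are ranked by their TF-IDF summed over all user documents, the IDF is computed as log(N/df) + 1, and tweet length is measured in tokens. The function names are hypothetical.

```python
import math
from collections import Counter


def select_terms(documents, candidate_terms, top_k):
    """Rank candidate terms by summed TF-IDF over all user documents.

    documents: one token list per user (all of that user's tweets combined).
    The ranking criterion is an assumption of this sketch.
    """
    n_docs = len(documents)
    counts = [Counter(doc) for doc in documents]
    scores = {}
    for term in candidate_terms:
        df = sum(1 for c in counts if term in c)
        if df == 0:
            scores[term] = 0.0
            continue
        idf = math.log(n_docs / df) + 1.0
        scores[term] = sum(
            c[term] / len(doc) * idf for c, doc in zip(counts, documents) if doc
        )
    return sorted(candidate_terms, key=lambda t: scores[t], reverse=True)[:top_k]


def linguistic_features(user_tweets, selected_terms):
    """Binary presence of each selected term, plus the average tweet length.

    user_tweets: list of tweets, each tweet being a list of tokens.
    """
    tokens = {tok for tweet in user_tweets for tok in tweet}
    presence = [1 if term in tokens else 0 for term in selected_terms]
    if user_tweets:
        avg_len = sum(len(tweet) for tweet in user_tweets) / len(user_tweets)
    else:
        avg_len = 0.0
    return presence + [avg_len]


# Toy corpus with two users; real input would be preprocessed Weibo tweets.
users = {
    "a": [["love", "party"], ["party"]],
    "b": [["computer"], ["internet", "computer", "home"]],
}
documents = [[tok for tweet in tweets for tok in tweet] for tweets in users.values()]
terms = select_terms(documents, ["party", "computer", "love", "quiet"], top_k=2)
features_a = linguistic_features(users["a"], terms)
```

In the study's setting, `top_k` would be 84, and the resulting 84 binary values plus the average tweet length would give the 85 linguistic features.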
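The rescaling of a single feature into the range between zero and one can be sketched as below. The function name `min_max_scale` is hypothetical, and mapping a constant feature to all zeros is an assumption of this sketch, since the text does not discuss that case.

```python
def min_max_scale(values):
    """Rescale one feature column into [0, 1] using its minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example: follower counts on very different scales end up within [0, 1].
scaled = min_max_scale([2, 4, 10])
```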

Models and Accuracy
Based on the training data and feature set obtained in the previous sections, three popular machine learning models, namely Random Forest, Naive Bayes and Support Vector Machine (SVM), are employed to approach the 3-category classification problem for extraversion. For the SVM, we choose C-SVM (multi-class classification) as the formulation and RBF as the kernel function. We adopt 10-fold cross-validation to examine the average accuracy of the different models.
The baseline accuracy for 3-category classification is 33.33%. As can be seen in Table 1, our 10-fold cross-validation results show that the Random Forest model cannot properly solve the classification of extraversion (its accuracy is close to the baseline). The Naive Bayes and SVM models outperform the baseline significantly, especially the SVM model, whose accuracy for both extroverts and introverts reaches around 50%. In the meantime, we also measure the average F1-score by calculating the precision $P_i$ and recall $R_i$ for each label i and find that their unweighted mean, defined as
\[ F_1 = \frac{1}{3} \sum_{i \in \{-1, 0, 1\}} \frac{2 P_i R_i}{P_i + R_i}, \]
is 0.4505, indicating that the performance of SVM is convincing not only in precision but also in recall. Therefore, we train an SVM model as the extraversion classifier, which can be employed later to identify extroverts and introverts in Weibo without the help of self-reports.
Because of its competent accuracy and F1-score, we argue that machine learning models like SVM can break the limitations of conventional approaches like self-reports, greatly extend the scope of personality explorations and offer an opportunity to comprehensively picture the behavioral differences between extroverts and introverts in social media.
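To make the evaluation procedure concrete, a k-fold cross-validation loop can be sketched in plain Python as follows. This is only an illustration of the protocol: the `majority_class` model is a stand-in and not the SVM, Naive Bayes or Random Forest models used in the study, and the shuffling seed and fold assignment are assumptions of this sketch.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    if n_samples < k:
        raise ValueError("need at least as many samples as folds")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Return the mean held-out accuracy over k folds.

    train_fn(train_X, train_y) must return a predict(x) callable.
    """
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predict = train_fn(
            [features[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def majority_class(train_X, train_y):
    """Stand-in model (not the paper's SVM): always predict the most common label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda x: most_common
```

Any classifier can be plugged in through `train_fn`; for the three roughly balanced labels considered here, a constant predictor like `majority_class` would be expected to stay near the 33.33% baseline.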
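The unweighted (macro-averaged) F1-score over the three labels can be computed as in the following sketch. The function name `macro_f1` is hypothetical, and treating a label with no predictions or no true instances as having an F1 of zero is an assumption made here.

```python
def macro_f1(y_true, y_pred, labels=(-1, 0, 1)):
    """Unweighted mean of per-label F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with six samples, two per label.
score = macro_f1([1, 1, 0, 0, -1, -1], [1, 0, 0, 0, -1, 1])
```

In this toy example the per-label F1 scores are 2/3 for introverts, 0.8 for neutrals and 0.5 for extroverts, so the macro average is about 0.656.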

Differences between extroverts and introverts
Employing the obtained SVM classifier, we attempt to identify extroverts and introverts from a large population of Weibo users, whose publicly available online profiles were collected through Weibo's open APIs between November 2014 and March 2016 and the ones with fewer than 100 tweets
It can be seen in Table 2 that the proportion of extroverts tweeting in the daytime (from 8:00 to 19:00) is 0.557 and that of introverts is 0.608; the proportion of extroverts tweeting at night (from 19:00 to 1:00 of the next day) is 0.358 while that of introverts is 0.305. The more active posting of extroverts at night suggests that their nightlife might be more diverse than that of introverts.
Posting intervals between two temporally consecutive tweets of an individual Weibo user can be an excellent indicator of the user's preference for and dependency on the social media. We calculate the average interval (in hours) for extroverts and introverts from the timestamps of their tweets, respectively.
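The daytime and night proportions discussed above could be computed from tweet timestamps along the following lines. The bucket names, the treatment of the boundaries (19:00 counted as night, 1:00 counted as the remaining small hours) and the use of local time are assumptions of this sketch.

```python
from datetime import datetime


def time_bucket(ts):
    """Assign a timestamp to daytime (8:00-19:00), night (19:00-1:00) or the rest."""
    hour = ts.hour
    if 8 <= hour < 19:
        return "daytime"
    if hour >= 19 or hour < 1:
        return "night"
    return "small_hours"


def bucket_proportions(timestamps):
    """Fraction of tweets falling into each time bucket."""
    counts = {"daytime": 0, "night": 0, "small_hours": 0}
    for ts in timestamps:
        counts[time_bucket(ts)] += 1
    total = len(timestamps)
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


# Four toy timestamps: one in the daytime, two at night, one in the small hours.
proportions = bucket_proportions([
    datetime(2015, 1, 1, 9, 0),
    datetime(2015, 1, 1, 20, 30),
    datetime(2015, 1, 2, 0, 15),
    datetime(2015, 1, 2, 3, 0),
])
```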
As can be seen in Table 3, introverts post their tweets more frequently than extroverts, implying heavier dependency on the social media. Specifically, the mean interval of introverts is 19.09 hours while that for extroverts is 28.10 hours.
This is consistent with, and can be well explained by, the previous finding that socially isolated individuals tend to depend on and indulge in social media to relieve the loneliness caused by a lack of interactions with others in real life [39]. Meanwhile, the difference in standard deviation also reveals that introverts are more regular in Weibo than extroverts from the perspective of posting frequency (respectively 74.36 and 62.25 hours). Moreover, if only intervals within one day are considered (namely ignoring intervals over 24 hours), the mean interval of extroverts shrinks to 6.41 hours and that of introverts to 5.61 hours. In this case, the standard deviation of the intervals of extroverts is 6.76 hours, which is larger than that of introverts. This further justifies the finding that introverts post more frequently than extroverts and demonstrate a stronger preference for social media usage.
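A minimal sketch of the interval statistics is given below, including the variant that ignores gaps longer than 24 hours. Whether the study pools intervals across users or averages per user is not stated, so this sketch simply handles one list of timestamps; the function names and the use of the population standard deviation are assumptions.

```python
from datetime import datetime
from statistics import mean, pstdev


def posting_intervals(timestamps, max_hours=None):
    """Hours between consecutive tweets, optionally dropping gaps above max_hours."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]
    if max_hours is not None:
        gaps = [g for g in gaps if g <= max_hours]
    return gaps


def interval_stats(timestamps, max_hours=None):
    """Return (mean, standard deviation) of posting intervals, or None if there are none."""
    gaps = posting_intervals(timestamps, max_hours)
    if not gaps:
        return None
    return mean(gaps), pstdev(gaps)


# Three toy tweets: gaps of 6 hours and 48 hours.
tweets = [datetime(2015, 1, 1, 0), datetime(2015, 1, 1, 6), datetime(2015, 1, 3, 6)]
all_gaps = interval_stats(tweets)
within_day = interval_stats(tweets, max_hours=24)
```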

Spatial differences
An individual user in Weibo can post geo-tagged tweets (or check-ins) containing the latitude and longitude of the user's location, which offers us a proxy to explore the geographical differences between extroverts and introverts. To perform the geo-analysis, we extract 57,710 tweets with exact geographical locations, of which 38,729 tweets are posted by extroverts and 18,981 by introverts.
For each geo-tagged tweet, we transform its longitude and latitude to the corresponding city (or county) through the GeoPy project [48], and then for each user we obtain the list of cities or counties where the user posted tweets. The results of the comparison between extroverts and introverts are shown in Fig. 3, which surprisingly demonstrates significant differences in the spatial lifestyle of users with the two kinds of personality traits. As can be seen, 44.32% of introverts post tweets from only one city or county, perhaps their residence, suggesting that nearly half of introverts prefer staying in just one familiar city or county. As for extroverts, only 27.12% of them are located in only one city or county, far fewer than introverts. To be more specific, 14% of extroverts and 15% of introverts post tweets in two cities (or counties), 29% of extroverts and 19% of introverts post tweets in 3-5 cities or counties, and the trend of extroverts tweeting at more places persists as the number of cities (or counties) ranges from 6 to 20. It is noteworthy that the number of extroverts posting tweets in 3-5 cities is even larger than the number located in one city, which implies that extroverts prefer visiting more cities than introverts, among whom posting from one city or county dominates. When the number of cities (or counties) exceeds 20, it is unexpected that the ratio among introverts is significantly greater than that among extroverts, implying that a tiny fraction of introverts might attempt to camouflage their loneliness by updating tweets from a large number of different places [39].
Beyond the city granularity, we can also perform geographical comparisons at finer resolutions, like the Point-of-Interest (POI), which is a detailed description of the featured function of small regions or point locations in cities.
Specifically, POIs exclude private facilities such as personal residences, but include many public facilities that seek to attract the general public, such as retail businesses, amusement parks and industrial buildings. Government buildings and significant natural features are POIs as well. In automotive navigation systems and recommendation systems, POIs also refer to hotels, restaurants, fuel stations or other categories [49]. In this study, nine kinds of POIs, namely Restaurants, Hotels, Life services, Shops, Enterprises, Transportation, Entertainment, Neighborhoods and Education, are considered, and the percentages of the six POIs most visited by extroverts and introverts are shown in Figure 4. It is found that most geo-tagged tweets are posted from restaurants, accounting for 66.38% for extroverts and 61.10% for introverts within the nine kinds of POIs we select. There are significant differences between extroverts and introverts in visiting Shops and Enterprises. The percentage of tweets located in shops is 4.58% for extroverts but 7.68% for introverts, implying that introverts are more inclined than extroverts to check in or post tweets while shopping. Even more interestingly, the percentage of tweets located in companies and enterprises is 4.46% for extroverts and 2.59% for introverts. Since companies and enterprises are usually the workplaces of individuals, this suggests that extroverts tend to inform others when they are working.
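Once every geo-tagged tweet has been mapped to a city or county, the per-user comparison above reduces to counting distinct places and grouping users into buckets. The sketch below assumes the mapping has already been done (the reverse-geocoding step itself is not shown), and the bucket boundaries are simply those mentioned in the text; the names `BUCKETS`, `bucket_for` and `city_bucket_shares` are hypothetical.

```python
from collections import Counter

# (bucket name, lowest count, highest count or None for open-ended)
BUCKETS = [("1", 1, 1), ("2", 2, 2), ("3-5", 3, 5), ("6-20", 6, 20), (">20", 21, None)]


def bucket_for(n_cities):
    """Return the bucket name for a number of distinct cities."""
    for name, low, high in BUCKETS:
        if n_cities >= low and (high is None or n_cities <= high):
            return name
    raise ValueError("a user needs at least one geo-tagged tweet")


def city_bucket_shares(user_cities):
    """Share of users per bucket.

    user_cities maps a user id to the list of cities of its geo-tagged tweets.
    """
    counts = Counter(bucket_for(len(set(cities))) for cities in user_cities.values())
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name, _, _ in BUCKETS}


# Toy data for four users.
shares = city_bucket_shares({
    "u1": ["Beijing", "Beijing"],
    "u2": ["Beijing", "Shanghai"],
    "u3": ["Beijing", "Shanghai", "Wuhan"],
    "u4": ["Beijing"],
})
```

Running the same function separately on the extrovert and introvert groups would give the two distributions being compared.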

Online activities
Diverse online activities in Weibo, like sharing, interacting and buying, can be sensed solely through the tweets posted by users. Hence, by mining the texts of tweets posted by extroverts and introverts, we try to offer a landscape of the behavioral differences in online activities.
Sharing Each tweet in Weibo is labeled with a tag that indicates its posting source. For example, if a user logs in to Weibo and posts a tweet, the source could be a mobile device (e.g. iPhone) or a web browser (e.g. Chrome). In the meantime, Weibo users often share news, videos and music with their friends or the public, and the diverse sources of this shared information, such as news websites, mobile applications or other social platforms that offer a sharing interface to Weibo, are also kept in the posted tweets. Besides, tweets shared from selfie mobile applications are tagged as selfies, which usually contain a self-portrait photograph typically taken with a camera phone.
Because of these features, in this study we utilize the source label of each tweet to analyze the sharing behavior of extroverts and introverts. The shares of the above four kinds of sharing among all tweets are shown in Figure 5. As can be seen, the fraction of news sharing for introverts (0.6122%) is more than three times that for extroverts (0.1939%). In contrast, extroverts share more videos, music and selfies in social media than introverts, especially selfies: the fraction of selfie tweets for extroverts is 0.3539%, much higher than introverts' 0.1276%. It is widely believed that taking selfies is connected with individual narcissism [50,51,52], and our findings further suggest that extraversion is positively coupled with selfies in social media.
Interacting Interacting patterns, especially mentioning and retweeting, are considered comprehensively in the feature set extracted for the extraversion classifier. We therefore perform the difference analysis directly on the training set, i.e., the users who filled in the self-reports.
In terms of Pearson correlation, we measure the linear dependence between the interaction features and the extraversion scores of participants. As can be seen in Table 4, extroverts are more socially active and interactive than introverts in social media; however, their interactions are more casual and temporally less regular than those of introverts.
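For reference, the Pearson correlation between one interaction feature and the extraversion scores can be computed as in the sketch below. The function name `pearson_r` is hypothetical and the example numbers are toy values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy example: a feature that mostly increases with the score.
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
```

A positive coefficient means the feature tends to grow with the extraversion score, and a negative one means it tends to shrink.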
Buying The most intrinsic nature of social media is updating users' every status to their friends, and thus experiences like buying or shopping can be sensed by counting related keywords, namely the word-count method, which has been employed extensively in the field of psychology [53] in recent decades. In this study, 14
As can be seen, the Purchasing Indexes of more than 95% of users are smaller than 0.1, and the significant difference between extroverts and introverts also mainly lies in this region. Specifically, at the same Purchasing Index level (like > 0, > 0.04, > 0.06, > 0.08), the probability for introverts is always greater than that for extroverts, surprisingly suggesting that introverts are more inclined than extroverts to publish tweets referring to purchasing. This conclusion could apply to advertising, commodity sales and other realistic scenarios, i.e., introverts might be ideal marketing targets in online promotions. In addition, we perform an analysis of variance (ANOVA), a method used to analyze the differences among group means, to further test the results. We adopt it to investigate whether there are significant differences in Purchasing Index between the two groups. As can be seen in Table 5, the p-value is less than 0.001 and the difference between the two groups is statistically significant.
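The F statistic behind a one-way ANOVA on two (or more) groups of Purchasing Index values can be sketched as follows. The function name `one_way_anova_f` is hypothetical and the inputs are toy values. Only the F statistic and its degrees of freedom are computed here; obtaining the p-value additionally requires the F distribution, which this plain-Python sketch leaves out.

```python
def one_way_anova_f(*groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if df_within <= 0 or ss_within == 0:
        raise ValueError("within-group variance is zero or undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example with two small groups.
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4])
```

A large F value relative to the F distribution with the returned degrees of freedom corresponds to a small p-value, i.e., a statistically significant difference between the group means.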

Online emotion expressions
Tweets in social media deliver not only factual information but also the feelings of users, and these feelings can be automatically classified into different emotions by mining only the texts of tweets [22]. Because it is widely believed that extraversion is associated with higher positive affect, namely that extroverts may experience more positive emotions [42,43], in this study we investigate the differences between extroverts and introverts from the perspective of emotion expressions. By employing a previously built system named MoodLens [22], we categorize each tweet into one of five emotions: anger, disgust, happiness, sadness and fear. Note that tweets without significant emotional propensity are ignored. Then for each individual, either extrovert or introvert, we calculate an emotion index for each of the five sentiments, which is defined as the fraction of the corresponding emotional tweets in the user's tweeting history and quantitatively represents the user's emotional disposition in social media.
In the meantime, we also offer evidence that introversion is associated with high-arousal, negative emotions like anger, fear and disgust, and that extraversion is positively correlated with sadness. Indeed, with the help of data-driven approaches on large samples, our study can test the existing conclusion and gain new insights at the same time.
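The emotion index defined above can be illustrated with the following sketch. It assumes each tweet has already been assigned one of the five emotion labels (or `None` when it shows no significant emotional propensity) by an external classifier, and it takes the denominator to be the number of emotional tweets only; the text does not state the denominator explicitly, so this choice is an assumption.

```python
from collections import Counter

EMOTIONS = ("anger", "disgust", "happiness", "sadness", "fear")


def emotion_index(tweet_emotions):
    """Fraction of a user's emotional tweets that carry each of the five emotions.

    tweet_emotions holds one label per tweet, or None for tweets without a
    clear emotion; such tweets are ignored.
    """
    emotional = [e for e in tweet_emotions if e in EMOTIONS]
    counts = Counter(emotional)
    total = len(emotional)
    return {e: (counts[e] / total if total else 0.0) for e in EMOTIONS}


# Toy history of five tweets, one of them without a clear emotion.
index = emotion_index(["happiness", "happiness", None, "anger", "sadness"])
```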

Attitudes to virtual honor
Weibo grants many optional badges to users, which they have to "lighten" by completing the operations required by the platform. For instance, users should connect their Weibo account with their Taobao account if they would like to get the "Binding-Taobao" badge. This behavior, which exposes the Taobao account to the social media, poses a risk to property security and privacy.
However, the badges that users obtain are displayed publicly to others and treated as honor in the virtual world. Because of this, a user's response to badges can be an indicator of the user's attitude to virtual honor in social media.
We then investigate the difference in attitudes to virtual honor between extroverts and introverts. The "Binding-Taobao" badge is taken as one relevant badge for the difference analysis, and the distribution of "Binding-Taobao" badges among extroverts and introverts is shown in Fig. 9. The percentage of extroverts with the "Binding-Taobao" badge is 60.7% and that without the badge is 39.3%. The percentage of introverts with the badge is 53.9% and that without it is 46.1%. The proportion of extroverts who obtain "Binding-Taobao" badges is thus larger than that of introverts. Besides, we also examine various other badges in Weibo, including "Red envelope 2015", "Public welfare", "Travel 2013" and "Red envelope 2014". All the badge statistics indicate that extroverts tend to prefer badges more than introverts do; in other words, extroverts attach more importance to online virtual honor.
To sum up, from the perspectives of tempo-spatial patterns, online activities, emotion expressions and attitudes to virtual honor, we establish a comprehensive picture of how extroverts tweet differently from introverts in social media.

Conclusion
Personality traits, like extraversion, are believed to play fundamental roles in driving human actions; however, a detailed and comprehensive understanding of how people with different personality traits behave is still missing, especially in the context of social media, which has been becoming an indispens-
This study has inevitable limitations. For example, according to the Big-Five model, individual personality also comprises other traits like openness and conscientiousness, which will be promising directions for our future work.