From: SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods
Name | Output | Validation | Compared to | Lexicon size |
---|---|---|---|---|
Emoticons | Negative, Positive | - | - | 79 |
Opinion Lexicon | Provides polarities for lexicons | Product Reviews from Amazon and CNet | - | 6,787 |
Opinion Finder (MPQA) | Negative, Objective, Positive | MPQA [45] | Compared to itself in different versions | 20,611 |
SentiWordNet | Provides positive, negative and objective scores for each word (0.0 to 1.0) | - | General Inquirer (GI) [46] | 117,658 |
Sentiment140 | 0, 2, 4 | Their own datasets - 359 tweets (Tweets_STF, presented in Table 3) | Naive Bayes, Maximum Entropy, and SVM classifiers as described in [6] | - |
LIWC07 | Negative, Positive | - | Their previous dictionary (2001) | 4,500 |
SenticNet | Negative, Positive | Patient Opinions (Unavailable) | SentiStrength [11] | 15,000 |
AFINN | Provides polarity score for lexicons (−5 to 5) | Twitter [47] | OpinionFinder [22], ANEW [30], GI [46] and SentiStrength [11] | 2,477 |
SO-CAL | < 0, 0, > 0 | | MPQA [45], GI [46], SentiWordNet [24], ‘Maryland’ Dict [49], Google Generated Dict [50] | 9,928 |
Emoticons DS (Distant Supervision) | Provides polarity score for lexicons | Validation with unlabeled Twitter data [51] | - | 1,162,894 |
NRC Hashtag | Provides polarities for lexicons | Twitter (SemEval-2007 Affective Text Corpus) [52] | WordNet Affect [52] | 679,468 |
Pattern.en | Objective, Negative, Positive | Product Reviews, but the source was not specified | - | 2,973 |
SASA [35] | Negative, Neutral, Unsure, Positive | ‘Political’ tweets labeled by ‘turkers’ (AMT) (unavailable) | - | - |
PANAS-t | Provides association for each word with eleven moods (joviality, attentiveness, fear, etc.) | Validation with unlabeled Twitter data [51] | - | 50 |
Emolex | Provides polarities for lexicons | - | Compared with existing gold-standard data, but the data were not specified | 141,820 |
USent | neg, neu, pos | Their own dataset - TED talks | Comparison with other multimedia recommendation approaches | MPQA (8,226)/Their own (9,176) |
Sentiment140 Lexicon | Provides polarity scores for lexicon | Twitter and SMS from SemEval 2013, task 2 [53] | Other SemEval 2013, task 2 approaches | 1,220,176 |
SentiStrength | −1, 0, 1 | Their own datasets - Twitter, Youtube, Digg, Myspace, BBC Forums and Runners World | The best of nine Machine Learning techniques for each test | 2,698 |
Stanford Recursive Deep Model | very negative, negative, neutral, positive, very positive | Movie Reviews [54] | Naive Bayes and SVM with bag of words features and bag of bigram features | 227,009 |
Umigon | Negative, Neutral, Positive | Twitter and SMS from SemEval 2013, task 2 [53] | [40] | 1,053 |
ANEW_WKB | Provides ratings for words in terms of Valence, Arousal and Dominance. Results can also be grouped by gender, age and education | - | Compared to similar works, including cross-language studies, by means of correlations between emotional dimensions | 13,915 |
VADER | [−1,…,−0.05], (−0.05,…,0.05), [0.05,…,1] | Their own datasets - Twitter, Movie Reviews, Technical Product Reviews, NYT User’s Opinions | GI [46], LIWC, [7], SentiWordNet [24], ANEW [30], SenticNet [55] and some Machine Learning approaches | 7,517 |
LIWC15 | Negative, Positive | - | Their previous dictionary (2007) | 6,400 |
Semantria | negative, neutral, positive | Not available | Not available | Not available |
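The Output column mixes several encodings: class labels, integer codes, and real-valued scores. Comparing the methods side by side therefore requires mapping every output onto one shared scale. The Python sketch below shows one way to do that. The function names are hypothetical. The VADER neutral band follows the (−0.05, 0.05) interval listed above. The Sentiment140 mapping assumes a 0/2/4 encoding (0 negative, 2 neutral, 4 positive). The lexicon-summing rule and the SentiWordNet positive-vs-negative comparison are illustrative assumptions only, not the procedures those methods or the benchmark actually use.

```python
"""Sketch: map heterogeneous sentiment outputs onto one {-1, 0, 1} scale.

All function names are hypothetical; the mapping rules are assumptions
made for illustration, not the benchmark's documented procedure.
"""

NEGATIVE, NEUTRAL, POSITIVE = -1, 0, 1


def from_vader_compound(compound: float, threshold: float = 0.05) -> int:
    """Neutral band is the open interval (-threshold, threshold)."""
    if compound >= threshold:
        return POSITIVE
    if compound <= -threshold:
        return NEGATIVE
    return NEUTRAL


def from_sentiment140(code: int) -> int:
    """Assumes the 0/2/4 encoding: 0 negative, 2 neutral, 4 positive."""
    mapping = {0: NEGATIVE, 2: NEUTRAL, 4: POSITIVE}
    if code not in mapping:
        raise ValueError(f"unexpected Sentiment140 code: {code}")
    return mapping[code]


def from_lexicon_sum(tokens, lexicon) -> int:
    """Sums per-word scores (e.g. AFINN-style -5..5) and keeps only the sign.

    Summing is an illustrative assumption; unknown words score 0.
    """
    total = sum(lexicon.get(tok.lower(), 0) for tok in tokens)
    # (total > 0) - (total < 0) yields 1, 0 or -1 as a plain int.
    return (total > 0) - (total < 0)


def from_sentiwordnet(pos: float, neg: float) -> int:
    """Compares positive vs negative scores, each in [0.0, 1.0]."""
    if pos > neg:
        return POSITIVE
    if neg > pos:
        return NEGATIVE
    return NEUTRAL


if __name__ == "__main__":
    toy_lexicon = {"great": 3, "awful": -3, "slow": -1}  # hypothetical values
    print(from_vader_compound(0.42))
    print(from_vader_compound(-0.01))
    print(from_sentiment140(0))
    print(from_lexicon_sum("Great but slow".split(), toy_lexicon))
    print(from_sentiwordnet(0.125, 0.5))
```

Keeping each adapter as a separate small function makes it easy to add another method from the table without touching the others. Methods with richer outputs, such as the five Stanford classes or the SASA "Unsure" label, would need an explicit decision about how to collapse them.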