Table 2 Overview of the sentence-level methods available in the literature

From: SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

Name | Output | Validation | Compared to | Lexicon size
Emoticons | , | - | - | 79
Opinion Lexicon | Provides polarities for lexicons | Product Reviews from Amazon and CNet | - | 6,787
Opinion Finder (MPQA) | , Objective, | MPQA [45] | Compared to itself in different versions | 20,611
SentiWordNet | Provides positive, negative and objective scores for each word (0.0 to 1.0) | - | General Inquirer (GI) [46] | 117,658
Sentiment140 | , 2, | Their own datasets - 359 tweets (Tweets_STF, presented in Table 3) | Naive Bayes, Maximum Entropy, and SVM classifiers as described in [6] | -
LIWC07 | , | - | Their previous dictionary (2001) | 4,500
SenticNet | , | Patient Opinions (Unavailable) | SentiStrength [11] | 15,000
AFINN | Provides polarity score for lexicons (−5 to 5) | Twitter [47] | OpinionFinder [22], ANEW [30], GI [46] and SentiStrength [11] | 2,477
SO-CAL | , 0, | Epinion [48], MPQA [45], Myspace [11] | MPQA [45], GI [46], SentiWordNet [24], ‘Maryland’ Dict [49], Google Generated Dict [50] | 9,928
Emoticons DS (Distant Supervision) | Provides polarity score for lexicons | Validation with unlabeled Twitter data [51] | - | 1,162,894
NRC Hashtag | Provides polarities for lexicons | Twitter (SemEval-2007 Affective Text Corpus) [52] | WordNet Affect [52] | 679,468
Pattern.en | Objective, , | Product Reviews, but the source was not specified | - | 2,973
SASA [35] | , Neutral, Unsure, | ‘Political’ tweets labeled by ‘turkers’ (AMT) (unavailable) | - | -
PANAS-t | Provides association for each word with eleven moods (joviality, attentiveness, fear, etc.) | Validation with unlabeled Twitter data [51] | - | 50
Emolex | Provides polarities for lexicons | - | Compared with existing gold standard data but it was not specified | 141,820
USent | , neu, | Their own dataset - TED talks | Comparison with other multimedia recommendation approaches | MPQA (8,226)/Their own (9,176)
Sentiment140 Lexicon | Provides polarity scores for lexicon | Twitter and SMS from SemEval 2013, task 2 [53] | Other SemEval 2013, task 2 approaches | 1,220,176
SentiStrength | , 0, | Their own datasets - Twitter, Youtube, Digg, Myspace, BBC Forums and Runners World | The best of nine Machine Learning techniques for each test | 2,698
Stanford Recursive Deep Model | , , neutral, , | Movie Reviews [54] | Naive Bayes and SVM with bag of words features and bag of bigram features | 227,009
Umigon | , Neutral, | Twitter and SMS from SemEval 2013, task 2 [53] | [40] | 1,053
ANEW_WKB | Provides ratings for words in terms of Valence, Arousal and Dominance. Results can also be grouped by gender, age and education | - | Compared to similar works, including cross-language studies, by means of correlations between emotional dimensions | 13,915
VADER | , (−0.05,…,0.05), | Their own datasets - Twitter, Movie Reviews, Technical Product Reviews, NYT User’s Opinions | GI [46], LIWC [7], SentiWordNet [24], ANEW [30], SenticNet [55] and some Machine Learning approaches | 7,517
LIWC15 | , | - | Their previous dictionary (2007) | 6,400
Semantria | , neutral, | Not available | Not available | Not available
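
Several methods in the table output a polarity score per lexicon entry rather than a sentence label. As a rough illustration of how such word-level scores could be turned into a sentence-level label, the sketch below sums the scores of the words in a sentence and maps the total to positive, negative, or neutral. The toy lexicon, its scores, and the function name `sentence_polarity` are invented for illustration and are not taken from any of the listed methods, which typically add further rules (for example for negation or intensifiers).

```python
import re

# Hypothetical toy lexicon: word -> polarity score (assumed range -5..5).
# These entries are made up for this example, not copied from a real lexicon.
TOY_LEXICON = {"good": 3, "great": 4, "bad": -3, "terrible": -4}


def sentence_polarity(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in a sentence and map the total to a label."""
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    words = re.findall(r"[a-z']+", sentence.lower())
    # Words missing from the lexicon contribute a score of 0.
    total = sum(lexicon.get(word, 0) for word in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, `sentence_polarity("The food was great")` yields a positive label, while a sentence with no lexicon words is labeled neutral.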