Table 3 Labeled datasets

From: SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

| Dataset | Nomenclature | # Msgs | # Pos | # Neg | # Neu | Average # of phrases | Average # of words | Annotators' expertise | # of annotators | CK |
|---|---|---|---|---|---|---|---|---|---|---|
| Comments (BBC) [11] | Comments_BBC | 1,000 | 99 | 653 | 248 | 3.98 | 64.39 | Non expert | 3 | 0.427 |
| Comments (Digg) [11] | Comments_Digg | 1,077 | 210 | 572 | 295 | 2.50 | 33.97 | Non expert | 3 | 0.607 |
| Comments (NYT) [15] | Comments_NYT | 5,190 | 2,204 | 2,742 | 244 | 1.01 | 17.76 | AMT | 20 | 0.628 |
| Comments (TED) [65] | Comments_TED | 839 | 318 | 409 | 112 | 1 | 16.95 | Non expert | 6 | 0.617 |
| Comments (Youtube) [11] | Comments_YTB | 3,407 | 1,665 | 767 | 975 | 1.78 | 17.68 | Non expert | 3 | 0.724 |
| Movie Reviews [54] | Reviews_I | 10,662 | 5,331 | 5,331 | - | 1.15 | 18.99 | User rating | - | 0.719 |
| Movie Reviews [15] | Reviews_II | 10,605 | 5,242 | 5,326 | 37 | 1.12 | 19.33 | AMT | 20 | 0.555 |
| Myspace posts [11] | Myspace | 1,041 | 702 | 132 | 207 | 2.22 | 21.12 | Non expert | 3 | 0.647 |
| Product Reviews [15] | Amazon | 3,708 | 2,128 | 1,482 | 98 | 1.03 | 16.59 | AMT | 20 | 0.822 |
| Tweets (debate) [66] | Tweets_DBT | 3,238 | 730 | 1,249 | 1,259 | 1.86 | 14.86 | AMT+expert | Undef. | 0.419 |
| Tweets (random) [11] | Tweets_RND_I | 4,242 | 1,340 | 949 | 1,953 | 1.77 | 15.81 | Non expert | 3 | 0.683 |
| Tweets (random) [15] | Tweets_RND_II | 4,200 | 2,897 | 1,299 | 4 | 1.87 | 14.10 | AMT | 20 | 0.800 |
| Tweets (random) [67] | Tweets_RND_III | 3,771 | 739 | 488 | 2,536 | 1.54 | 14.32 | AMT | 3 | 0.824 |
| Tweets (random) [68] | Tweets_RND_IV | 500 | 139 | 119 | 222 | 1.90 | 15.44 | Expert | Undef. | 0.643 |
| Tweets (specific domains w/emot.) [27] | Tweets_STF | 359 | 182 | 177 | - | 1.0 | 15.1 | Non expert | Undef. | 1.000 |
| Tweets (specific topics) [69] | Tweets_SAN | 3,737 | 580 | 654 | 2,503 | 1.60 | 15.03 | Expert | 1 | 0.404 |
| Tweets (SemEval2013 task 2) [53] | Tweets_Semeval | 6,087 | 2,223 | 837 | 3,027 | 1.86 | 20.05 | AMT | 5 | 0.617 |
| Runners World forum [11] | RW | 1,046 | 484 | 221 | 341 | 4.79 | 66.12 | Non expert | 3 | 0.615 |
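
The class counts in the table can be turned into class shares to see how skewed each dataset is. The following is a minimal sketch, not part of the original paper: the names `counts` and `class_shares` are illustrative, only three rows are copied in, and the total is assumed to be the sum of the three class counts rather than the "# Msgs" column.

```python
# (positive, negative, neutral) counts copied from three rows of Table 3.
counts = {
    "Comments_BBC": (99, 653, 248),
    "Myspace": (702, 132, 207),
    "Tweets_SAN": (580, 654, 2503),
}


def class_shares(pos, neg, neu):
    """Return the fraction of messages falling in each class.

    Assumption: the denominator is pos + neg + neu, not the '# Msgs' column.
    """
    total = pos + neg + neu
    return {"pos": pos / total, "neg": neg / total, "neu": neu / total}


shares = {name: class_shares(*row) for name, row in counts.items()}

for name, s in shares.items():
    # Print each dataset's class distribution as percentages.
    print(f"{name}: pos={s['pos']:.1%} neg={s['neg']:.1%} neu={s['neu']:.1%}")
```

On these three rows the majority class differs per dataset (negative for Comments_BBC, positive for Myspace, neutral for Tweets_SAN), which is worth keeping in mind when comparing accuracy figures across datasets.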