SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

Ribeiro, Filipe N; Araújo, Matheus; Gonçalves, Pollyanna; André Gonçalves, Marcos; Benevenuto, Fabrício

doi:10.1140/epjds/s13688-016-0085-1

EPJ Data Science

Table 3 Labeled datasets

Dataset	Nomenclature	# Msgs	# Pos	# Neg	# Neu	Average # of phrases	Average # of words	Annotators' expertise	# of annotators	CK (Cohen's Kappa)
Comments (BBC) [11]	Comments_BBC	1,000	99	653	248	3.98	64.39	Non expert	3	0.427
Comments (Digg) [11]	Comments_Digg	1,077	210	572	295	2.50	33.97	Non expert	3	0.607
Comments (NYT) [15]	Comments_NYT	5,190	2,204	2,742	244	1.01	17.76	AMT	20	0.628
Comments (TED) [65]	Comments_TED	839	318	409	112	1	16.95	Non expert	6	0.617
Comments (Youtube) [11]	Comments_YTB	3,407	1,665	767	975	1.78	17.68	Non expert	3	0.724
Movie Reviews [54]	Reviews_I	10,662	5,331	5,331	-	1.15	18.99	User rating	-	0.719
Movie Reviews [15]	Reviews_II	10,605	5,242	5,326	37	1.12	19.33	AMT	20	0.555
Myspace posts [11]	Myspace	1,041	702	132	207	2.22	21.12	Non expert	3	0.647
Product Reviews [15]	Amazon	3,708	2,128	1,482	98	1.03	16.59	AMT	20	0.822
Tweets (debate) [66]	Tweets_DBT	3,238	730	1,249	1,259	1.86	14.86	AMT+expert	Undef.	0.419
Tweets (random) [11]	Tweets_RND_I	4,242	1,340	949	1,953	1.77	15.81	Non expert	3	0.683
Tweets (random) [15]	Tweets_RND_II	4,200	2,897	1,299	4	1.87	14.10	AMT	20	0.800
Tweets (random) [67]	Tweets_RND_III	3,771	739	488	2,536	1.54	14.32	AMT	3	0.824
Tweets (random) [68]	Tweets_RND_IV	500	139	119	222	1.90	15.44	Expert	Undef.	0.643
Tweets (specific domains w/emot.) [27]	Tweets_STF	359	182	177	-	1.0	15.1	Non expert	Undef.	1.000
Tweets (specific topics) [69]	Tweets_SAN	3,737	580	654	2,503	1.60	15.03	Expert	1	0.404
Tweets (SemEval2013 task 2) [53]	Tweets_Semeval	6,087	2,223	837	3,027	1.86	20.05	AMT	5	0.617
Runners World forum [11]	RW	1,046	484	221	341	4.79	66.12	Non expert	3	0.615
