Table 3 Labeled datasets

From: SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

| Dataset | Nomenclature | # Msgs | # Pos | # Neg | # Neu | Average # of phrases | Average # of words | Annotators expertise | # of annotators | CK |
|---|---|---|---|---|---|---|---|---|---|---|
| Comments (BBC) [11] | Comments_BBC | 1,000 | 99 | 653 | 248 | 3.98 | 64.39 | Non expert | 3 | 0.427 |
| Comments (Digg) [11] | Comments_Digg | 1,077 | 210 | 572 | 295 | 2.50 | 33.97 | Non expert | 3 | 0.607 |
| Comments (NYT) [15] | Comments_NYT | 5,190 | 2,204 | 2,742 | 244 | 1.01 | 17.76 | AMT | 20 | 0.628 |
| Comments (TED) [65] | Comments_TED | 839 | 318 | 409 | 112 | 1 | 16.95 | Non expert | 6 | 0.617 |
| Comments (Youtube) [11] | Comments_YTB | 3,407 | 1,665 | 767 | 975 | 1.78 | 17.68 | Non expert | 3 | 0.724 |
| Movie Reviews [54] | Reviews_I | 10,662 | 5,331 | 5,331 | - | 1.15 | 18.99 | User rating | - | 0.719 |
| Movie Reviews [15] | Reviews_II | 10,605 | 5,242 | 5,326 | 37 | 1.12 | 19.33 | AMT | 20 | 0.555 |
| Myspace posts [11] | Myspace | 1,041 | 702 | 132 | 207 | 2.22 | 21.12 | Non expert | 3 | 0.647 |
| Product Reviews [15] | Amazon | 3,708 | 2,128 | 1,482 | 98 | 1.03 | 16.59 | AMT | 20 | 0.822 |
| Tweets (debate) [66] | Tweets_DBT | 3,238 | 730 | 1,249 | 1,259 | 1.86 | 14.86 | AMT+expert | Undef. | 0.419 |
| Tweets (random) [11] | Tweets_RND_I | 4,242 | 1,340 | 949 | 1,953 | 1.77 | 15.81 | Non expert | 3 | 0.683 |
| Tweets (random) [15] | Tweets_RND_II | 4,200 | 2,897 | 1,299 | 4 | 1.87 | 14.10 | AMT | 20 | 0.800 |
| Tweets (random) [67] | Tweets_RND_III | 3,771 | 739 | 488 | 2,536 | 1.54 | 14.32 | AMT | 3 | 0.824 |
| Tweets (random) [68] | Tweets_RND_IV | 500 | 139 | 119 | 222 | 1.90 | 15.44 | Expert | Undef. | 0.643 |
| Tweets (specific domains w/emot.) [27] | Tweets_STF | 359 | 182 | 177 | - | 1.0 | 15.1 | Non expert | Undef. | 1.000 |
| Tweets (specific topics) [69] | Tweets_SAN | 3,737 | 580 | 654 | 2,503 | 1.60 | 15.03 | Expert | 1 | 0.404 |
| Tweets (SemEval2013 task 2) [53] | Tweets_Semeval | 6,087 | 2,223 | 837 | 3,027 | 1.86 | 20.05 | AMT | 5 | 0.617 |
| Runners World forum [11] | RW | 1,046 | 484 | 221 | 341 | 4.79 | 66.12 | Non expert | 3 | 0.615 |