Table 1 Overview of representative authorship attribution problems. References [15, 28–39] discuss vector space models [9]

From: Enriching feature engineering for short text samples by language time series analysis

| Paper | Language | Text style | Average sample text length (words) | Number of samples | Number of classes |
| --- | --- | --- | --- | --- | --- |
| [15, 16, 40] | English | Federalist Papers | 900 to 3500 | 85 | 3 |
| [33] | English | Newspaper articles | 89*** | 112 | 50 |
| | | | 714**** | 14 | 50 |
| [28] | English | Incriminating digital documents | 290 | 69 | 10 |
| [29] | Modern Greek | Digital messages | 1209 | 250 | 10 |
| | | Newspaper articles | 1007.5 | 400 | 20 |
| [30] | Modern Greek | Greek Parliament | 1590 | 341 | 5 |
| | | Register | 2871 | 127 | 5 |
| | | | 1285 | 1005 | 5 |
| [31] | German | Newspaper articles | 438 | 1200 | 2* |
| | | | 480 | 550 | 2* |
| | | | 357 | 3233 | 2* |
| [32] | English | Digital messages | 169 | 300 to 400 | 10 |
| | Chinese | | 807** | 300 to 400 | 10 |
| [34] | English | Book chapters | N/A | 1960 to 2450 | 15 |
| [35] | English | Variate types from the ad-hoc authorship attribution contest | Hundreds to thousands | 7 to 38 | 3 to 13 |
| [36] | English | Works of Shakespeare and Fletcher | 1000 | 100 | 2 |
| [37] | Belgian | Newspaper articles | 600 | 300 | 3 |
| [38] | Modern Greek | Newspaper articles | 866.8 | 200 | 10 |
| | | | 1148.2 | 200 | 10 |
| [39] | English | Novels written by Bronte Sisters | 1000 | 480 | 2 |
| | | | 500 | 942 | 2 |
| | | | 200 | 2232 | 2 |
| [41] | English | Twitter, blog, review, novel, and essay | 127 to 7078 | 192 to 400 | 2***** |
| [42] | English | Works by Shakespeare, Christopher Marlowe, and Elizabeth Cary | N/A | 57 | 3 |
| [43] | Persian | Books | N/A | 36 | 5 |
| [44] | English | Books | N/A | 80 | 8 |
| | | | | 80 | 8 |
| | | | | 80 | 8 |
| [45] | English | Books | N/A | 100 | 20 |
| [46] | English | Books | 20,000 | 100 | 10 |

  1. *The target author and the other authors
  2. **Chinese characters
  3. ***Sections of 500 characters
  4. ****Sections of 4000 characters
  5. *****Either True or False on an authorship verification problem