Table 1 Overview of representative authorship attribution problems. References [15, 28–39] discuss vector space models [9]

From: Enriching feature engineering for short text samples by language time series analysis

| Paper | Language | Text style | Average sample text length (words) | Number of samples | Number of classes |
| --- | --- | --- | --- | --- | --- |
| [15, 16, 40] | English | Federalist Papers | 900 to 3500 | 85 | 3 |
| [33] | English | Newspaper articles | 89*** | 112 | 50 |
| | | | 714**** | 14 | 50 |
| [28] | English | Incriminating digital documents | 290 | 69 | 10 |
| [29] | Modern Greek | Digital messages | 1209 | 250 | 10 |
| | | Newspaper articles | 1007.5 | 400 | 20 |
| [30] | Modern Greek | Greek Parliament | 1590 | 341 | 5 |
| | | Register | 2871 | 127 | 5 |
| | | | 1285 | 1005 | 5 |
| [31] | German | Newspaper articles | 438 | 1200 | 2* |
| | | | 480 | 550 | 2* |
| | | | 357 | 3233 | 2* |
| [32] | English | Digital messages | 169 | 300 to 400 | 10 |
| | Chinese | | 807** | 300 to 400 | 10 |
| [34] | English | Book chapters | N/A | 1960 to 2450 | 15 |
| [35] | English | Variate types from the ad-hoc authorship attribution contest | Hundreds to thousands | 7 to 38 | 3 to 13 |
| [36] | English | Works of Shakespeare and Fletcher | 1000 | 100 | 2 |
| [37] | Belgian | Newspaper articles | 600 | 300 | 3 |
| [38] | Modern Greek | Newspaper articles | 866.8 | 200 | 10 |
| | | | 1148.2 | 200 | 10 |
| [39] | English | Novels written by Bronte Sisters | 1000 | 480 | 2 |
| | | | 500 | 942 | 2 |
| | | | 200 | 2232 | 2 |
| [41] | English | Twitter, blog, review, novel, and essay | 127 to 7078 | 192 to 400 | 2***** |
| [42] | English | Works by Shakespeare, Christopher Marlowe, and Elizabeth Cary | N/A | 57 | 3 |
| [43] | Persian | Books | N/A | 36 | 5 |
| [44] | English | Books | N/A | 80 | 8 |
| | | | | 80 | 8 |
| | | | | 80 | 8 |
| [45] | English | Books | N/A | 100 | 20 |
| [46] | English | Books | 20,000 | 100 | 10 |
  1. *The target author and the other authors
  2. **Chinese characters
  3. ***Sections of 500 characters
  4. ****Sections of 4000 characters
  5. *****Either True or False on an authorship verification problem