From: Enriching feature engineering for short text samples by language time series analysis
Paper | Language | Text style | Average sample text length (words) | Number of samples | Number of classes |
---|---|---|---|---|---|
| | English | Federalist Papers | 900 to 3500 | 85 | 3 |
[33] | English | Newspaper articles | 89*** | 112 | 50 |
| | | | 714**** | 14 | 50 |
[28] | English | Incriminating digital documents | 290 | 69 | 10 |
[29] | Modern Greek | Digital messages | 1209 | 250 | 10 |
| | | Newspaper articles | 1007.5 | 400 | 20 |
[30] | Modern Greek | Greek Parliament | 1590 | 341 | 5 |
| | | Register | 2871 | 127 | 5 |
| | | | 1285 | 1005 | 5 |
[31] | German | Newspaper articles | 438 | 1200 | 2* |
| | | | 480 | 550 | 2* |
| | | | 357 | 3233 | 2* |
[32] | English | Digital messages | 169 | 300 to 400 | 10 |
| | Chinese | | 807** | 300 to 400 | 10 |
[34] | English | Book chapters | N/A | 1960 to 2450 | 15 |
[35] | English | Various text types from the ad-hoc authorship attribution contest | Hundreds to thousands | 7 to 38 | 3 to 13 |
[36] | English | Works of Shakespeare and Fletcher | 1000 | 100 | 2 |
[37] | Belgian | Newspaper articles | 600 | 300 | 3 |
[38] | Modern Greek | Newspaper articles | 866.8 | 200 | 10 |
| | | | 1148.2 | 200 | 10 |
[39] | English | Novels written by Bronte Sisters | 1000 | 480 | 2 |
| | | | 500 | 942 | 2 |
| | | | 200 | 2232 | 2 |
[41] | English | Twitter, blog, review, novel, and essay | 127 to 7078 | 192 to 400 | 2***** |
[42] | English | Works by Shakespeare, Christopher Marlowe, and Elizabeth Cary | N/A | 57 | 3 |
[43] | Persian | Books | N/A | 36 | 5 |
[44] | English | Books | N/A | 80 | 8 |
| | | | | 80 | 8 |
| | | | | 80 | 8 |
[45] | English | Books | N/A | 100 | 20 |
[46] | English | Books | 20,000 | 100 | 10 |