Enriching feature engineering for short text samples by language time series analysis

EPJ Data Science

Table 6 Parametrization of word n-gram models. Optimal settings are typeset in bold.

Process	Module	Parameters	Values
Preprocessing	Text preprocessing	Remove stopwords and stem words
Preprocessing	None	N/A
Represent	Word n-grams	N-gram range	Start = 1 to 3, End = 3
		Minimum term frequency	1 (use all terms)
		Maximum term frequency	1.0 (no limit)
Vectorize	TF-IDF vectorizer	TF	Normal, sublinear
		IDF	Normal, smoothed
		Normalization	L1, L2
	Count vectorizer	All set to default
Scaling	MaxAbsScaler	All set to default
Scaling	No scaler	N/A
Classifier	Logistic regression	All set to default
Classifier	Linear SVM	All set to default
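
The search space in Table 6 can be written down as a parameter grid. The following is a minimal sketch, assuming scikit-learn, whose class names (TfidfVectorizer, CountVectorizer, MaxAbsScaler) match the module names in the table; the table itself does not name the library. The step names (vectorize, scale, classify), the mapping of minimum and maximum term frequency to min_df and max_df, and the reading of the n-gram range as (1, 3), (2, 3), (3, 3) are assumptions made for illustration. The preprocessing choice (stopword removal and stemming versus none) is applied before the pipeline and is left out here; including it would double the number of configurations.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import LinearSVC

# Assumed reading of "Start = 1 to 3, End = 3": three (start, end) pairs.
NGRAM_RANGES = [(1, 3), (2, 3), (3, 3)]
# "passthrough" disables the scaling step (the "No scaler" row).
SCALERS = [MaxAbsScaler(), "passthrough"]
# Both classifiers keep their default settings, as in the table.
CLASSIFIERS = [LogisticRegression(), LinearSVC()]

# Placeholder steps; the grid below swaps in the alternatives by step name.
pipeline = Pipeline([
    ("vectorize", TfidfVectorizer()),
    ("scale", MaxAbsScaler()),
    ("classify", LogisticRegression()),
])

param_grid = [
    {   # TF-IDF branch: TF, IDF, and normalization variants are searched.
        "vectorize": [TfidfVectorizer(min_df=1, max_df=1.0)],
        "vectorize__ngram_range": NGRAM_RANGES,
        "vectorize__sublinear_tf": [False, True],  # TF: normal, sublinear
        "vectorize__smooth_idf": [False, True],    # IDF: normal, smoothed
        "vectorize__norm": ["l1", "l2"],           # Normalization: L1, L2
        "scale": SCALERS,
        "classify": CLASSIFIERS,
    },
    {   # Count branch: only the n-gram range varies, the rest is default.
        "vectorize": [CountVectorizer(min_df=1, max_df=1.0)],
        "vectorize__ngram_range": NGRAM_RANGES,
        "scale": SCALERS,
        "classify": CLASSIFIERS,
    },
]

# 96 TF-IDF configurations plus 12 count configurations.
n_configurations = len(ParameterGrid(param_grid))
print(n_configurations)  # 108
```

Under these assumptions, `pipeline` and `param_grid` could be passed to `GridSearchCV` to evaluate every configuration; the cross-validation scheme and scoring metric are not specified in the table and would have to be taken from the rest of the paper.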