Table 6 Parametrization of word n-grams models. Optimal settings are typeset in bold

From: Enriching feature engineering for short text samples by language time series analysis

| Process       | Module             | Parameters                     | Values                   |
|---------------|--------------------|--------------------------------|--------------------------|
| Preprocessing | Text preprocessing | Remove stopwords and stem words |                          |
|               | None               | N/A                            |                          |
| Represent     | Word n-grams       | N-gram range                   | Start = 1 to 3, End = 3  |
|               |                    | Minimum term frequency         | 1 (use all terms)        |
|               |                    | Maximum term frequency         | 1.0 (no limit)           |
| Vectorize     | TF-IDF vectorizer  | TF                             | Normal, sublinear        |
|               |                    | IDF                            | Normal, smoothed         |
|               |                    | Normalization                  | L1, L2                   |
|               | Count vectorizer   | All set to default             |                          |
| Scaling       | MaxAbsScaler       | All set to default             |                          |
|               | No scaler          | N/A                            |                          |
| Classifier    | Logistic regression | All set to default            |                          |
|               | Linear SVM         | All set to default             |                          |
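
To give a sense of the size of the search space the table describes, the following is a minimal sketch that enumerates every combination of the listed options. It assumes the options are combined exhaustively, which the table itself does not state. All key names and option labels in the code are illustrative placeholders, not identifiers from the source or from any library.

```python
from itertools import product

# Options read off the table. Names are hypothetical labels chosen for
# illustration; they are not taken from the source or from a library API.
PREPROCESSING = ["remove_stopwords_and_stem", "none"]
NGRAM_RANGES = [(start, 3) for start in (1, 2, 3)]  # Start = 1 to 3, End = 3
SCALERS = ["MaxAbsScaler", "none"]
CLASSIFIERS = ["logistic_regression", "linear_svm"]

# The TF-IDF vectorizer has three two-valued settings (TF, IDF, normalization).
TFIDF_VARIANTS = [
    {"vectorizer": "tfidf", "tf": tf, "idf": idf, "norm": norm}
    for tf, idf, norm in product(
        ["normal", "sublinear"], ["normal", "smoothed"], ["l1", "l2"]
    )
]
# The count vectorizer is used with default settings only.
VECTORIZERS = TFIDF_VARIANTS + [{"vectorizer": "count"}]


def enumerate_configs():
    """Yield one dict per combination of the tabulated options."""
    for prep, ngram, vec, scaler, clf in product(
        PREPROCESSING, NGRAM_RANGES, VECTORIZERS, SCALERS, CLASSIFIERS
    ):
        yield {
            "preprocessing": prep,
            "ngram_range": ngram,
            "min_term_frequency": 1,    # use all terms
            "max_term_frequency": 1.0,  # no limit
            **vec,
            "scaler": scaler,
            "classifier": clf,
        }


configs = list(enumerate_configs())
print(len(configs))  # 2 * 3 * (8 + 1) * 2 * 2 = 216
```

Under the exhaustive-combination assumption this yields 216 configurations: 2 preprocessing choices, 3 n-gram ranges, 9 vectorizer variants (8 TF-IDF plus 1 count), 2 scaling choices, and 2 classifiers.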