Table 5 Parametrization of character n-gram models. Optimal settings are typeset in bold

From: Enriching feature engineering for short text samples by language time series analysis

| Process | Module | Parameters | Values |
|---|---|---|---|
| Represent | Character n-grams | N-gram range | Start = (1 to 5)–End = 5 |
| | | Minimum term frequency | [0.05, 0.1, 0.5] |
| | | Maximum term frequency | 1.0 (no limit) |
| Vectorize | TF-IDF vectorizer | TF | Normal, sublinear |
| | | IDF | Normal, smoothed |
| | | Normalization | L1, L2 |
| | Count vectorizer | All set to default | |
| Scaling | MaxAbsScaler | All set to default | |
| | No scaler | N/A | |
| Classifier | Logistic regression | All set to default | |
| | Linear SVM | All set to default | |
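
To make the size of this search space concrete, the following is a minimal sketch that enumerates the configurations listed in the table above. It rests on several assumptions that the table does not confirm: the settings are combined as a full Cartesian product, the n-gram range and term-frequency limits apply to both vectorizers, and the key names (`ngram_range`, `min_df`, `max_df`, `sublinear_tf`, `smooth_idf`, `norm`) are borrowed from scikit-learn-style conventions purely for illustration. The module names are kept as plain strings, so nothing is actually fitted.

```python
from itertools import product

# Represent stage: character n-grams (assumed to apply to both vectorizers).
REPRESENT = {
    "ngram_range": [(start, 5) for start in range(1, 6)],  # Start = 1..5, End = 5
    "min_df": [0.05, 0.1, 0.5],  # minimum term frequency
    "max_df": [1.0],             # maximum term frequency (no limit)
}

# Vectorize stage: TF-IDF options. Mapping "normal"/"sublinear" and
# "normal"/"smoothed" onto booleans is an assumption made for this sketch.
TFIDF = {
    "sublinear_tf": [False, True],  # TF: normal, sublinear
    "smooth_idf": [False, True],    # IDF: normal, smoothed
    "norm": ["l1", "l2"],           # normalization: L1, L2
}

SCALERS = ["MaxAbsScaler", None]                   # None stands for "No scaler"
CLASSIFIERS = ["LogisticRegression", "LinearSVM"]  # both left at defaults


def expand(grid):
    """Return every combination of a {name: [values]} grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


def enumerate_configs():
    """List all pipeline configurations implied by the table (full product)."""
    vectorizers = []
    for rep in expand(REPRESENT):
        # TF-IDF vectorizer: one entry per combination of TF-IDF options.
        for tfidf in expand(TFIDF):
            vectorizers.append({"vectorizer": "tfidf", **rep, **tfidf})
        # Count vectorizer: all other settings stay at their defaults.
        vectorizers.append({"vectorizer": "count", **rep})
    return [
        {**vec, "scaler": scaler, "classifier": clf}
        for vec, scaler, clf in product(vectorizers, SCALERS, CLASSIFIERS)
    ]


configs = enumerate_configs()
print(len(configs))
```

Under these assumptions the enumeration yields 540 configurations: 15 representation settings, each paired with 8 TF-IDF variants plus 1 count-vectorizer variant, times 2 scaling options and 2 classifiers. If the search was restricted in some way, the true number would be smaller.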