Table 5 Parametrization of character n-gram models. Optimal settings are typeset in bold

From: Enriching feature engineering for short text samples by language time series analysis

| Process    | Module              | Parameters             | Values                  |
|------------|---------------------|------------------------|-------------------------|
| Represent  | Character n-grams   | N-gram range           | Start = (1 to 5)–End = 5 |
|            |                     | Minimum term frequency | [0.05, 0.1, 0.5]        |
|            |                     | Maximum term frequency | 1.0 (no limit)          |
| Vectorize  | TF-IDF vectorizer   | TF                     | Normal, sublinear       |
|            |                     | IDF                    | Normal, smoothed        |
|            |                     | Normalization          | L1, L2                  |
|            | Count vectorizer    | All set to default     |                         |
| Scaling    | MaxAbsScaler        | All set to default     |                         |
|            | No scaler           | N/A                    |                         |
| Classifier | Logistic regression | All set to default     |                         |
|            | Linear SVM          | All set to default     |                         |
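
The settings listed in Table 5 can be read as a search space over four pipeline stages. The following is a minimal sketch of how such a space might be enumerated, not the authors' implementation. It rests on several assumptions that the table does not confirm: the key names (`ngram_range`, `min_df`, `max_df`, `sublinear_tf`, `smooth_idf`, `norm`) are borrowed from scikit-learn's vectorizer arguments, the minimum and maximum term frequency are mapped to `min_df` and `max_df`, "Linear SVM" is labelled `LinearSVC`, and every combination is assumed to be admissible. The helper `expand` is hypothetical and exists only for illustration.

```python
from itertools import product


def expand(options):
    """Return every combination of a {name: [values]} mapping as a list of dicts."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*(options[n] for n in names))]


# Represent: character n-grams with start in 1..5 and end fixed at 5.
# ASSUMPTION: "minimum/maximum term frequency" is mapped to min_df/max_df.
represent = expand({
    "ngram_range": [(start, 5) for start in range(1, 6)],
    "min_df": [0.05, 0.1, 0.5],
    "max_df": [1.0],
})

# Vectorize: TF-IDF with the listed TF, IDF, and normalization options,
# or a count vectorizer left at its defaults.
vectorize = (
    [{"vectorizer": "tfidf", **params} for params in expand({
        "sublinear_tf": [False, True],   # TF: normal, sublinear
        "smooth_idf": [False, True],     # IDF: normal, smoothed
        "norm": ["l1", "l2"],            # Normalization: L1, L2
    })]
    + [{"vectorizer": "count"}]
)

# Scaling: MaxAbsScaler with defaults, or no scaler at all.
scaling = [{"scaler": "MaxAbsScaler"}, {"scaler": None}]

# Classifier: both left at default settings (names are assumed labels).
classifier = [{"classifier": "LogisticRegression"}, {"classifier": "LinearSVC"}]

# Cartesian product of all four stages.
configurations = [
    {**r, **v, **s, **c}
    for r, v, s, c in product(represent, vectorize, scaling, classifier)
]

if __name__ == "__main__":
    print(len(configurations))
    print(configurations[0])
```

Each entry of `configurations` is a flat dictionary describing one candidate pipeline. The table does not state whether the full Cartesian product was evaluated, so the size of this list should be treated as an upper bound under the assumptions above rather than as the number of experiments actually run.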