Table 6 Parametrization of word n-grams models. Optimal settings are typeset in bold

From: Enriching feature engineering for short text samples by language time series analysis

| Process | Module | Parameters | Values |
| --- | --- | --- | --- |
| Preprocessing | Text preprocessing | Remove stopwords and stem words | |
| | None | N/A | |
| Represent | Word n-grams | N-gram range | Start = (1 to 3)–End = 3 |
| | | Minimum term frequency | 1 (Use all terms) |
| | | Maximum term frequency | 1.0 (no limit) |
| Vectorize | TF-IDF vectorizer | TF | Normal, sublinear |
| | | IDF | Normal, smoothed |
| | | Normalization | L1, L2 |
| | Count vectorizer | All set to default | |
| Scaling | MaxAbsScaler | All set to default | |
| | No scaler | N/A | |
| Classifier | Logistic regression | All set to default | |
| | Linear SVM | All set to default | |
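
The search space described by the table can be sketched in code. The following is a minimal illustration, not the authors' implementation. It assumes that every option is combined with every other option (a full Cartesian product), which the table does not state. The parameter names (`ngram_range`, `min_df`, `max_df`, `sublinear_tf`, `smooth_idf`, `norm`) are borrowed from scikit-learn's vectorizers as an assumed mapping, "Normal" TF and IDF are assumed to mean the non-sublinear and non-smoothed variants, and `LinearSVC` is an assumed stand-in for "Linear SVM". The preprocessing label `remove_stopwords_and_stem` is a hypothetical name introduced only for this example.

```python
from itertools import product

# Preprocessing: either remove stopwords and stem words, or do nothing.
# The string label is a hypothetical placeholder, not a library option.
PREPROCESSING = ["remove_stopwords_and_stem", None]

# N-gram range: start varies from 1 to 3, end is fixed at 3.
NGRAM_RANGES = [(start, 3) for start in (1, 2, 3)]

# TF-IDF vectorizer: TF normal/sublinear, IDF normal/smoothed, norm L1/L2.
# Keys follow scikit-learn naming (assumed mapping).
TFIDF_OPTIONS = [
    {"vectorizer": "tfidf", "sublinear_tf": tf, "smooth_idf": idf, "norm": norm}
    for tf, idf, norm in product([False, True], [False, True], ["l1", "l2"])
]

# Count vectorizer: all parameters left at their defaults.
COUNT_OPTIONS = [{"vectorizer": "count"}]
VECTORIZERS = TFIDF_OPTIONS + COUNT_OPTIONS

SCALERS = ["MaxAbsScaler", None]
CLASSIFIERS = ["LogisticRegression", "LinearSVC"]  # LinearSVC is assumed


def enumerate_configs():
    """Yield one configuration dict per combination of table options.

    Assumes a full Cartesian product over all modules.
    """
    for prep, ngram, vec, scaler, clf in product(
        PREPROCESSING, NGRAM_RANGES, VECTORIZERS, SCALERS, CLASSIFIERS
    ):
        yield {
            "preprocessing": prep,
            "ngram_range": ngram,
            "min_df": 1,    # minimum term frequency: use all terms
            "max_df": 1.0,  # maximum term frequency: no limit
            **vec,
            "scaler": scaler,
            "classifier": clf,
        }


configs = list(enumerate_configs())
print(len(configs), "candidate configurations")
print(configs[0])
```

Each dictionary could then be translated into a concrete pipeline and evaluated. Keeping the grid as plain data makes it easy to check how many candidate models the table implies before any training is run.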