From: Enriching feature engineering for short text samples by language time series analysis
| Process | Module | Parameters | Values |
|---|---|---|---|
| Preprocessing | Text preprocessing | Remove stopwords and stem words | |
| | None | N/A | |
| Represent | Word n-grams | N-gram range | Start = 1 to 3, End = 3 |
| | | Minimum term frequency | 1 (use all terms) |
| | | Maximum term frequency | 1.0 (no limit) |
| Vectorize | TF-IDF vectorizer | TF | Normal, sublinear |
| | | IDF | Normal, smoothed |
| | | Normalization | L1, L2 |
| | Count vectorizer | All set to default | |
| Scaling | MaxAbsScaler | All set to default | |
| | No scaler | N/A | |
| Classifier | Logistic regression | All set to default | |
| | Linear SVM | All set to default | |
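The table defines a grid of pipeline configurations. The sketch below enumerates that grid and assumes scikit-learn. The module names match its `TfidfVectorizer`, `CountVectorizer`, `MaxAbsScaler`, `LogisticRegression` and `LinearSVC` classes, but the table does not say which library was used. Several pieces are hypothetical and exist only for illustration: the names `crude_stem`, `remove_stopwords_and_stem`, `SHARED`, `PARAM_GRID`, `build_pipeline`, and the toy documents. The suffix stripper is a stand-in for a real stemmer, since the table does not name one. Mapping the minimum and maximum term frequency onto scikit-learn's `min_df`/`max_df`, which are document-frequency thresholds, is also an assumption. The same n-gram settings are applied to both vectorizers.

```python
import re

from sklearn.base import clone
from sklearn.feature_extraction.text import (
    ENGLISH_STOP_WORDS,
    CountVectorizer,
    TfidfVectorizer,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import LinearSVC

# Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter);
# the table does not say which stemmer was used.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def remove_stopwords_and_stem(text):
    """Hypothetical 'Text preprocessing' option: lowercase, drop
    scikit-learn's built-in English stop words, then stem each word.
    A custom preprocessor replaces the vectorizer's own lowercasing,
    so lowercasing is done here explicitly."""
    words = re.findall(r"\w+", text.lower())
    return " ".join(crude_stem(w) for w in words if w not in ENGLISH_STOP_WORDS)


# Options common to both vectorizers: Preprocessing, Represent, Scaling and
# Classifier rows. min_df/max_df are document-frequency thresholds in
# scikit-learn; using them for the table's term-frequency limits is an assumption.
SHARED = {
    "vectorize__preprocessor": [remove_stopwords_and_stem, None],  # None = default
    "vectorize__ngram_range": [(start, 3) for start in (1, 2, 3)],
    "vectorize__min_df": [1],  # keep every term
    "vectorize__max_df": [1.0],  # no upper limit
    "scale": [MaxAbsScaler(), "passthrough"],  # "passthrough" = no scaler
    "clf": [LogisticRegression(), LinearSVC()],  # default settings
}

PARAM_GRID = [
    # TF-IDF: normal/sublinear TF x normal/smoothed IDF x L1/L2 normalization.
    {
        **SHARED,
        "vectorize": [TfidfVectorizer()],
        "vectorize__sublinear_tf": [False, True],
        "vectorize__smooth_idf": [False, True],
        "vectorize__norm": ["l1", "l2"],
    },
    # Count vectorizer with its remaining parameters left at their defaults.
    {**SHARED, "vectorize": [CountVectorizer()]},
]


def build_pipeline(params):
    """Return a fresh, unfitted pipeline for one configuration."""
    pipe = Pipeline([
        ("vectorize", TfidfVectorizer()),
        ("scale", MaxAbsScaler()),
        ("clf", LogisticRegression()),
    ])
    # Clone estimator-valued options so configurations never share fitted state.
    fresh = {k: clone(v) if hasattr(v, "get_params") else v for k, v in params.items()}
    return pipe.set_params(**fresh)


configs = list(ParameterGrid(PARAM_GRID))
print(len(configs))  # 2*3*2*2 shared options x (8 TF-IDF + 1 count) = 216

# Hypothetical toy data, only to show that a configuration fits end to end.
docs = [
    "battery drains quickly and screen flickers constantly",
    "camera takes sharp photos with vivid colours",
    "charger stopped working after two days",
    "speakers sound loud and crisp during calls",
]
labels = [0, 1, 0, 1]
model = build_pipeline(configs[0]).fit(docs, labels)
print(model.predict(docs))
```

Each configuration could then be scored separately, for example with cross-validation. `GridSearchCV` also accepts this list-of-dicts form of `PARAM_GRID` directly. Replacing whole pipeline steps through the grid keeps both vectorizers and both classifiers in a single search space.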