From: Enriching feature engineering for short text samples by language time series analysis
| Process | Module | Parameters | Values |
|---|---|---|---|
| Represent | Character n-grams | N-gram range | Start = (1 to 5), End = 5 |
| | | Minimum term frequency | [0.05, 0.1, 0.5] |
| | | Maximum term frequency | 1.0 (no limit) |
| Vectorize | TF-IDF vectorizer | TF | Normal, sublinear |
| | | IDF | Normal, smoothed |
| | | Normalization | L1, L2 |
| | Count vectorizer | All set to default | |
| Scaling | MaxAbsScaler | All set to default | |
| | No scaler | N/A | |
| Classifier | Logistic regression | All set to default | |
| | Linear SVM | All set to default | |
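
To make the size of this search space concrete, the following is a minimal sketch that enumerates the options in the table using only the Python standard library. It assumes that every option of each process is combined with every option of the others (a full cross-product), which the table itself does not state. The dictionary keys and option labels such as `min_term_freq` or `"linear_svm"` are illustrative names, not identifiers taken from the original work.

```python
from itertools import product

# Represent: character n-grams with start in 1..5 and a fixed end of 5.
NGRAM_RANGES = [(start, 5) for start in range(1, 6)]
MIN_TERM_FREQ = [0.05, 0.1, 0.5]
MAX_TERM_FREQ = [1.0]  # 1.0 means no upper limit


def represent_options():
    """Yield one dict per character n-gram representation setting."""
    for ngram_range, min_tf, max_tf in product(
        NGRAM_RANGES, MIN_TERM_FREQ, MAX_TERM_FREQ
    ):
        yield {
            "ngram_range": ngram_range,
            "min_term_freq": min_tf,
            "max_term_freq": max_tf,
        }


def vectorize_options():
    """Yield the TF-IDF variants, then the single default count vectorizer."""
    for tf, idf, norm in product(
        ["normal", "sublinear"], ["normal", "smoothed"], ["l1", "l2"]
    ):
        yield {"vectorizer": "tfidf", "tf": tf, "idf": idf, "norm": norm}
    yield {"vectorizer": "count"}  # all parameters left at their defaults


SCALERS = ["MaxAbsScaler", None]  # None stands for the "No scaler" row
CLASSIFIERS = ["logistic_regression", "linear_svm"]


def all_configurations():
    """Assumed full cross-product of the four processes in the table."""
    for rep, vec, scaler, clf in product(
        list(represent_options()), list(vectorize_options()), SCALERS, CLASSIFIERS
    ):
        yield {**rep, **vec, "scaler": scaler, "classifier": clf}


configs = list(all_configurations())
print(len(configs))  # 15 representations * 9 vectorizers * 2 scalers * 2 classifiers = 540
```

Under the cross-product assumption this yields 540 candidate pipelines. If the modules were implemented with scikit-learn, which the name `MaxAbsScaler` suggests but the table does not confirm, the TF, IDF, and normalization rows would plausibly correspond to the `sublinear_tf`, `smooth_idf`, and `norm` parameters of `TfidfVectorizer`.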