From: Evaluating the construct validity of text embeddings with application to survey questions
Text representation | Lasso RMSE (Δ%) | Lasso 95% CI | RF RMSE (Δ%) | RF 95% CI |
---|---|---|---|---|
TF | 0.296 (2.847) | [0.283, 0.310] | 0.279 (−3.059) | [0.263, 0.296] |
TF-IDF | 0.298 (3.525) | [0.283, 0.314] | 0.284 (−1.598) | [0.264, 0.303] |
Random 300 | 0.294 (2.207) | [0.282, 0.307] | 0.280 (−2.769) | [0.268, 0.292] |
Random 768 | 0.293 (1.658) | [0.280, 0.306] | 0.280 (−2.923) | [0.267, 0.292] |
Random 1024 | 0.297 (3.101) | [0.283, 0.311] | 0.279 (−3.170) | [0.266, 0.292] |
fastText | 0.290 (0.781) | [0.278, 0.303] | 0.277 (−3.772) | [0.266, 0.289] |
GloVe | 0.295 (2.425) | [0.283, 0.308] | 0.276 (−4.200) | [0.263, 0.289] |
BERT-base-uncased | 0.294 (1.962) | [0.279, 0.309] | 0.270 (−6.197) | [0.259, 0.282] |
BERT-large-uncased | 0.297 (2.986) | [0.282, 0.311] | 0.274 (−5.018) | [0.263, 0.285] |
All-DistilRoBERTa | 0.287 (−0.238) | [0.275, 0.300] | 0.272 (−5.434) | [0.260, 0.284] |
All-MPNet-base | 0.294 (1.947) | [0.280, 0.308] | 0.270 (−6.327) | [0.258, 0.282] |
USE | 0.290 (0.662) | [0.277, 0.303] | 0.273 (−5.405) | [0.261, 0.284] |
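
The table reports each RMSE with a Δ% in parentheses but, in this excerpt, does not say what the change is measured against. The sketch below is one way to read those figures. It assumes (this is not stated in the table) that Δ% is the relative change from a reference RMSE, i.e. Δ% = 100 × (RMSE − reference) / reference, and inverts that formula to back out the implied reference for a few rows. The helper name `implied_baseline` and the choice of rows are illustrative only.

```python
# (RMSE, delta %) pairs copied from a few rows of the table above.
rows = {
    ("TF", "Lasso"): (0.296, 2.847),
    ("fastText", "Lasso"): (0.290, 0.781),
    ("All-DistilRoBERTa", "Lasso"): (0.287, -0.238),
    ("BERT-base-uncased", "RF"): (0.270, -6.197),
    ("All-MPNet-base", "RF"): (0.270, -6.327),
}


def implied_baseline(rmse: float, delta_percent: float) -> float:
    """Invert the ASSUMED definition delta% = 100 * (rmse - baseline) / baseline."""
    return rmse / (1.0 + delta_percent / 100.0)


# Back out the reference RMSE that each (RMSE, delta %) pair would imply.
baselines = {key: implied_baseline(*pair) for key, pair in rows.items()}

for (name, model), baseline in baselines.items():
    print(f"{name:20s} {model:6s} implied reference RMSE = {baseline:.4f}")
```

Under this assumed definition, the sampled Lasso and RF rows all imply a reference RMSE of roughly 0.288, up to rounding of the reported values. That is consistent with both models being compared against the same reference, so a negative Δ% would mean a lower error than that reference.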