Table 6 Results of the Predictive Validity Analysis. RMSE is the root-mean-square error between predicted and observed scores. Δ% in parentheses is the percentage change in RMSE relative to the baseline RMSE of 0.288. 95% CI is the 95% confidence interval around the RMSE

From: Evaluating the construct validity of text embeddings with application to survey questions

 

| Text representation | Lasso RMSE (Δ%) | Lasso 95% CI | RF RMSE (Δ%) | RF 95% CI |
|---|---|---|---|---|
| TF | 0.296 (2.847) | [0.283, 0.310] | 0.279 (−3.059) | [0.263, 0.296] |
| TF-IDF | 0.298 (3.525) | [0.283, 0.314] | 0.284 (−1.598) | [0.264, 0.303] |
| Random 300 | 0.294 (2.207) | [0.282, 0.307] | 0.280 (−2.769) | [0.268, 0.292] |
| Random 768 | 0.293 (1.658) | [0.280, 0.306] | 0.280 (−2.923) | [0.267, 0.292] |
| Random 1024 | 0.297 (3.101) | [0.283, 0.311] | 0.279 (−3.170) | [0.266, 0.292] |
| fastText | 0.290 (0.781) | [0.278, 0.303] | 0.277 (−3.772) | [0.266, 0.289] |
| GloVe | 0.295 (2.425) | [0.283, 0.308] | 0.276 (−4.200) | [0.263, 0.289] |
| BERT-base-uncased | 0.294 (1.962) | [0.279, 0.309] | 0.270 (−6.197) | [0.259, 0.282] |
| BERT-large-uncased | 0.297 (2.986) | [0.282, 0.311] | 0.274 (−5.018) | [0.263, 0.285] |
| All-DistilRoBERTa | 0.287 (−0.238) | [0.275, 0.300] | 0.272 (−5.434) | [0.260, 0.284] |
| All-MPNet-base | 0.294 (1.947) | [0.280, 0.308] | 0.270 (−6.327) | [0.258, 0.282] |
| USE | 0.290 (0.662) | [0.277, 0.303] | 0.273 (−5.405) | [0.261, 0.284] |
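To make the caption's definitions concrete, the following minimal Python sketch computes an RMSE and the percentage change relative to the stated baseline of 0.288. The names `rmse`, `pct_change`, and `BASELINE_RMSE` are illustrative and not taken from the article, and the toy score lists are invented for demonstration only. The Δ% values in the table were presumably computed from unrounded RMSEs, so recomputing them from the three-decimal values shown gives close but not identical figures.

```python
import math

# Baseline RMSE as stated in the table caption.
BASELINE_RMSE = 0.288


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences of scores."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pct_change(rmse_value, baseline=BASELINE_RMSE):
    """Percentage change in RMSE relative to the baseline (negative = improvement)."""
    return 100.0 * (rmse_value - baseline) / baseline


# Toy data, invented purely to exercise rmse(); not from the article.
toy_rmse = rmse([0.5, 0.7, 0.9], [0.4, 0.7, 1.1])

# Recompute Δ% from two rounded RMSEs in the table (approximate by construction).
lasso_tf_delta = pct_change(0.296)   # table reports 2.847
rf_bert_base_delta = pct_change(0.270)  # table reports −6.197

print(f"toy RMSE: {toy_rmse:.4f}")
print(f"Lasso / TF Δ%: {lasso_tf_delta:.3f}")
print(f"RF / BERT-base-uncased Δ%: {rf_bert_base_delta:.3f}")
```

A negative Δ% means the RMSE is lower than the baseline, which is why the RF columns show negative values throughout while most Lasso entries are positive.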