
Table 5 Results of the Predictive Validity Analysis. MAE is the mean absolute error between predicted and observed scores. The value in parentheses (Δ%) is the percentage change in MAE relative to the baseline MAE of 0.240; negative values indicate a lower error than the baseline. 95% CI is the 95% confidence interval around the MAE.

From: Evaluating the construct validity of text embeddings with application to survey questions

| Text representation | Lasso MAE (Δ%) | Lasso 95% CI | RF MAE (Δ%) | RF 95% CI |
|---|---|---|---|---|
| TF | 0.247 (2.970) | [0.233, 0.261] | 0.226 (−5.958) | [0.211, 0.240] |
| TF-IDF | 0.248 (3.478) | [0.233, 0.264] | 0.228 (−5.061) | [0.211, 0.245] |
| Random 300 | 0.245 (2.247) | [0.233, 0.258] | 0.231 (−3.568) | [0.219, 0.244] |
| Random 768 | 0.245 (1.899) | [0.232, 0.258] | 0.231 (−3.842) | [0.218, 0.243] |
| Random 1024 | 0.247 (3.082) | [0.234, 0.261] | 0.230 (−4.206) | [0.218, 0.242] |
| fastText | 0.240 (−0.056) | [0.229, 0.251] | 0.227 (−5.299) | [0.216, 0.239] |
| GloVe | 0.245 (2.239) | [0.234, 0.257] | 0.227 (−5.273) | [0.215, 0.240] |
| BERT-base-uncased | 0.243 (1.391) | [0.230, 0.257] | 0.222 (−7.527) | [0.211, 0.233] |
| BERT-large-uncased | 0.245 (2.067) | [0.232, 0.258] | 0.225 (−6.051) | [0.215, 0.236] |
| All-DistilRoBERTa | 0.240 (−0.102) | [0.228, 0.252] | 0.224 (−6.601) | [0.213, 0.235] |
| All-MPNet-base | 0.245 (1.975) | [0.232, 0.258] | 0.223 (−7.088) | [0.211, 0.235] |
| USE | 0.241 (0.309) | [0.228, 0.254] | 0.224 (−6.478) | [0.213, 0.236] |
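The quantities defined in the caption (MAE, Δ% relative to the 0.240 baseline, and a 95% CI around the MAE) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names (`mae`, `pct_change`, `bootstrap_ci`) are hypothetical, and the percentile bootstrap used for the interval is an assumption, since the caption does not say how the CIs were computed.

```python
import random
from statistics import mean

BASELINE_MAE = 0.240  # baseline MAE stated in the table caption


def mae(predicted, observed):
    """Mean absolute error between paired predicted and observed scores."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def pct_change(model_mae, baseline=BASELINE_MAE):
    """Percentage change in MAE relative to the baseline (negative = lower error)."""
    return (model_mae - baseline) / baseline * 100


def bootstrap_ci(predicted, observed, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the MAE.

    ASSUMPTION: the source does not state its CI method; this resamples
    (predicted, observed) pairs with replacement and takes percentiles.
    """
    rng = random.Random(seed)
    n = len(predicted)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(mae([predicted[i] for i in idx], [observed[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For example, `pct_change(0.226)` gives about −5.83, whereas the table reports −5.958 for TF with RF. The gap is consistent with the published Δ% having been computed from unrounded MAE values rather than the three-decimal figures shown.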