Table 4 Results of the Predictive Validity Analysis. r is the average Pearson’s correlation between predicted and observed scores. Δ% (in parentheses) is the percentage change in r relative to the baseline r of 0.187. 95% CI is the 95% confidence interval around r.

From: Evaluating the construct validity of text embeddings with application to survey questions

 

| Method | Lasso r (Δ%) | Lasso 95% CI | RF r (Δ%) | RF 95% CI |
| --- | --- | --- | --- | --- |
| TF | 0.106 (−43.316) | [0.102, 0.110] | 0.337 (80.007) | [0.333, 0.341] |
| TF-IDF | 0.092 (−50.802) | [0.087, 0.096] | 0.323 (72.830) | [0.319, 0.327] |
| Random 300 | 0.149 (−20.321) | [0.144, 0.153] | 0.331 (77.066) | [0.327, 0.335] |
| Random 768 | 0.116 (−37.968) | [0.111, 0.120] | 0.334 (78.614) | [0.330, 0.338] |
| Random 1024 | 0.069 (−63.102) | [0.065, 0.073] | 0.338 (80.520) | [0.333, 0.342] |
| fastText | 0.204 (9.261) | [0.200, 0.209] | 0.356 (90.439) | [0.352, 0.360] |
| GloVe | 0.107 (−42.781) | [0.103, 0.111] | 0.347 (85.664) | [0.343, 0.351] |
| BERT-base-uncased | 0.195 (4.278) | [0.191, 0.200] | 0.411 (119.994) | [0.407, 0.415] |
| BERT-large-uncased | 0.151 (−19.251) | [0.147, 0.155] | 0.378 (102.260) | [0.374, 0.382] |
| All-DistilRoBERTa | 0.188 (0.535) | [0.183, 0.192] | 0.374 (100.228) | [0.370, 0.378] |
| All-MPNet-base | 0.119 (−36.364) | [0.115, 0.123] | 0.406 (117.135) | [0.402, 0.410] |
| USE | 0.186 (−0.535) | [0.182, 0.191] | 0.386 (106.272) | [0.382, 0.390] |
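
The Δ% values can be read as a simple relative change against the baseline r of 0.187 given in the caption. The sketch below illustrates that arithmetic, assuming the usual definition 100 × (r − baseline) / baseline. The function name `pct_change` is hypothetical and not taken from the source. Because it works from the rounded r values printed in the table, some results may differ slightly from the printed Δ%, which could have been computed from unrounded correlations.

```python
BASELINE_R = 0.187  # baseline correlation stated in the table caption


def pct_change(r: float, baseline: float = BASELINE_R) -> float:
    """Percentage change of r relative to the baseline correlation.

    Assumed definition: 100 * (r - baseline) / baseline.
    """
    return 100.0 * (r - baseline) / baseline


# Lasso r values copied from the table (rounded to three decimals there).
lasso_r = {
    "TF": 0.106,
    "TF-IDF": 0.092,
    "BERT-base-uncased": 0.195,
}

for name, r in lasso_r.items():
    print(f"{name}: r = {r:.3f}, change = {pct_change(r):.3f}%")
```

With these inputs, the recomputed changes for TF, TF-IDF, and BERT-base-uncased round to −43.316, −50.802, and 4.278, matching the Lasso column.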