Evaluating the construct validity of text embeddings with application to survey questions

EPJ Data Science

Table 3 Results of the content validity analysis: prediction accuracy scores of probing classifiers. Sentence length is converted into a categorical variable with four levels (“0-10”, “11-12”, “13-15” and “16-25”); basic concept, concrete concept and formulation are also categorical, with 13, 117 and 5 levels, respectively. The Average column is the mean of the four accuracy scores in each row

Method	Length	Basic concept	Concrete concept	Formulation	Average
Simple Majority	0.389	0.010	0.029	0.255	0.171
Random 300	0.102	0.198	0.440	0.742	0.371
Random 768	0.148	0.198	0.509	0.694	0.387
Random 1024	0.074	0.198	0.548	0.731	0.388
TF	0.148	0.198	0.636	0.770	0.438
TF-IDF	0.167	0.198	0.493	0.690	0.387
fastText	0.093	0.173	0.711	0.656	0.408
GloVe	0.194	0.192	0.908	0.642	0.484
BERT-base-uncased	0.657	0.175	0.815	0.944	0.648
BERT-large-uncased	0.620	0.153	0.739	0.908	0.605
All-DistilRoBERTa	0.407	0.198	0.916	0.776	0.574
All-MPNet-base	0.481	0.198	0.929	0.805	0.603
USE	0.454	0.198	0.903	0.853	0.602
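
The bookkeeping behind Table 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The bin edges come from the table note, but the function names (`length_level`, `majority_accuracy`, `average_score`) are hypothetical. The choice to take the majority class from the training labels is an assumption, as is the idea that lengths outside 0-25 do not occur.

```python
from collections import Counter
from statistics import mean

# Bin edges and labels copied from the note to Table 3.
LENGTH_BINS = [
    (0, 10, "0-10"),
    (11, 12, "11-12"),
    (13, 15, "13-15"),
    (16, 25, "16-25"),
]


def length_level(n_tokens: int) -> str:
    """Map a sentence length to one of the four categorical levels."""
    for low, high, label in LENGTH_BINS:
        if low <= n_tokens <= high:
            return label
    # Assumption: lengths outside 0-25 are treated as an error here.
    raise ValueError(f"sentence length {n_tokens} is outside the binned range")


def majority_accuracy(train_labels, test_labels) -> float:
    """Accuracy of always predicting the most frequent training label.

    Assumption: the majority class is taken from the training split.
    """
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(label == majority_label for label in test_labels)
    return hits / len(test_labels)


def average_score(scores) -> float:
    """Mean of the per-task accuracies, rounded to three decimals."""
    return round(mean(scores), 3)


if __name__ == "__main__":
    # Per-task accuracies of the BERT-base-uncased row in Table 3.
    bert_base = [0.657, 0.175, 0.815, 0.944]
    print(length_level(12), average_score(bert_base))
```

Applied to the BERT-base-uncased row, `average_score` returns 0.648, which matches the Average column. A probing classifier trained on an embedding would replace `majority_accuracy` with the accuracy of a model fitted on the embedding vectors, and the simple majority row serves as the floor for that comparison.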