
Table 6 Comparison of our framework with the most related work [44–46]

From: Improving official statistics in emerging markets using machine learning and mobile phone data

 

Martinez, Herrera-Yagüe, Herrera-Yagüe (Homophily), Sarraute, Sarraute (Homophily), EU, SA

Gender

0.56 (100%), 0.55 (100%), 0.65 (100%), 0.663 (100%), 0.743 (100%), 0.745 (100%)

0.80 (3%), 0.814 (12.5%), 0.80 (76%), 0.80 (65%)

Age

0.24 (100%), 0.51 (100%), 0.37 (100%), 0.434 (100%), 0.568 (100%)

0.527 (12.5%), 0.623 (12.5%), 0.819 (12.5%)

  1. The number outside the parentheses is the accuracy; the number inside is the data coverage at which that accuracy was obtained. Columns marked "Homophily" show results when the labels of adjacent nodes were used as an additional feature alongside individual node-level attributes. Martinez did not discuss age prediction, and Herrera-Yagüe did not report accuracy at different coverage levels. Herrera-Yagüe predicted age in 6 categories, so those results should be compared against a random baseline of 1/6 (about 0.17). Sarraute and our model predicted age in 4 categories, so those results should be compared against a random baseline of 0.25. The best models reported in [45, 46] leverage the homophily structure of the network, whereas our models do not exploit any homophily information; a fairer comparison is therefore against the results based on node-level attributes only.
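The random baselines quoted in the note follow from uniform guessing over k categories. The following is a minimal sketch of that arithmetic, assuming balanced classes and uniform random guessing; the function names are hypothetical and are not taken from the paper.

```python
def random_baseline(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing over balanced classes (assumption)."""
    if num_classes < 1:
        raise ValueError("num_classes must be a positive integer")
    return 1.0 / num_classes


def lift_over_baseline(accuracy: float, num_classes: int) -> float:
    """Ratio of a reported accuracy to the random baseline for the same class count."""
    return accuracy / random_baseline(num_classes)


# Baselines for the two age-binning schemes mentioned in the note.
six_class_baseline = random_baseline(6)   # 1/6
four_class_baseline = random_baseline(4)  # 0.25

# Illustrative value only (not a figure from the table): an accuracy of 0.5
# on a 4-class task is twice the random baseline.
example_lift = lift_over_baseline(0.5, 4)
```

Dividing by the baseline puts accuracies from a 6-category and a 4-category task on a common scale, which is the point the note makes about comparing the age results.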