Table 6 Comparison of our framework with the most related work [44–46]

From: Improving official statistics in emerging markets using machine learning and mobile phone data

  Martinez Herrera-Yagüe Herrera-Yagüe (Homophily) Sarraute Sarraute (Homophily) EU SA
Gender 0.56 (100%) 0.55 (100%) 0.65 (100%) 0.663 (100%) 0.743 (100%) 0.745 (100%)
0.80 (3%) 0.814 (12.5%) 0.80 (76%) 0.80 (65%)
Age 0.24 (100%) 0.51 (100%) 0.37 (100%) 0.434 (100%) 0.568 (100%)
0.527 (12.5%) 0.623 (12.5%) 0.819 (12.5%)
  1. Each number outside the parentheses is an accuracy; the number inside the parentheses is the data coverage at which that accuracy was obtained. Columns marked "Homophily" show the results when the labels of adjacent nodes were used as an additional feature alongside the individual node-level attributes. Martinez did not discuss age prediction, and Herrera-Yagüe did not report accuracy at different coverage levels. Furthermore, Herrera-Yagüe classified age into 6 categories, so those results should be compared against a random baseline of 0.16. Sarraute and our model classified age into 4 categories, so they should be compared against a random baseline of 0.25. The best models reported in [45, 46] leverage the homophily structure of the network, whereas our models use no homophily-based information; a fairer comparison is therefore one based only on the node-level attributes.
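The random-baseline argument in the note can be made concrete with a short calculation. The following is a minimal sketch, not part of the original study: the helper names `random_baseline` and `lift_over_random` are hypothetical, and the accuracy value is an arbitrary illustration rather than a figure from Table 6. It shows why the same raw accuracy means something different under a 6-category and a 4-category scheme.

```python
def random_baseline(n_classes: int) -> float:
    """Expected accuracy of guessing uniformly at random among n_classes labels."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    return 1.0 / n_classes


def lift_over_random(accuracy: float, n_classes: int) -> float:
    """Ratio of a reported accuracy to the uniform random baseline."""
    return accuracy / random_baseline(n_classes)


# Hypothetical accuracy, chosen only for illustration (not taken from Table 6).
example_accuracy = 0.30

six_way = lift_over_random(example_accuracy, 6)   # measured against a 1/6 baseline
four_way = lift_over_random(example_accuracy, 4)  # measured against a 0.25 baseline

print(f"6 categories: baseline {random_baseline(6):.3f}, lift {six_way:.2f}")
print(f"4 categories: baseline {random_baseline(4):.3f}, lift {four_way:.2f}")
```

Under these assumptions, an identical accuracy yields a larger lift in the 6-category setting, which is why accuracies obtained with different numbers of age categories should be read relative to their own baselines rather than compared directly.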