From: Improving official statistics in emerging markets using machine learning and mobile phone data

Accuracy in EU (top) and SA (bottom) as a function of training-sample size. We reach an accuracy above 74% with training sets of 10k people. When data are scarce, the training set can be reduced to 5k people with minimal loss in performance. For training sets larger than 5k people, SVM-RBF is more accurate than the other algorithms in both datasets. In EU, increasing the training set from 15k to 500k people improves the best accuracy by only 1%.
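The caption describes a learning curve: accuracy on a fixed held-out set is measured while the training set grows, and it levels off once the sample is large enough. The sketch below shows only that measurement procedure. It uses the Python standard library, synthetic two-class Gaussian data, and a nearest-centroid classifier as a plain stand-in for SVM-RBF. None of it reproduces the authors' features, models, or results. The names `make_synthetic`, `fit_nearest_centroid`, and `learning_curve` are hypothetical.

```python
import random
from statistics import mean


def make_synthetic(n, seed):
    """Hypothetical data: two 2-D Gaussian blobs centred at (-1,-1) and (1,1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        x = (rng.gauss(center, 1.5), rng.gauss(center, 1.5))
        data.append((x, label))
    return data


def fit_nearest_centroid(train):
    """Stand-in classifier (not SVM-RBF): one mean point per class."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x0, x1), y in train:
        s = sums[y]
        s[0] += x0
        s[1] += x1
        s[2] += 1
    # Skip a class that happens to be absent from a tiny training sample.
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items() if s[2] > 0}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: (x[0] - centroids[y][0]) ** 2 + (x[1] - centroids[y][1]) ** 2,
    )


def accuracy(centroids, data):
    """Fraction of correctly classified points."""
    return sum(predict(centroids, x) == y for x, y in data) / len(data)


def learning_curve(pool, test, sizes, repeats=5, seed=0):
    """For each size n, train on `repeats` random subsets of the pool and
    average their accuracy on the same held-out test set."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        scores = [
            accuracy(fit_nearest_centroid(rng.sample(pool, n)), test)
            for _ in range(repeats)
        ]
        curve.append((n, mean(scores)))
    return curve


pool = make_synthetic(20000, seed=1)  # candidate training people
test = make_synthetic(5000, seed=2)   # fixed held-out evaluation set
curve = learning_curve(pool, test, sizes=[50, 500, 5000, 15000])
for n, acc in curve:
    print(f"train size {n:>6}: accuracy {acc:.3f}")
```

Keeping the test set fixed while only the training subset changes means that differences along the curve reflect training-set size rather than a change of evaluation data. Averaging several random subsets per size reduces the noise of any single draw.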

