Table 5 The most (top 3) and least (bottom 3) predictive factors of gender in the South Asian data

From: Improving official statistics in emerging markets using machine learning and mobile phone data

Top indicators in the factor | Proportion of variance | Weighted F1 score
--- | --- | ---
call_duration__weekday__day__call__min__std | 0.4% | 0.68
call_duration__weekday__day__call__median__std | |
call_duration__weekday__day__call__min__mean | |
call_duration__weekend__night__call__min__mean | 0.5% | 0.68
call_duration__weekend__night__call__min__std | |
call_duration__weekend__night__call__median__std | |
percent_initiated_interactions__weekday__day__call__mean | 0.3% | 0.66
percent_initiated_interactions__allweek__day__call__mean | |
percent_initiated_interactions__weekend__day__call__mean | |
number_of_interaction_in__allweek__allday__text__std | 0.6% | 0.37
number_of_interaction_in__allweek__day__text__std | |
number_of_interaction_in__weekday__allday__text__std | |
number_of_interaction_in__weekday__allday__text__mean | 0.8% | 0.37
number_of_interaction_in__allweek__allday__text__mean | |
interactions_per_contact__weekday__allday__text__max__mean | |
balance_of_contacts__weekend__night__text__max__mean | 0.5% | 0.53
balance_of_contacts__weekend__night__text__median__mean | |
balance_of_contacts__weekend__night__text__min__mean | |
  1. For each factor, the three indicators with the highest loading are shown here. The F1 score is weighted by the class frequency to account for the data imbalance. Despite not being predictive of gender, the bottom three factors capture a significant amount of variance in the data.
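
To make the footnote's "weighted by the class frequency" concrete, the following is a minimal sketch of a class-frequency-weighted F1 score. It assumes the common definition (per-class F1 averaged with weights proportional to each class's share of the true labels); the source does not state the exact implementation used. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its share of y_true.

    Assumption: this mirrors the usual "weighted" F1 averaging; it is a sketch,
    not the authors' code.
    """
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp                    # true samples of this class that were missed
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1      # weight by class frequency
    return score


# Hypothetical, imbalanced toy labels (illustration only).
y_true = ["m"] * 8 + ["f"] * 2
y_pred = ["m"] * 10                    # always predicts the majority class
score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

In this toy case the majority-class predictor reaches 80% accuracy, yet its weighted F1 is lower because the minority class contributes an F1 of zero. This is the sense in which the weighting accounts for imbalanced data.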