Table 5 The most (top 3) and least (bottom 3) predictive factors of gender in the South Asian data

From: Improving official statistics in emerging markets using machine learning and mobile phone data

| Top indicators in the factor | Proportion of variance | Weighted F1 score |
| --- | --- | --- |
| call_duration__weekday__day__call__min__std | 0.4% | 0.68 |
| call_duration__weekday__day__call__median__std | | |
| call_duration__weekday__day__call__min__mean | | |
| call_duration__weekend__night__call__min__mean | 0.5% | 0.68 |
| call_duration__weekend__night__call__min__std | | |
| call_duration__weekend__night__call__median__std | | |
| percent_initiated_interactions__weekday__day__call__mean | 0.3% | 0.66 |
| percent_initiated_interactions__allweek__day__call__mean | | |
| percent_initiated_interactions__weekend__day__call__mean | | |
| number_of_interaction_in__allweek__allday__text__std | 0.6% | 0.37 |
| number_of_interaction_in__allweek__day__text__std | | |
| number_of_interaction_in__weekday__allday__text__std | | |
| number_of_interaction_in__weekday__allday__text__mean | 0.8% | 0.37 |
| number_of_interaction_in__allweek__allday__text__mean | | |
| interactions_per_contact__weekday__allday__text__max__mean | | |
| balance_of_contacts__weekend__night__text__max__mean | 0.5% | 0.53 |
| balance_of_contacts__weekend__night__text__median__mean | | |
| balance_of_contacts__weekend__night__text__min__mean | | |

  1. For each factor, the three indicators with the highest loadings are shown. The F1 score is weighted by class frequency to account for class imbalance in the data. Although they are not predictive of gender, the bottom three factors each capture at least as much of the variance in the data (0.5% to 0.8%) as the top three (0.3% to 0.5%).
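
The class-frequency weighting mentioned in the note can be illustrated with a minimal sketch. It assumes the usual definition of a weighted F1 score: the per-class F1 scores are averaged, each weighted by that class's share of the true labels. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from the article or its data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its share of y_true."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical, imbalanced toy labels (not the article's data).
y_true = ["m"] * 8 + ["f"] * 2
y_pred = ["m"] * 10  # a classifier that always predicts the majority class
print(round(weighted_f1(y_true, y_pred), 3))  # 0.711
```

In this toy case the majority class has F1 of about 0.889 and the minority class has F1 of 0, so the weighted score is 0.8 × 0.889 + 0.2 × 0 ≈ 0.711. scikit-learn's `f1_score` with `average="weighted"` follows the same definition; whether the authors used that implementation is not stated in the table.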