Table 3 Accuracy of the Random Forest classifier in predicting the prevalence of diseases in London areas. Results are reported for two classifiers: (i) a binary classifier that separates areas in the top and bottom quartiles of prevalence for each of the three diseases; (ii) a ternary classifier that adds an equally sized class of training instances randomly sampled from the two central quartiles. There are six predictive features: gender, average age, education level, item weight, nutrient diversity, and calorie concentration. The accuracy of a random baseline classifier is 0.5 in the binary case and 0.33 in the ternary case. Numbers in parentheses are the standard deviations over the 10-fold cross-validation.

From: Large-scale and high-resolution analysis of food purchases and health outcomes

Accuracy:

| Classification | Medicine | Random | Demographic | Diversity + Calorie | All |
|---|---|---|---|---|---|
| Binary | Hypertension | 0.50 | 0.60 (0.06) | 0.80 (0.05) | 0.82 (0.05) |
| Binary | Cholesterol | 0.50 | 0.59 (0.06) | 0.81 (0.05) | 0.81 (0.05) |
| Binary | Diabetes | 0.50 | 0.79 (0.06) | 0.86 (0.05) | 0.91 (0.04) |
| Ternary | Hypertension | 0.33 | 0.40 (0.05) | 0.54 (0.05) | 0.57 (0.04) |
| Ternary | Cholesterol | 0.33 | 0.41 (0.05) | 0.53 (0.06) | 0.54 (0.07) |
| Ternary | Diabetes | 0.33 | 0.53 (0.04) | 0.63 (0.04) | 0.68 (0.03) |
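
The class construction described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `build_classes` and `random_baseline`, the input dictionary, the use of `statistics.quantiles` for the quartile cut points, and the rule for sizing the middle class are all hypothetical choices, since the caption does not specify them.

```python
import random
import statistics


def build_classes(prevalence, ternary=False, seed=0):
    """Label areas by prevalence quartile (illustrative sketch).

    prevalence: dict mapping an area id to a prevalence value (hypothetical input).
    Returns a dict mapping area id to a label:
      0 = bottom quartile, 1 = top quartile,
      2 = sampled from the two central quartiles (only when ternary=True).
    Areas in the central quartiles that are not sampled are left out.
    """
    values = list(prevalence.values())
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points

    labels = {}
    middle = []
    for area, value in prevalence.items():
        if value <= q1:
            labels[area] = 0
        elif value >= q3:
            labels[area] = 1
        else:
            middle.append(area)

    if ternary:
        # Assumption: "equally sized" means matching the smaller extreme class.
        n_low = sum(1 for label in labels.values() if label == 0)
        n_high = len(labels) - n_low
        size = min(n_low, n_high, len(middle))
        rng = random.Random(seed)  # fixed seed for reproducibility
        for area in rng.sample(middle, size):
            labels[area] = 2

    return labels


def random_baseline(n_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / n_classes


# Usage with synthetic prevalence values (not data from the paper).
example = {f"area{i}": float(i) for i in range(1, 101)}
binary_labels = build_classes(example)
ternary_labels = build_classes(example, ternary=True)
```

With balanced classes, `random_baseline(2)` and `random_baseline(3)` give the 0.5 and 0.33 reference values quoted in the caption. The resulting labels, together with the six features, would then be passed to a Random Forest evaluated with 10-fold cross-validation.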