
Table 3 Accuracy of the Random Forest classifier in predicting the prevalence of diseases in London areas. Results are reported for two classifiers: (i) binary classification of areas in the top and bottom quartiles of the three diseases’ prevalence; (ii) ternary classification, which adds an equally sized class of training instances randomly sampled from the two central quartiles. There are six predictive features: gender, average age, education level, item weight, nutrient diversity, and calorie concentration. The accuracy of a random baseline classifier is 0.5 in the binary case and 0.33 in the ternary case. Numbers in parentheses are the standard deviations over the 10-fold cross-validation.

From: Large-scale and high-resolution analysis of food purchases and health outcomes

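| Classification | Medicine | Random | Demographic | Diversity + Calorie | All |
| --- | --- | --- | --- | --- | --- |
| Binary | Hypertension | 0.50 | 0.60 (0.06) | 0.80 (0.05) | 0.82 (0.05) |
| Binary | Cholesterol | 0.50 | 0.59 (0.06) | 0.81 (0.05) | 0.81 (0.05) |
| Binary | Diabetes | 0.50 | 0.79 (0.06) | 0.86 (0.05) | 0.91 (0.04) |
| Ternary | Hypertension | 0.33 | 0.40 (0.05) | 0.54 (0.05) | 0.57 (0.04) |
| Ternary | Cholesterol | 0.33 | 0.41 (0.05) | 0.53 (0.06) | 0.54 (0.07) |
| Ternary | Diabetes | 0.33 | 0.53 (0.04) | 0.63 (0.04) | 0.68 (0.03) |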
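
The class construction described in the caption can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `build_classes` and `random_baseline`, the label encoding (0, 1, 2), the fixed seed, and the use of Python's default quartile method are all assumptions made for the example.

```python
import random
import statistics


def build_classes(prevalence, ternary=False, seed=0):
    """Map area id -> class label from a dict of area id -> prevalence.

    Hypothetical encoding: 0 = bottom quartile, 1 = top quartile,
    2 = random sample from the two central quartiles (ternary only).
    """
    values = list(prevalence.values())
    # Quartile cut points; the exact quantile method is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4)

    low = [area for area, v in prevalence.items() if v <= q1]
    high = [area for area, v in prevalence.items() if v >= q3]
    labels = {area: 0 for area in low}
    labels.update({area: 1 for area in high})

    if ternary:
        middle = [area for area, v in prevalence.items() if q1 < v < q3]
        # Keep the third class the same size as the other two.
        size = min(len(low), len(high), len(middle))
        rng = random.Random(seed)
        for area in rng.sample(middle, size):
            labels[area] = 2
    return labels


def random_baseline(labels):
    """Expected accuracy of uniform guessing when classes are balanced."""
    return 1 / len(set(labels.values()))
```

With balanced classes, `random_baseline` gives 0.5 for the binary labels and about 0.33 for the ternary labels, matching the Random column. The resulting labels, together with the six features, could then be passed to any Random Forest implementation and scored with 10-fold cross-validation; the source does not state which library was used.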