Table 2 Classification scores for the panel of experts used to label the Reply Tree tweets using the top 25 experts. γ is a confidence threshold on \(S_{h}(t)\), viz., \(|S_{h}(t)=2p_{h}(t)-1|\ge \gamma \). “Percent Labeled” is the percentage of examples in the test set that were labeled as either hate or counter at a confidence level of γ. The top row represents traditional accuracy measures, which compare favorably to previous studies that used smaller unbalanced data sets and achieved F1 scores ranging from 0.49 to 0.77 [32]. The remaining rows are more nuanced as not all examples in the tests sets are being labeled correctly or incorrectly and thus these rows need to be viewed with cautious optimism; see [32] for an in-depth discussion of these issues