Figure 1 | EPJ Data Science

Figure 1

From: Allotaxonometry and rank-turbulence divergence: a universal instrument for comparing complex systems

A. An example allotaxonomic ‘rank-rank histogram’ comparing word usage ranks on two days of Twitter, 2016/11/09 and 2017/08/13. These dates are the day after the 2016 US presidential election and the day after the Charlottesville Unite the Right rally. We extracted words first as 1-grams (contiguous sets of non-whitespace characters) from tweets identified as English [45] and then filtered to match simple Latin characters (see Sect. 5.1). We orient all histograms so that the comparison is left-right, removing a potential misperception of causality. In general, we compare ranked lists of types for two systems \(\Omega _{1}\) and \(\Omega _{2}\) by first generating a merged list of types covering both systems. We then bin logarithmic rank-rank pairs \((\log _{10}r_{\tau ,1},\log _{10}r_{\tau ,2})\) across all types and uniformly in logarithmic space. For bin counts, we use the perceptually uniform colormap magma [46], and place a count scale in the bottom left corner (element A). We automatically label words at the fringes of the histogram. Bins on either side of the central vertical line represent words that are used more often on the corresponding date. For example, ‘Charlottesville’ was ranked 67,220 on 2016/11/09 and 113 on 2017/08/13, while ‘Nazis’ moved from \(r=9149\) to \(r=129\). Words are given alternating shades of gray for improved readability. The discrete, separated lines of boxes nearest to each bottom axis comprise words that appear on Twitter on only that side’s date: ‘exclusive types’. Moving up the histogram, the two other distinct lines above the ‘exclusive-type lines’ correspond to words that appear once and twice on the other date. The three horizontal bars in the lower right show system balances. The top bar only appears for ranked lists of types that have associated measures (e.g., word frequency, market capitalization of corporations). Here, the top bar indicates the balance of total counts of words (tokens) for each day: 59.9% versus 40.1%.
The middle bar shows the percentages of the combined lexicon (types) for the two days that appear on each day: 63.2% versus 61.6%. And the bottom bar shows the percentage of words (types again) on each day that are exclusive: 60.8% and 59.8%. Each bar’s label is configurable (e.g., ‘total market cap’). B.–D. The three rank-rank histograms on the right show the special benchmark cases of: B. A size ranking for a system compared with itself (vertical line; \(\Omega _{1}\) versus itself); C. A ranked list versus a random shuffling of component types (two randomizations of \(\Omega _{1}\)); and D. Two size rankings for systems with no shared component types: A ‘vee’ structure (we re-used \(\Omega _{1}\) and \(\Omega _{2}\), modifying words to prevent matches). For the cells in the main histograms in this paper, we use cell side lengths of 1/15 of an order of magnitude; we use 1/5 for plots B–D.
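
As a rough illustration of the procedure described in this caption, the following Python sketch merges the types of two systems, ranks them, bins the \((\log _{10}r_{\tau ,1},\log _{10}r_{\tau ,2})\) pairs into square cells, and computes the three balance bars. It is not the authors' implementation. The function names (`tied_ranks`, `rank_rank_histogram`, `balances`) are hypothetical, and two choices are assumptions not stated in the caption: tied types share the average of their positions, and a type absent from one system is treated as having count 0 there, so that all exclusive types tie for last place in that system.

```python
import math
from collections import Counter


def tied_ranks(counts, types):
    """Rank `types` by descending count (1 = largest).

    Assumption: ties share the mean of the positions they occupy, and a type
    missing from `counts` has count 0, so absent types tie for last place.
    """
    ordered = sorted(types, key=lambda t: -counts.get(t, 0))
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of types sharing the same count.
        while j < len(ordered) and counts.get(ordered[j], 0) == counts.get(ordered[i], 0):
            j += 1
        mean_position = (i + 1 + j) / 2  # mean of positions i+1 .. j
        for t in ordered[i:j]:
            ranks[t] = mean_position
        i = j
    return ranks


def rank_rank_histogram(counts1, counts2, cells_per_decade=15):
    """Count types per square cell in (log10 r1, log10 r2) space.

    `cells_per_decade=15` gives cell sides of 1/15 of an order of magnitude.
    Returns a Counter keyed by integer cell indices (i1, i2).
    """
    types = set(counts1) | set(counts2)  # merged list of types
    r1 = tied_ranks(counts1, types)
    r2 = tied_ranks(counts2, types)
    hist = Counter()
    for t in types:
        cell = (math.floor(math.log10(r1[t]) * cells_per_decade),
                math.floor(math.log10(r2[t]) * cells_per_decade))
        hist[cell] += 1
    return hist


def balances(counts1, counts2):
    """The three balance bars, as fractions; assumes all stored counts are > 0."""
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    t1, t2 = set(counts1), set(counts2)
    union = t1 | t2
    return {
        # Top bar: share of total counts (tokens) in each system.
        "total_counts": (n1 / (n1 + n2), n2 / (n1 + n2)),
        # Middle bar: share of the combined lexicon present in each system.
        "share_of_combined_types": (len(t1) / len(union), len(t2) / len(union)),
        # Bottom bar: share of each system's types that are exclusive to it.
        "exclusive_types": (len(t1 - t2) / len(t1), len(t2 - t1) / len(t2)),
    }
```

Under these definitions the middle and bottom bars are linked: if a system holds 63.2% of the combined types and the other holds 61.6%, then 38.4% of the combined types are exclusive to the first, which is 38.4/63.2 ≈ 60.8% of its own types, in agreement with the values reported above.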
