Allotaxonometry and rank-turbulence divergence: a universal instrument for comparing complex systems

Complex systems often comprise many kinds of components which vary over many orders of magnitude in size: Populations of cities in countries, individual and corporate wealth in economies, species abundance in ecologies, word frequency in natural language, and node degree in complex networks. Here, we introduce ‘allotaxonometry’ along with ‘rank-turbulence divergence’ (RTD), a tunable instrument for comparing any two ranked lists of components. We analytically develop our rank-based divergence in a series of steps, and then establish a rank-based allotaxonograph which pairs a map-like histogram for rank-rank pairs with an ordered list of components according to divergence contribution. We explore the performance of rank-turbulence divergence, which we view as an instrument of ‘type calculus’, for a series of distinct settings including: Language use on Twitter and in books, species abundance, baby name popularity, market capitalization, performance in sports, mortality causes, and job titles. We provide a series of supplementary flipbooks which demonstrate the tunability and storytelling power of rank-based allotaxonometry.


Introduction
1.1 Instruments that capture complexity
Science stands on the ability to describe and explain, and precise quantification must ultimately secure any true understanding [1]. Description itself rests on well-defined, reproducible methods of measurement, and over thousands of years, people have generated many national museums' worth of physical and mathematical instruments along with fundamental units of measurement.
Many instruments measure a single scale. In a plane's cockpit, barometers, altimeters, and thermometers report pressure, height, and temperature. And like a pilot flying a plane, by using human-comprehendible dashboards of single-dimension instruments, we are able to monitor and manage certain complex systems and processes.
But for complex phenomena made up of a great many types of components of greatly varying 'size' (clarified below in Sect. 1.2), such as languages, ecologies, and stock markets, we must confront two major problems with our dashboards of simple instruments [2].
First, in the face of system scale, dashboards become overwhelming. We find ourselves in high-dimensional, rapidly reconfiguring cockpits with instruments constantly appearing and disappearing. We need meters for every species, every company, every word. As a consequence, we routinely reduce a system's description to a few summary statistics, and often to only one [3]. We quantify the massive complexity of intellect through intelligence quotients and grade point averages, health through body mass index, the complexity of civilizations by one number [4], and arguably anything by monetary value as an encoding of belief. Of course, for some systems, dimension reduction is possible and we have essential techniques for doing so such as principal component analysis [5]. Relevant to our work here, information theoretic measures such as Shannon's entropy or the Gini coefficient are conspicuous single-number quantifications used across many fields, whether or not there is any meaningful connection to the optimal encoding of symbols for signal transmission [6-9].
Second, the ability to discern change is evidently an elemental feature of any scientific instrument. Broken altimeters are a staple of stories where something goes wrong with a plane (a plane-in-trouble is a larger story trope unto itself). While tracking changes in simple measures and statistics is essential (the Dow Jones is up; today is warmer than yesterday), the cognitive trap of the single number measurement means we miss seeing the internal dynamics, and this is especially true when global statistics are constant (the Dow Jones is unchanged but stocks were volatile; today's temperature is the same as yesterday but it's now raining and windy).
To contend with scale and the internal diversity of complex systems, we need comprehendible, dynamically-adjusting dashboards. For comparisons of complex systems, we will argue for dynamic dashboards that have two core elements [10]: 1. A 'big picture' map-like overview; and 2. A ranking of the most 'important' components afforded by a tunable measure that is as straightforward as possible. To help with our framing, we introduce a terminology family. We will use 'allotaxonomy' (other order) to mean the general comparison of the structures of two complex systems; 'allotaxonometrics' to refer to quantified allotaxonomy; and 'allotaxonometers' and 'allotaxonographs' for the instruments of allotaxonometrics.

Size rankings, Zipf's law, and rank turbulence
While the instrument we develop here will have broader application, its construction focuses on two regular features of complex systems: Heavy-tailed 'size-rank distributions' (rather than laws), and what we will call 'rank turbulence', a phenomenon of system-system comparison. We describe and discuss these two signatures in turn.
We will consider systems where each component type $\tau$ has at least one measurable, and hence rankable, 'size' $s_\tau$. To help make clear what we mean by component types and component sizes, we list some examples in the realms of language, ecology, and stock markets.
Type size: Number of times a word appears in a book. In "Pride and Prejudice", for example, the corresponding counts are 4058, 90, and 0. In linguistics, the component type and component size distinction is referred to as types and tokens [11], while more abstractly in semiotics, we have the signifier and the signified.
Example type: The species Ornithorhynchus anatinus, the platypus. Type size: The number of platypuses ('instances' of the species) living in Australia in the wild. Given an ecology, species may be ranked by their population numbers.
Example types: The publicly traded companies of Apple and Microsoft. Type size: Market capitalization. Apple and Microsoft may be viewed as components of the publicly-owned corporate world. The sizes of corporations may be broken down into many rankable dimensions such as annual revenue or number of employees worldwide. Market capitalization represents a kind of current collective belief in terms of money.
The three examples above show some of the range of what size can mean, and why we cannot readily lift terminology out from one domain to apply to all. Size for a word in a corpus means the number of indistinguishable instances of that word (many identical entities, i.e., tokens); size for species means the number of 'biological replications' of an individual type (many genetically similar entities of varying ages); and size for a corporation means a monetary value (one entity).
Some further examples of component size are rate (e.g., appearance of words in streaming text), physical dimensions (e.g., typical animal length or weight for a species), social popularity (e.g., number of plays of a song), scoring in sports by individual players, and so on.
We make clear that we may have no knowledge of the underlying component sizes; we may only have rankings (e.g., book rankings provided by a seller, but with numbers of sales withheld). Our core instrument functions only on rankings, incorporating size data (if known) for minor diagnostics.
When a system's component types are ranked in descending order of some size $s$, we will write the size of the $r$th ranked component as $s_r$. The function $s_r$ is commonly termed a system's 'Zipf distribution' [12]. Here, we will refer to such "a component ranking by decreasing size" as a "size ranking" or, for brevity and given our paper's framing, simply a "ranking". (We advocate for the plainspoken naming of scientific concepts, in part to avoid misattribution [13] and with the belief that scientific truths can and should be meaningfully named for what they are.) Though ranking is a widespread, everyday concept, the associated language can be confusing: 'High rank' means low $r$, and 'low rank' means high $r$. The highest rank size is thus $s_1$. (We accommodate tied ranks per Sect. 2.1 below.)
Ranked sizes of components of complex systems commonly present (at least approximately) a decaying power law [12,14-16]. That is, the size $s_r$ of the $r$th ranked component scales as $s_r \sim r^{-\zeta}$ where $\zeta > 0$. The case $\zeta = 1$ has come to be generally referred to as Zipf's law [12]. The corresponding frequency distribution for component sizes will behave as $f(s) \sim s^{-\gamma}$ where $\gamma = 1 + 1/\zeta > 1$.
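The relation between the two exponents can be sketched in a few lines. This is a heuristic argument, assuming a continuous approximation in which the rank of a component of size $s$ is proportional to the number of components at least that large:

```latex
\begin{align}
  r(s) &\propto \int_{s}^{\infty} f(s')\,\mathrm{d}s'
        \sim s^{-(\gamma - 1)} \qquad (\gamma > 1), \\
  \text{so}\quad s_r &\sim r^{-1/(\gamma - 1)} \equiv r^{-\zeta}
  \quad\Longrightarrow\quad
  \zeta = \frac{1}{\gamma - 1},
  \qquad
  \gamma = 1 + \frac{1}{\zeta}.
\end{align}
```

For Zipf's law, $\zeta = 1$, this gives $\gamma = 2$.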
Power laws and their discontents aside, examples of heavy-tailed size-rank distributions abound, with a few examples including word and phrase frequency in language [17,18], city populations [12], node degrees in scale-free networks [19], firm size [20], and numbers of dependencies for software packages [21].
We emphasize that our instrument is of use for comparing more general complex systems, for which we need only a reasonably diverse set of component types, and for which the size ranking $s_r$ may bear any kind of heavy-tailed distribution. Below, we will explore systems with maximum component rank between roughly $10^{2.5}$ and $10^{9}$.
There have been two persistent criticisms of Zipf's law, one unfounded, the other true but misleading and central to our work here. The first is that Zipf's law is a meaningless artifact that arises for free through randomness [22,23]; this is negated by a simple analysis [24], and moreover, theories of generative mechanisms have long been elaborated and tested (and contested) with the rich-get-richer mechanism proving to be a pervasive underlying algorithm [14,21,25,26].
The second enduring criticism is that Zipf's exponent $\zeta$ does not vary measurably, whether it be over time for a given system or across comparable systems. Zipf's law is often plotted with an unadorned rank $r$ on the horizontal axis, but each rank represents a component type from some vastly higher dimensional space of elements: A language's lexicon, species in an ecology, corporations in an economy.
Thus, even if two meaningfully comparable systems match exactly in a given size ranking $s_r$, there may well be a rich variation in the ordering of components [17,27]. With this understanding, in earlier work by our group on comparing size rankings of word usage in large-scale texts, we introduced the concept of "lexical turbulence" [27]. We showed that in comparing word usage across decades in the Google Books English Fiction (GBEF) corpus, the flux of words across rank boundaries (rank flux $\phi_r$) increased as $\phi_r \sim r^{\nu}$ (we found a break in scaling which we set aside here for simplicity [28,29]). We observed superlinear scaling for rank flux with $\nu > 1.2$: Common words are relatively stable in rank, rare words much more unstable.
Here, we expand from the text-specific concept of lexical turbulence to a general one of 'rank turbulence', which in turn will help motivate our formulation of a pragmatic 'rank-turbulence divergence'.

Motivation for a rank-based divergence
In comparing complex systems, why should we use component size ranks rather than probabilities or rates? Indeed, we may select from a smorgasbord of ways to compare two probability distributions for categorical data [30-32]. Ref. [31] catalogs around 60 probability-based comparisons which are variously distances, divergences, similarities, fidelities, and inner products. And Ref. [32] details three sprawling, interrelated, single-parameter families of information-theoretic divergences.
Five main reasons push us away from probability-based divergences and towards creating and using rank-based divergences.
First, normalization problems may arise from subsampling heavy-tailed distributions [17,33]. In natural ecological systems, for example, estimating the total number of organisms is famously difficult [33-36]. We can only then speak of relative rates and not absolute rates, and even then only for common enough species. For Twitter, for example, subsampling n-grams (phrases containing n consecutive words and/or other text elements) allows for robust estimation of the rates of common n-grams but not rare ones.
Second, not all component type characteristics can be construed (or misconstrued) as probabilities or rates. For example, rankings for many kinds of sports (at the team and player level, and not discounting the role of chance) derive from scores achieved through repeated competition [37-39].
Third, in comparison with probability-based rankings, we are able to more easily contend with components that appear in only one of two systems under comparison.We demonstrate this visualization feature as we build rank-turbulence divergence (RTD) in the following sections.
Fourth, rank orderings potentially allow for powerful and robust non-parametric statistical measures such as the standard rank correlation coefficient. All told, while in moving from sizes to rankings we may trade information for simplification, we still preserve a great deal of meaningful structure. We also expect rankings to be generally less susceptible to perturbations and errors in measurement.
Fifth and finally, rankings are an easily interpretable, ubiquitous construct, familiar to many. Ranked lists suffuse media surrounding entertainment (e.g., box office), music (Billboard charts), and sports. Indeed, we will rank anything we believe we can rank along a wide range of (often questionable) dimensions and composite scores: Individuals (wealth, fame), countries (GDP, freedom, safety, Olympic medals), cities (liveability, poverty), corporations (market capitalization, environmental records, workplace experience), universities (endowments, number of Nobel prize winners), students (grades), and animals (intelligence, dangerousness).
The above notwithstanding, distances based on comparisons of size rankings are to our knowledge relatively few, focus on traditional comparative metrics like Kendall's Tau and Spearman's rank correlation coefficient [40], and seem limited in application to extremely small systems, for example, comparing the top 20 to 50 ranked hits from two different search engines [40-42].
And while we have argued for a rank-turbulence divergence here, we nevertheless have separately constructed and explored a probability-turbulence divergence in Ref. [43]. Analogous in construction to rank-turbulence divergence, we show that probability-turbulence divergence is more sensitive to detailed system changes, has distinct limiting behavior, and corresponds to a suite of extant divergences.
In Sect. 3, we use all of these elements to realize rank-turbulence divergence as a tunable instrument for complex system comparison through rank-turbulence divergence allotaxonographs. To both support our general explanation and explore systems in their own right, we consider comparisons at different points in time for four case studies: 1. Daily word use on Twitter, 2. Tree species abundance, 3. Baby names in the US, and 4. Market capitalization for companies.
To help demonstrate the tunability of rank-turbulence divergence and its behavior over time for dynamically evolving complex systems, we provide a suite of 'Flipbooks' of allotaxonographs as supplementary online material on the arXiv and as part of the paper's online appendices: http://compstorylab.org/allotaxonometry/. Our Flipbooks expand on the paper's allotaxonomic analyses to include season point tallies for players in the National Basketball Association (NBA); word usage in the Google Books corpus; word usage in the seven Harry Potter books; causes of death; and job advertisements. As a guide, we outline all Flipbooks in Sect. 4.
We present details of datasets and code in Sect. 5, and we round off our paper with some concluding thoughts in Sect. 6.

Rank-turbulence divergence
2.1 Notation, ranking methodology, and exclusive types
As mentioned in the introduction, we use simple size ranking [12], ordering a system $\Omega$'s types from largest to smallest size according to some measure (number, probability, mammalian fur density, etc.). Again, we write $s_\tau$ for the size of component type $\tau$. We further indicate the rank of type $\tau$ as $r_\tau$, and the ordered set of all types and their ranks as $R_\Omega$.
In the case of ties, we use the conventional tied rank method of fractional ranking. For all types with the same size, we assign the mean of the sequence of ranks these types would occupy otherwise. Retaining tied information in this way makes for more sensible analytic treatment (e.g., the sum of all ranks for $N$ types will be $\frac{1}{2}N(N+1)$, regardless of ties). Ties (and near ties) will be important for our visualizations of rank-turbulence divergence.
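Fractional ranking as described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function name `fractional_ranks` and the toy sizes are assumptions made for the example.

```python
def fractional_ranks(sizes):
    """Map each type to its fractional (tied) rank, largest size first.

    `sizes` is a dict {type: size}. Hypothetical helper for illustration.
    """
    # Order types by descending size; tied types end up adjacent.
    ordered = sorted(sizes, key=lambda t: -sizes[t])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover the run of types sharing the same size.
        while j + 1 < len(ordered) and sizes[ordered[j + 1]] == sizes[ordered[i]]:
            j += 1
        # The ordinal ranks i+1 .. j+1 are replaced by their mean.
        mean_rank = (i + 1 + j + 1) / 2
        for t in ordered[i:j + 1]:
            ranks[t] = mean_rank
        i = j + 1
    return ranks


sizes = {"a": 10, "b": 7, "c": 7, "d": 7, "e": 1}
ranks = fractional_ranks(sizes)
# "b", "c", and "d" would occupy ranks 2, 3, and 4, so each receives rank 3.
```

With $N = 5$ types the ranks sum to $\frac{1}{2} \cdot 5 \cdot 6 = 15$ despite the three-way tie.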
Now, given two systems, $\Omega_1$ and $\Omega_2$, both comprised of component types (e.g., the species of two ecosystems) of varying and rankable size (e.g., number of individuals in a species), we express rank-turbulence divergence between these systems as
$$ D^{R}_{\alpha}(\Omega_1 \,\|\, \Omega_2). \tag{1} $$
In Sect. 2.4, we will establish $\alpha$ as a single tunable parameter with $0 \le \alpha < \infty$. Whatever complexities these systems may contain, such as networks of components, we are implicitly leaving them aside, but elaborations of our instrument will allow for their incorporation. If we have two ranked lists to compare, $R_1$ and $R_2$, we will more directly write $D^{R}_{\alpha}(R_1 \,\|\, R_2)$. The divergences we will consider here will all be expressible as linear sums of per-type contributions, meaning we can write:
$$ D^{R}_{\alpha}(R_1 \,\|\, R_2) = \sum_{\tau \in R_{1,2;\alpha}} \delta D^{R}_{\alpha,\tau}(R_1 \,\|\, R_2). \tag{2} $$
We sort types by descending contribution, $\delta D^{R}_{\alpha,\tau}(R_1 \,\|\, R_2)$ (which will all be positive), indicating this ordering by the set $R_{1,2;\alpha}$, the appropriately sequenced union of the types from both systems.
For the large-scale systems we are interested in, we expect that the overlap of types between any two systems will be partial, and generally far from complete.Hashtags on Twitter for example are constantly being invented, along with myriad lexical peculiarities (keyboard mashings, misspelling, mistypings, and more [44]).
Therefore, when comparing two systems, we extend the list of types in both systems to be the union of the types for both.If sizes are known, the sizes of types not present in a system will be zero.We will then naturally assign the same equal last rank to all types that appear in one system and not the other.
We call types that are present in one system only 'exclusive types'. When warranted, we will use expressions of the form $\Omega^{(1)}$-exclusive and $\Omega^{(2)}$-exclusive to indicate to which system an exclusive type belongs.
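The merging step can be illustrated with a short sketch: both systems are ranked over the union of their types, absent types receive size zero and hence tie for last rank, and exclusive types are read off as set differences. The function name `merged_ranks` and the toy word counts are hypothetical, and the sketch assumes every listed type has a strictly positive size.

```python
def merged_ranks(counts_1, counts_2):
    """Rank two systems over the union of their types (illustrative sketch).

    Types absent from a system are given size 0, so they tie for last rank.
    Assumes all sizes in the input dicts are strictly positive.
    """
    types = sorted(set(counts_1) | set(counts_2))

    def rank(counts):
        sizes = {t: counts.get(t, 0) for t in types}
        ordered = sorted(types, key=lambda t: -sizes[t])
        ranks, i = {}, 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and sizes[ordered[j + 1]] == sizes[ordered[i]]:
                j += 1
            for t in ordered[i:j + 1]:
                ranks[t] = (i + j + 2) / 2  # mean of ordinal ranks i+1 .. j+1
            i = j + 1
        return ranks

    exclusive_1 = set(counts_1) - set(counts_2)  # appear only in system 1
    exclusive_2 = set(counts_2) - set(counts_1)  # appear only in system 2
    return rank(counts_1), rank(counts_2), exclusive_1, exclusive_2


day_1 = {"the": 50, "election": 9, "voters": 4}
day_2 = {"the": 40, "rally": 8, "torch": 8, "voters": 1}
r1, r2, ex1, ex2 = merged_ranks(day_1, day_2)
# In system 1, "rally" and "torch" are absent and share the last rank, 4.5.
```

Because tied ranks are fractional, each merged ranking still sums to $\frac{1}{2}N(N+1)$ for the $N$ types in the union.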

Rank-rank histograms for basic allotaxonomy
In Fig. 1, we show an example of our base system-system comparison plot, which we will call a 'rank-rank histogram'. We compare word usage on two days of Twitter: The day after the 2016 US presidential election, 2016/11/09, and the second day of the Charlottesville Unite the Right rally, 2017/08/13 (see Sect. 5.1 for description of datasets). As we describe below, our histograms fully present the meaningful differences between two size-rank distributions, allowing for divergence measures to be overlaid in easily interpretable ways (cf. [47,48]). As an aid for the reader, we include dash-lined-bordered elements from Fig. 1 throughout our discussion.
To construct Fig. 1, we first parse tweets into 1-grams (preserving case), find 1-gram frequencies for each day, and then determine each day's separate ranked list of 1-grams according to those frequencies. For both days, and purely by choice, we take the subset of 1-grams that contain simple Latin characters. We next generate a merged list of Latin character 1-grams observed on both days and thereby obtain rank-rank pairs for all 1-grams. As described above, the separately ranked lists for each day will be extended by exclusive 1-grams (i.e., those that appear on only one day). All exclusive 1-grams will be tied for the last rank on the day they do not appear.
In general, we will denote the rank of type $\tau$ in system $\Omega^{(1)}$ by $r_{\tau,1}$, and the same type's rank in system $\Omega^{(2)}$ by $r_{\tau,2}$.
For our histograms, we bin rank-rank pairs $(r_{\tau,1}, r_{\tau,2})$ into cells uniformly in logarithmic space. Cell width is adjustable; here we choose 1/15 of an order of magnitude. We use a perceptually uniform colormap (magma [46]), with the number of rank-rank pairs per cell increasing per the lower left scale (Fig. 1A). That the rank-rank pair counts per cell reach up towards $10^{6}$ should make clear that some form of histogram is necessary for attempting to visualize the kind of rank turbulence we see here for Twitter. A simple plot of all $(r_{\tau,1}, r_{\tau,2})$ points produces an incomprehensible density.
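The logarithmic binning can be sketched as follows. This is a minimal illustration using only the standard library; `bin_rank_pairs` and `cells_per_decade` are names introduced here, not taken from the paper's code, and the rank pairs are toy values.

```python
import math
from collections import Counter


def bin_rank_pairs(rank_pairs, cells_per_decade=15):
    """Count rank-rank pairs per cell, with cells uniform in log10 space.

    A cell side spans 1/cells_per_decade of an order of magnitude.
    """
    counts = Counter()
    for r1, r2 in rank_pairs:
        # Cell index along each axis: floor(log10(rank) * cells per decade).
        i = math.floor(math.log10(r1) * cells_per_decade)
        j = math.floor(math.log10(r2) * cells_per_decade)
        counts[(i, j)] += 1
    return counts


pairs = [(1, 1), (2, 3), (100, 100), (101, 102), (9149, 129)]
cells = bin_rank_pairs(pairs)
# (100, 100) and (101, 102) fall in the same cell: cells narrow in linear
# terms near the top ranks and widen rapidly further down.
```

Fractional ranks from ties pose no difficulty, since only $\log_{10}$ of the rank is used.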
Figure 1. A. An example allotaxonomic 'rank-rank histogram' comparing word usage ranks on two days of Twitter, 2016/11/09 and 2017/08/13. These dates are the day after the 2016 US presidential election and the day after the Charlottesville Unite the Right rally. We extracted words first as 1-grams (contiguous sets of non-whitespace characters) from tweets identified as English [45] and then filtered to match simple Latin characters (see Sect. 5.1). We orient all histograms so that the comparison is left-right, removing a potential misperception of causality. In general, we compare ranked lists of types for two systems $\Omega_1$ and $\Omega_2$ by first generating a merged list of types covering both systems. We then bin logarithmic rank-rank pairs $(\log_{10} r_{\tau,1}, \log_{10} r_{\tau,2})$ across all types and uniformly in logarithmic space. For bin counts, we use the perceptually uniform colormap magma [46], and place a count scale in the bottom left corner (element A). We automatically label words at the fringes of the histogram. Bins on either side of the central vertical line represent words that are used more often on the corresponding date. For example, 'Charlottesville' was ranked 67,220 on 2016/11/09 and 113 on 2017/08/13, while 'Nazis' moved from $r = 9149$ to 129. Words are given alternating shades of gray for improved readability. The discrete, separated lines of boxes nearest to each bottom axis comprise words that appear on Twitter on only that side's date: 'exclusive types'. Moving up the histogram, the two other distinct lines above the 'exclusive-type lines' correspond to words that appear once and twice on the other date. The three horizontal bars in the lower right show system balances. The top bar only appears for ranked lists of types which have measures associated (e.g., word frequency, market capitalization of corporations). Here, the top bar indicates the balance of total counts of words (tokens) for each day: 59.9% versus 40.1%. The middle bar shows the percentages of the combined lexicon (types) for the two days that appear on each day: 63.2% versus 61.6%. And the bottom bar shows the percentage of words (types again) on each day that are exclusive: 60.8% and 59.8%. Each bar's label is configurable (e.g., 'total market cap'). B-D. The three rank-rank histograms on the right show the special benchmark cases of: B. A size ranking for a system compared with itself (vertical line; $\Omega_1$ versus itself); C. A ranked list versus a random shuffling of component types (two randomizations of $\Omega_1$); and D. Two size rankings for systems with no shared component types: A 'vee' structure (we re-used $\Omega_1$ and $\Omega_2$, modifying words to prevent matches). For the cells in the main histograms in this paper, we use cell side lengths of 1/15 of an order of magnitude; we use 1/5 for plots B-D.
We orient our histograms in a diamond format, rotating the standard horizontal-vertical axes $\pi/4$ counterclockwise. We do so to eliminate a perceptual bias towards interpreting causality (separately suggested in [49]). The vertical and horizontal coordinates in the rotated histogram are proportional to $\log_{10} r_{\tau,1} r_{\tau,2}$ (measured downwards) and $\log_{10} r_{\tau,2}/r_{\tau,1}$ (measured rightwards), and these are dimensions we will encounter later in our construction of rank-turbulence divergence.
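The rotated coordinates can be sketched directly from the two logarithmic combinations named above. This is an illustration only: the function name `diamond_coordinates` is hypothetical, and the sign of the horizontal coordinate is an assumption, chosen here so that types ranking higher in the second system (such as 'Charlottesville', with ranks 67,220 and 113) land on the right-hand side.

```python
import math


def diamond_coordinates(r1, r2):
    """Return (horizontal, depth) for a rank-rank pair in the rotated plot.

    horizontal: offset from the centre line r1 == r2, proportional to the
                log10 rank ratio. Sign convention is an assumption: positive
                (rightwards) when the type ranks higher in system 2 (r2 < r1).
    depth:      distance below the apex r1 == r2 == 1, proportional to
                log10(r1 * r2), increasing downwards.
    """
    x = math.log10(r1)
    y = math.log10(r2)
    # A rotation by pi/4 mixes the two log-rank axes with weights 1/sqrt(2).
    horizontal = (x - y) / math.sqrt(2)
    depth = (x + y) / math.sqrt(2)
    return horizontal, depth


apex = diamond_coordinates(1, 1)                 # top of the diamond
on_axis = diamond_coordinates(100, 100)          # on the centre line
charlottesville = diamond_coordinates(67220, 113)
```

The factor $1/\sqrt{2}$ only preserves distances under the rotation; any common rescaling of both coordinates yields the same picture.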
Types that have higher rank in system 1 will be represented by points on the left of the vertical $r_{\tau,1} = r_{\tau,2}$ line, while those with higher rank in system 2 will appear on the right side. Types falling along or near the center vertical line have the same or similar ranks in both systems.
For all rank-rank histograms we show in the main paper, we compare systems at different time points. Time moving from left-to-right is a natural choice, and will govern our arrangement of dynamically evolving systems. In general however, comparisons between two systems may not involve any left-right ordering, and the choice will be arbitrary (e.g., comparison of word usage in two books published in the same year or species abundance in two distinct ecological systems).
We automatically annotate words along the edges of the histogram. To do so, we first specify a fixed bin size moving down the vertical axis. For each bin and each side of the plot, we find the word furthest away horizontally from the center line, i.e., the word maximizing $|\log_{10} r_{\tau,1}/r_{\tau,2}|$. Annotated words are oriented to the far side of the point $(r_{\tau,1}, r_{\tau,2})$ relative to the center, but are vertically centered by bin for overall clarity (meaning that their vertical position relative to $(r_{\tau,1}, r_{\tau,2})$ will fluctuate). For these bare histograms with no divergence measure, we also assign type names with alternating shades of gray for readability. Where more than one word is equally far away from the center, we randomly choose one as a representative example.
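The annotation rule can be sketched as a single pass over the rank-rank pairs. The function name `pick_edge_annotations`, the parameter `bin_height` (in units of $\log_{10} r_{\tau,1} r_{\tau,2}$), and the left/right labels are assumptions made for this illustration. As a simplification, ties are resolved by keeping the first word encountered rather than by a random choice.

```python
import math


def pick_edge_annotations(ranks, bin_height=1.0):
    """Pick, per vertical bin and side, the type furthest from the centre line.

    `ranks` maps type -> (r1, r2). Returns {(vertical_bin, side): type}.
    """
    best = {}
    for name, (r1, r2) in ranks.items():
        offset = math.log10(r1 / r2)
        if offset == 0:
            continue  # on the centre line: belongs to neither side
        # Types ranked higher in system 2 (r2 < r1) sit on the right.
        side = "right" if offset > 0 else "left"
        # Vertical position is proportional to log10(r1 * r2).
        v_bin = math.floor(math.log10(r1 * r2) / bin_height)
        key = (v_bin, side)
        if key not in best or abs(offset) > best[key][0]:
            best[key] = (abs(offset), name)
    return {key: name for key, (_, name) in best.items()}


ranks = {
    "RT": (1, 1),
    "and": (4, 5),
    "is": (5, 4),
    "Nazis": (9149, 129),
    "Charlottesville": (67220, 113),
}
labels = pick_edge_annotations(ranks)
# 'Charlottesville' and 'Nazis' share a vertical bin on the right;
# 'Charlottesville' is further from the centre line and is the one annotated.
```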
To aid a user's perception of what meaning might be conferred by a rank-rank histogram, we highlight a selection of the annotated words in Fig. 1. Broadly, there are four main regions: 1. The top of the diamond; 2. The sides of the histogram; 3. The lower linear and point structures of the histogram; and 4. The bottom of the diamond.
Fig. 1B: Types appearing towards the top of the diamond rank high for both systems. For Fig. 1, the 1-gram 'RT' is the most common word on both days: $r_{\mathrm{RT},1} = r_{\mathrm{RT},2} = 1$. Signifying retweet, 'RT' is an important, if Twitter-specific, functional structure, indicating the strength of social amplification on Twitter. The words 'the' and 'to' are ranked 2nd and 3rd on both dates, while 'and' and 'is' are ranked 4th and 5th on 2016/11/09 and reversed to 5th and 4th on 2017/08/13, leading to their offset locations. Such changes of high rank types will be important in analyzing many kinds of systems, and we will see later that they are only picked up by certain divergences.
Fig. 1C: Moving down the histogram, we see that turbulence starts to become noticeable around $r = 10^{2}$, and we see increasingly less common and differentiating words appear. Types appearing furthest horizontally from the center vertical axis show the most relative change in rank. On 2016/11/09, 'Trump' stands out relative to nearby words. Further down, 'America', 'Donald', 'voters', and 'election' are all clearly off-axis. On 2017/08/13, the words 'Charlottesville' and 'Heyer' are most prominent (Heather Heyer was a protester who was murdered by vehicular homicide on August 12, 2017).
(Seemingly) unrelated names and events also appear. On the left of the histogram and/or list, we see 'gorilla' and 'Meteorite'. Harambe was a gorilla who was killed in a Cincinnati zoo after a boy entered his enclosure in 2016/05. Harambe became part of various internet memes including ones putting him forward as a write-in candidate for US president. On the right of the histogram, we find Lady Gaga and Zara Larsson (both performed concerts), and the K-pop (Korean pop) band BTS [50] which was enjoying its rise to ultrafame over this time period [51].
Fig. 1E: The separated lines and points at the bottom of the histogram arise from logarithmic spacing. For systems with heavy-tailed rankings of discrete-sized components, we often observe many types of the least size. Here, where type size is word count, we have many hapax legomena: words that appear only once in a corpus. For books approximately obeying Zipf's law, the fraction of a lexicon that appears only once is around 1/2 [14]; the rare are legion.
Moving upwards from the bottom, the three separated lines in Fig. 1's histogram correspond to words appearing zero times ('exclusive types' by our definition), once, and twice on the other side's day.
For example, at the extreme of the lowest line on the right, we see 'Cvjetanovic', an $\Omega^{(2)}$-exclusive word that is highly ranked on 2017/08/13 ($r_{\mathrm{Cvjetanovic},2} = 672$). The word is the last name of a member of Identity Evropa who was part of the Unite the Right rally. A photo of Cvjetanovic holding a tiki torch and yelling was widely circulated [52]. The word 'Cvjetanovic' did not appear on 2016/11/09 and, with zero counts, is tied with many other words that only appear on 2017/08/13 ($r_{\mathrm{Cvjetanovic},1} = 1{,}552{,}865$). As another example, the word 'Heyer' appeared once on 2016/11/09 and is consequently part of the second discrete line on the right side.
appearing once on only one of the two dates (per the count scale, Fig. 1A).
We emphasize that types annotated at or near the bottom of the diamond cannot be important individually: no divergence measure should present 'richava' as a meaningful word in itself for these two days of Twitter. Even so, indicating a few examples of these rare and unimportant words along the bottom of the histogram provides a helpful check that this is indeed the case. With the aim of improving the instrument's affordance of understanding, when we introduce rank-turbulence divergence, we will fade annotations according to type-level divergence contributions. Annotations for doubly rare types will always be strongly backgrounded.
Fig. 1G: For all allotaxonographs, we show balances at the bottom right of the rank-rank histogram. Two kinds of exclusive type comparisons for the numbers of types in each system are recorded in the bottom two bars, while the top bar conveys information about type sizes. The top bar is rendered only when sizes are known (this is the only part of the allotaxonographs that is not determined purely from type rankings). In the present paper, we work from datasets with known type sizes, and all allotaxonographs will have all three bars. All balances show normalized quantities rather than absolute numbers. As we will see, these three balances can vary greatly across system comparisons.
The top balance bar shows the relative balance of the two systems' sizes, if known.For our Twitter example, this top bar shows the breakdown of total counts of 1-grams (type size) between the two dates at 59.9% and 40.1%.We thus see that the election generated about 50% more 1-grams (which tracks with tweets) than events of Charlottesville.
The middle balance bar shows the fraction of types in each system as a percentage of the union of types from both systems. For Twitter, we have that of all words in the combined lexicon for the two days, just over 60% appear on each of the two days (63.2% and 61.6%).
The bottom balance bar shows that given a system's set of types, what percentage of those types appear only in that system-exclusive types.For the Twitter example, we take the separate lexicons for each day, and find that around 60% of words are exclusive for both days, further giving a sense of strong turnover (60.8% and 59.8%).
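The three balances can be computed from two size dictionaries as sketched below. The function name `balances`, the dictionary keys, and the toy counts are assumptions for illustration; they are not the diagnostics returned by the paper's script.

```python
def balances(counts_1, counts_2):
    """Compute the three balance bars as pairs of fractions (system 1, system 2)."""
    types_1, types_2 = set(counts_1), set(counts_2)
    union = types_1 | types_2
    total_1, total_2 = sum(counts_1.values()), sum(counts_2.values())
    return {
        # Top bar: share of total size (e.g., token counts); needs known sizes.
        "total size": (total_1 / (total_1 + total_2),
                       total_2 / (total_1 + total_2)),
        # Middle bar: fraction of the merged set of types present in each system.
        "types in union": (len(types_1) / len(union),
                           len(types_2) / len(union)),
        # Bottom bar: fraction of each system's own types that are exclusive.
        "exclusive types": (len(types_1 - types_2) / len(types_1),
                            len(types_2 - types_1) / len(types_2)),
    }


day_1 = {"the": 50, "election": 9, "voters": 1}
day_2 = {"the": 30, "rally": 6, "torch": 3, "voters": 1}
bars = balances(day_1, day_2)
```

The middle and bottom bars are linked: if fractions $p_1$ and $p_2$ of the merged types appear in each system, the exclusive fraction for system 1 is $(1 - p_2)/p_1$. For the Twitter example, $(1 - 0.616)/0.632 \approx 0.608$, consistent with the reported 60.8%.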
We add that the script for generating allotaxonographs (figallotaxonometer9000.m, written in MATLAB and provided in the paper's GitLab repository) returns some diagnostics, including the underlying numbers used to compute the above balances.
Figs. 1H, I, and J show examples of three extremes of how systems might compare on rank-rank histograms.
In Fig. 1H, we compare the size ranking for identical systems ($\Omega_1$ from Fig. 1). The outcome is a colormap version of the system's rankings binned logarithmically and arranged on the vertical $r_{\tau,1} = r_{\tau,2}$ line.
In Fig. 1I, we present the visualization of a system compared with a randomized version of itself.The nature of logarithms means that the lower triangle is well filled with density growing with increasing rank.Using a linear scale, we would see a statistically uniform histogram.
Finally, in Fig. 1J, we compare size-rank distributions for systems with completely distinct sets of types.After merging types across systems, ranking of types for each system places all types of the other system in a tie for last place.The result is two marginal sizerank distributions forming a 'vee' .We have already seen examples of these linear features in Fig. 1.If system component lists are sufficiently truncated-whether by measurement limitations or by choice-we will also see these kinds of marginal structures appear but in an inconsistent fashion.We will discuss truncation effects further in Sect.3.6, after introducing rank-turbulence divergence.
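The tie for last place can be made concrete with a short Python sketch. Here `average_ranks` is a hypothetical helper that ranks the merged set of types by size and gives tied types the average of the ranks they span; this tie convention is an assumption of the sketch, not something specified in the text.

```python
def average_ranks(counts, types):
    """Rank `types` by descending size (rank 1 = largest).

    Types missing from `counts` are treated as size 0, so they all tie for
    last place. Tied types share the average of the ranks they occupy.
    """
    ordered = sorted(types, key=lambda t: -counts.get(t, 0))
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of types with the same size.
        while j + 1 < len(ordered) and counts.get(ordered[j + 1], 0) == counts.get(ordered[i], 0):
            j += 1
        average = ((i + 1) + (j + 1)) / 2
        for t in ordered[i:j + 1]:
            ranks[t] = average
        i = j + 1
    return ranks


# Two toy systems with completely distinct types (made-up counts).
system_1 = {"a": 5, "b": 3, "c": 1}
system_2 = {"x": 4, "y": 2}
merged = set(system_1) | set(system_2)
ranks_1 = average_ranks(system_1, merged)  # x and y tie for last place
ranks_2 = average_ranks(system_2, merged)  # a, b, and c tie for last place
```

Plotting each type at its pair of ranks places system 1's types along one line (constant rank in system 2) and system 2's types along another, which is the 'vee' described above.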

Desirable allotaxonometric features for rank-turbulence divergence
On their own, our annotated rank-rank histograms give a map-like overview of how two systems differ. For Twitter, Fig. 1 presents a clear texture of words associated with the 2016 US election on the left and the 2017 events of Charlottesville on the right. But which words are most important? How do we compare the relatively rare 'Heyer' with the common 'My', both words that have higher ranks on 2017/08/13?
Our goal now is to construct a rank-based divergence for comparing complex systems, one that will function as an instrument overlaying rank-rank histograms. We would like our divergence to be able to bear the following 11 descriptors, which range from concrete and simple to qualitative:
1. Rank-based: Directly built for comparing ranked lists generated by any meaningful ordering.
2. Symmetric: $D^{\mathrm{R}}_{\alpha}(R_1 \,\|\, R_2) = D^{\mathrm{R}}_{\alpha}(R_2 \,\|\, R_1)$.
3. Semi-positive: $D^{\mathrm{R}}_{\alpha}(R_1 \,\|\, R_2) \ge 0$, and $D^{\mathrm{R}}_{\alpha}(R_1 \,\|\, R_2) = 0$ only if the systems are formed by the same components with matching rankings, $R_1 = R_2$.
4. Metric-capable: Given the preceding two conditions are met, we would want $D^{\mathrm{R}}_{\alpha}$ to also satisfy the triangle inequality.
5. Scale and unit invariant: This is automatic because rankings will not change if either one or both systems are rescaled in their entirety, or remeasured according to a different system of units.
6. Linearly separable, for interpretability. As framed in Eq. (2), each type τ additively contributes to rank-turbulence divergence a quantity $\delta D^{\mathrm{R}}_{\alpha,\tau}(R_1 \,\|\, R_2)$, allowing for simple ranking of types to assess importance.
7. Subsystem applicable: Ranked lists of any principled subset may be equally well compared (e.g., hashtags on Twitter, stock prices of a certain sector, etc.).
8. Effective across system sizes, possibly size independent: While not being explicitly interpretable as certain probability divergences (e.g., Kullback-Leibler divergence), rank-turbulence divergence $D^{\mathrm{R}}_{\alpha}(R_1 \,\|\, R_2)$ should be normalizable to allow for sensible comparisons of rank-turbulence divergences across system sizes. Linear separability means that whatever normalization we use, the ordering of contributions of individual types will be unchanged.
9. Heavy-tailed distributions: Rank-turbulence divergence should be applicable to systems with rank-ordered component size distributions that are heavy-tailed.
10. Tunable: While many stand-alone divergences exist for probability distributions [31,32], in practice there are families of divergences on offer, and these have the potential to be adaptive and provide much more power and insight [32].
11. Storyfinding: Features 1-10 will ideally combine to help us rapidly see which types are most important in distinguishing two ranked lists.

Development of rank-turbulence divergence
With these features in mind, we move now to properly constructing our conception of rank-turbulence divergence. We begin with the observation that by definition, a type τ's size rank is inversely related to its size. We thus will want to deal with inverses of ranks.
Given element τ has a size rank $r_{\tau,1}$ in system 1 and $r_{\tau,2}$ in system 2, a raw starting point for an element-level divergence incorporating rank inverses would be:
$$ \left| \frac{1}{r_{\tau,1}} - \frac{1}{r_{\tau,2}} \right| . \tag{3} $$
As we will demonstrate later, experimentation with this fixed form reveals a bias towards types with high ranks (again, the highest rank is r = 1). We modify the above expression by introducing a parameter α ≥ 0:
$$ \left| \frac{1}{[r_{\tau,1}]^{\alpha}} - \frac{1}{[r_{\tau,2}]^{\alpha}} \right|^{1/\alpha} . \tag{4} $$
We now have tunability: As α → 0, high ranked types are increasingly dampened relative to low ranked ones. For words in texts, for example, the weights of common words and rare words will move increasingly closer together. (Our construction and its behavior are in parts resemblant of but distinct from that of generalized entropy [53][54][55] and Hill numbers in ecology [8,34].) At the other end of the dial, α → ∞, high rank types will dominate. For texts, function words will prevail while the contributions of rare words will vanish.
The α → ∞ limit will prove to be a natural parameter endpoint for rank-turbulence divergence when we realize it as an instrument, and is something we wish to preserve as we address the α → 0 limit.
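However, the limit of α → 0 in Eq. (4) does not yet behave as we might hope. We see that if $r_{\tau,1} \neq r_{\tau,2}$, Eq. (4) tends towards
$$ \left( \alpha \left| \ln \frac{r_{\tau,1}}{r_{\tau,2}} \right| \right)^{1/\alpha} , \tag{5} $$
which in turn will tend toward 0 as α → 0.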
In considering how to remedy this problematic limit, we observe that Eq. (5) contains a readily interpretable structure which we have already encountered in the preceding section: The log-ratio of ranks. In Sect. 2.2, we established a graphical interpretation for the rank-rank histogram in Fig. 1. We identify $|\ln (r_{\tau,1}/r_{\tau,2})| = |\ln r_{\tau,1} - \ln r_{\tau,2}|$ as being proportional to the horizontal distance from the $(\log_{10} r_{\tau,1}, \log_{10} r_{\tau,2})$ point to the histogram's vertical midline.
In order to fashion a well-behaved α → 0 limit, while (1) preserving the core of Eq. (4), (2) maintaining the form of the large α limit, and (3) only using modifications that are monotonic in α, we introduce a prefactor and adjust the exponent in Eq. (4) as follows:
$$ \frac{\alpha+1}{\alpha} \left| \frac{1}{[r_{\tau,1}]^{\alpha}} - \frac{1}{[r_{\tau,2}]^{\alpha}} \right|^{1/(\alpha+1)} . \tag{6} $$
The α → 0 limit is now simply $|\ln (r_{\tau,1}/r_{\tau,2})|$, while the α → ∞ limit is unchanged. (We note that an alternate modification of introducing a prefactor of $\alpha^{-1/\alpha}$ to Eq. (4) fails the requirement of monotonicity.)
Finally, in summing over all types and incorporating a normalization prefactor $\mathcal{N}_{1,2;\alpha}$, we have our prototype, single-parameter rank-turbulence divergence:
$$ D^{\mathrm{R}}_{\alpha}(R_1 \,\|\, R_2) = \frac{1}{\mathcal{N}_{1,2;\alpha}} \, \frac{\alpha+1}{\alpha} \sum_{\tau} \left| \frac{1}{[r_{\tau,1}]^{\alpha}} - \frac{1}{[r_{\tau,2}]^{\alpha}} \right|^{1/(\alpha+1)} . \tag{7} $$
Deducing the form of the normalization factor $\mathcal{N}_{1,2;\alpha}$ requires a combined analytic and numerical approach. We compute $\mathcal{N}_{1,2;\alpha}$ by taking the two systems to be disjoint while maintaining their underlying ranked lists. Thus, we ensure $0 \le D^{\mathrm{R}}_{\alpha}(R_1 \,\|\, R_2) \le 1$ where the limits of 0 and 1 correspond, respectively, to the two systems having identical and disjoint size-rank distributions.
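To determine $\mathcal{N}_{1,2;\alpha}$, we observe that if the size-rank distributions are disjoint, then in system 1's merged ranking, the rank of all system 2 types will be $r = N_1 + \frac{1}{2} N_2$, where $N_1$ and $N_2$ are the number of distinct types in each system. Similarly, system 2's merged ranking will have all of system 1's types in last place with rank $r = N_2 + \frac{1}{2} N_1$. The normalization factor is then the value of the sum in Eq. (7) evaluated with these ranks, that is, with every type tied for last place in the system where it does not appear.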
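The construction above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it takes the per-type term to be $\frac{\alpha+1}{\alpha}\,|r_{\tau,1}^{-\alpha} - r_{\tau,2}^{-\alpha}|^{1/(\alpha+1)}$ for $0 < \alpha < \infty$, ranks tied types by their average rank, and obtains the normalization by relabeling the two systems so that they share no types. All function names are hypothetical.

```python
def average_ranks(counts, types):
    """Descending-size ranks over `types`; absent types (size 0) tie for last.
    Tied types share the average of the ranks they span (an assumed convention)."""
    ordered = sorted(types, key=lambda t: -counts.get(t, 0))
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and counts.get(ordered[j + 1], 0) == counts.get(ordered[i], 0):
            j += 1
        for t in ordered[i:j + 1]:
            ranks[t] = ((i + 1) + (j + 1)) / 2
        i = j + 1
    return ranks


def contribution(r1, r2, alpha):
    """Unnormalized per-type term for 0 < alpha < infinity (assumed form)."""
    return (alpha + 1) / alpha * abs(r1 ** -alpha - r2 ** -alpha) ** (1 / (alpha + 1))


def raw_sum(counts_1, counts_2, alpha):
    """Sum of per-type terms over the union of types, before normalization."""
    types = set(counts_1) | set(counts_2)
    ranks_1 = average_ranks(counts_1, types)
    ranks_2 = average_ranks(counts_2, types)
    return sum(contribution(ranks_1[t], ranks_2[t], alpha) for t in types)


def rank_turbulence_divergence(counts_1, counts_2, alpha=1 / 3):
    """Normalized divergence: 0 for identical rankings, 1 for disjoint systems."""
    # Normalization: keep each system's sizes but make the type labels disjoint.
    # With average ranks the tied last place sits at N1 + (N2 + 1)/2, which
    # differs by 1/2 from the N1 + N2/2 quoted in the text.
    disjoint_1 = {("system 1", t): c for t, c in counts_1.items()}
    disjoint_2 = {("system 2", t): c for t, c in counts_2.items()}
    return raw_sum(counts_1, counts_2, alpha) / raw_sum(disjoint_1, disjoint_2, alpha)


# Toy usage with made-up counts.
a = {"x": 10, "y": 5, "z": 1}
c = {"x": 1, "y": 5, "w": 3}
print(rank_turbulence_divergence(a, c, alpha=1 / 3))
```

Because the normalization reuses the same ranking routine on relabeled systems, identical inputs give exactly 0 and fully disjoint inputs give 1 regardless of the tie convention chosen.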

Tunability of rank-turbulence divergence: limits
We will use rank-turbulence divergence's tunability to accentuate more rare (α → 0) or more common types (α → ∞). For reference, we lay out the full expressions for these two limits, and will later see their graphical realizations. Per our construction of Eq. (7), in the limit of α → 0, we have
$$ D^{\mathrm{R}}_{0}(R_1 \,\|\, R_2) = \frac{1}{\mathcal{N}_{1,2;0}} \sum_{\tau} \left| \ln \frac{r_{\tau,1}}{r_{\tau,2}} \right| , \tag{9} $$
where the normalization $\mathcal{N}_{1,2;0}$ is the same sum evaluated for the two systems taken to be disjoint. Types experiencing the largest relative change in rank will feature most strongly, and these are types that are rare in one system, and extremely common in the other. Because of the term $\ln (r_{\tau,1}/r_{\tau,2})$, the α = 0 limit for rank-turbulence divergence is most resemblant of the Kullback-Leibler and Jeffrey divergences [30].
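In the limit of α → ∞, we have instead
$$ D^{\mathrm{R}}_{\infty}(R_1 \,\|\, R_2) = \frac{1}{\mathcal{N}_{1,2;\infty}} \sum_{\tau} \left( 1 - \delta_{r_{\tau,1} r_{\tau,2}} \right) \max \left\{ \frac{1}{r_{\tau,1}}, \frac{1}{r_{\tau,2}} \right\} . \tag{11} $$
Having the largest values of 1/r, the highest-ranked types will dominate the α → ∞ limit. The normalization factor $\mathcal{N}_{1,2;\infty}$ for α = ∞ is again the value of this sum for the two systems taken to be disjoint. Among probability-based divergences, the α = ∞ limit for rank-turbulence divergence aligns with the Motyka distance [30,31].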
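The two limits can be checked numerically at the level of a single type. The sketch below assumes the finite-α per-type term $\frac{\alpha+1}{\alpha}\,|r_{\tau,1}^{-\alpha} - r_{\tau,2}^{-\alpha}|^{1/(\alpha+1)}$ and compares it with $|\ln(r_{\tau,1}/r_{\tau,2})|$ for small α and with $\max\{1/r_{\tau,1}, 1/r_{\tau,2}\}$ for large α. The function names and example ranks are hypothetical, and normalization is omitted.

```python
import math


def contribution(r1, r2, alpha):
    """Unnormalized per-type term for 0 < alpha < infinity (assumed form)."""
    return (alpha + 1) / alpha * abs(r1 ** -alpha - r2 ** -alpha) ** (1 / (alpha + 1))


def contribution_alpha_zero(r1, r2):
    """alpha -> 0 limit: absolute log-ratio of the two ranks."""
    return abs(math.log(r1 / r2))


def contribution_alpha_inf(r1, r2):
    """alpha -> infinity limit: inverse of the better (smaller) rank,
    with no contribution when the rank is unchanged."""
    return 0.0 if r1 == r2 else max(1 / r1, 1 / r2)


# The finite-alpha form approaches each limit (made-up ranks 3 and 10).
near_zero = contribution(3, 10, 1e-6)
near_inf = contribution(3, 10, 500)

# A small shift among top ranks versus a large shift among rare types.
common_shift = (1, 2)
rare_shift = (1000, 10)
rare_wins_at_zero = contribution_alpha_zero(*rare_shift) > contribution_alpha_zero(*common_shift)
common_wins_at_inf = contribution_alpha_inf(*common_shift) > contribution_alpha_inf(*rare_shift)
```

The last two comparisons illustrate the dial: the same pair of rank shifts swaps order of importance between the two ends of the α range.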
Because we are interested in real, finite systems, we are not concerned with convergence.Nevertheless, with appropriate treatment, infinite theoretical systems could be evaluated.
3 Rank-turbulence divergence graphs as allotaxonometric instruments

Anatomy of an allotaxonograph with word usage on Twitter as an example
We now combine rank-rank histograms with rank-turbulence divergence to generate a tunable single-parameter instrument for exploring how two systems differ. In Fig. 2, we present a 'rank-turbulence divergence graph' as an example allotaxonograph. We again compare the two days of Twitter (the 2016 US election with the 2017 Charlottesville riots) that we examined in Sect. 2.2.
There are two main components to our general divergence graphs: A map-like histogram and an ordered list of types contributing the most to the divergence measure being employed.
Figure 2 … $\delta D^{\mathrm{R}}_{1/3,\tau}$. In the ordered list, words are arranged left and right and colored gray and blue in accordance with the date on which they are most prevalent. The two dates' ranks for each word in the list are indicated on the opposite side. For example, $r_{\mathrm{Trump},1} = 11$ and $r_{\mathrm{Trump},2} = 60$, and $r_{\mathrm{Heyer},1} = 862{,}482$ and $r_{\mathrm{Heyer},2} = 445$. While an exact match is intended, a few annotated words on the histogram differ from Fig. 1 due to chance during the automatic annotation of the histogram (e.g., 'HURRICANE' on the left side in Fig. 1, but 'BRITAIN' here). The instrument's function and layout are highly configurable in our figure-building script. For example, the choice of divergence (rank or otherwise), axis limits, maximum length of type names, histogram cell size, and the guide adornments 'less talked about' and 'more talked about' are all system-specific settings. As a design choice, we limit the resolution of α to multiples of 1/12. For full details of the underlying histogram, see the caption of Fig. 1 and Sect. 2.2.
First, we build upon the histogram of Fig. 1. We use rank-turbulence divergence with α = 1/3, as indicated on the scale in the top left of the graph. We discuss the choice of α below.
Fig. 2A: In all our divergence graphs, we include the divergence's expression above the top left of the histogram. We display the value of the divergence, which for our Twitter example is $D^{\mathrm{R}}_{1/3}(R_1 \,\|\, R_2) = 0.493$. We also show the core form of RTD, excluding constants of proportionality. (Our figure-making code presents formats for other kinds of divergences such as generalized entropy and probability-turbulence divergence [43].) For our own implementation of rank-turbulence divergence, we have chosen to make the increments of α discrete as multiples of 1/12. This discretization is particularly useful for α ≤ 3/2, the range of α for which most of the variation in rank-turbulence divergence takes place. The α scale in Fig. 2A uses an inverse tangent transformation that is effective for functional use of the instrument. As we will see, near α = 0, the list's variation with steps of 1/12 is not abrupt.
Fig. 2B: We overlay the histogram with contour lines of constant $\delta D^{\mathrm{R}}_{1/3,\tau}$. The contour lines are chosen so that they are anchored at evenly spaced points along the bottom two axes, making for simple tracking as α is varied.
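The discretized α scale described for Fig. 2A can be illustrated with a small Python sketch. The text specifies steps of 1/12 and an inverse tangent transformation but not its exact scaling, so the mapping $\frac{2}{\pi}\arctan\alpha$ used here, which sends α = 0 to 0 and α = ∞ to 1, is an assumption. The names `alpha_grid` and `dial_position` are hypothetical.

```python
import math
from fractions import Fraction


def alpha_grid(max_alpha=Fraction(3, 2), step=Fraction(1, 12)):
    """Exact multiples of `step` from 0 up to `max_alpha` inclusive."""
    n_steps = int(max_alpha / step)
    return [k * step for k in range(n_steps + 1)]


def dial_position(alpha):
    """Map alpha in [0, infinity] to a position in [0, 1] on the scale.

    Uses (2/pi) * arctan(alpha); this particular scaling is assumed.
    """
    if alpha == math.inf:
        return 1.0
    return 2 / math.pi * math.atan(alpha)


grid = alpha_grid()
positions = [dial_position(alpha) for alpha in grid] + [dial_position(math.inf)]
```

Using `Fraction` keeps values such as 1/3 exact, and the inverse tangent spreads out the small-α region while still giving α = ∞ a finite place on the dial.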
Fig. 2C: The inset to the upper right of the histogram provides a scale for values of $\delta D^{\mathrm{R}}_{1/3,\tau}$, per the tick marks. This inset also shows the contour lines of the chosen instrument, matching those of the main histogram.
Our last enhancement is to foreground annotations for types based on how much they contribute to the divergence. The annotations and their locations on the histogram largely remain unchanged from Fig. 1 (some may vary because of chance in the automatic annotation). We now incorporate a linear gray scale based on $\delta D^{\mathrm{R}}_{1/3,\tau}$, with higher scoring words accentuated, lower scoring words faded. We now see 'Trump' and 'Charlottesville' stand out among the histogram annotations of Fig. 2. Common words that have not changed rank ('RT', 'the', and 'to') as well as words rare on one day and absent on the other ('suededenim' and 'richava') have all been strongly backgrounded.
Second, we locate a list of words on the right of the instrument. For example, we see 'Trump' has the highest divergence contribution overall, moving from r = 11 to 60. These ranks indicate a maintenance of extraordinary levels of lexical ultrafame [51], but the drop from r = 11 to 60 registers more strongly for $\delta D^{\mathrm{R}}_{1/3,\tau}$ than all other rank shifts. On the opposing date, 'Charlottesville' scores comparably to 'Trump' and is second overall. In contrast to 'Trump', however, 'Charlottesville' is a word that changes rank dramatically across the two dates, moving from r = 67,220 to 113.
Fig. 2E: It is useful to be able to see which 'important' (i.e., high $\delta D^{\mathrm{R}}_{\alpha,\tau}$) elements are part of only one system (i.e., important exclusive types). In the ordered list, we indicate exclusive types by a directed open triangle, which will either precede a word appearing on the left or trail a word appearing on the right. For Fig. 2 with α set at 1/3, there is only one such word in the top 40 divergence contributions: 'Cvjetanovic' (discussed in Sect. 2.2). For general systems, as we tune α towards zero, more single-system types will move up the list, and conversely fall back down if we instead dial α towards ∞.
Fig. 2F: At the bottom of the word list, we indicate the percentage contribution to the divergence score from each system.Generally, we find these contributions to be close to equal.

Tuning rank-turbulence divergence allotaxonographs
For Fig. 2, we have chosen α = 1/3 because it delivers a reasonably balanced list of words with ranks from across the common-to-rare spectrum. Our choice here is based purely on a visual inspection. We have considered several automated methods for determining an optimal α, but leave these for future work.
To demonstrate how tuning α controls the contour lines and alters the word list on a rank-turbulence divergence graph, we provide Flipbook S1 where we sweep through a set of 12 values of α: 0, 1/12, 2/12, 3/12, 4/12, 5/12, 6/12, 8/12, 1, 2, 5, and ∞. As we increase α, the set of words (and in general, types) with highest $\delta D^{\mathrm{R}}_{\alpha,\tau}$ transforms from being dominated by rare words to being dominated by function words. Even so, a few words maintain prominence across a wide range of α. For example, 'Trump' is the top word for α = 1/3 to 5/4, dropping only to 5th for α = ∞. (Because of its function-word-like fame, for α ≤ 1/6, 'Trump' does not register in the top 40.) For 0 ≤ α ≤ 5/6, Charlottesville-related words lead the right side of the list ('Cvjetanovic', 'Heyer', and 'Charlottesville'). At the limit of α = ∞, the only top 40 Charlottesville word is 'white' (per the prevalence of 'white supremacists' and similar terms).
To further our investigation, we provide two more Flipbooks for Twitter. Flipbook S2 shows how the allotaxonograph of Fig. 2 changes if we control the percentage of retweets included in our sample. In varying from 1% to 100%, we see that the texture of the election side does not change greatly; the amplified and unamplified versions of Twitter match well. However, the Charlottesville date shows that the 1% retweet sample is much more pop culture focused. As we move through Flipbook S2 and dial up to fully include all retweets for 2017/08/13, we see words surrounding the events in Charlottesville rise up the list of dominant contributions.
In Flipbook S3, we start with 2019/01/03 and compare forwards in time, roughly doubling the number of days for each step, ending with 2020/01/04, the date of the assassination of the Iranian general Soleimani by the United States. We see the topics of anchor date 2019/01/03 become more clear as the date moves further into the past: Government shutdown, the border wall, and Congresswoman Rashida Tlaib. The comparison future date travels through a wide range of events. We observe that rank-turbulence divergence slowly increases as we compare days increasingly further apart. Visually, we see the rank-rank histogram broaden subtly. Determining how an optimal α changes with time scales would be a natural part of possible future work.
To explore in more depth the value of having a tunable allotaxonometric instrument, we move away from news and Twitter to consider distributions presented by two different kinds of systems, one ecological, the other cultural: Tree species abundances and popularity of baby names.

Species abundance: example rank-turbulence divergence allotaxonograph for the limit of α = 0
In Fig. 3, we show a rank-turbulence divergence graph comparing tropical tree species numbers on Barro Colorado Island (BCI) in the Panama Canal [61] for five-year censuses completed in 1985 and 2015 (systems 1 and 2) [56].
In being visually close to the limit of comparing two identical rankings ($D^{\mathrm{R}}_{0}(R_1 \,\|\, R_2) = 0$, Fig. 1H), the histogram's vertical linear form immediately shows that the species abundance distributions are strongly aligned. Because of the possibility of exogenous catastrophic events such as fires and the abrupt transitions accessible by complex dynamical systems [62], the composition of an ecological system may change dramatically over a few decades. For this example from BCI, however, we see a system that is strongly durable in its component rankings.
Figure 3 Allotaxonograph using rank-turbulence divergence to compare tropical forest tree species abundance on Panama's Barro Colorado Island (BCI) for 5 year censuses completed in 1985 and 2015 [56]. This system comparison shows relatively little turnover or turbulence. We see none of the sideways flaring of the histogram towards the bottom (turbulence) as we did for Twitter word usage in Fig. 2. A choice of α = 0 for rank-turbulence divergence per Eq. (9) produces vertical contour lines that conform well to the histogram. From inspection of both the histogram and the $\delta D^{\mathrm{R}}_{0,\tau}$ list, the relative decline of a single species of pepper plant, Piper cordulatum [57][58][59][60], is the dominant dynamical change in the forest's composition. See Sect. 5.1 for further notes on the BCI data
We numerically compare the 1985 and 2015 distributions by applying rank-turbulence divergence with α = 0, finding $D^{\mathrm{R}}_{0}(R_1 \,\|\, R_2) = 0.077$. By inspection, we choose α = 0 here because of the match of the histogram with the verticality of the contour lines (we address optimal selection of α in our concluding remarks). The nature of the BCI example affords us an opportunity to demonstrate the limit of α = 0 for allotaxonometry, and is a secondary reason for including an example from ecology.
In Flipbook S4, we show how the allotaxonometer performs with α varying away from 0 to ∞.The visual match on the contour lines continuously degrades.
The BCI example's histogram is far from what we would expect of randomized systems (Fig. 1I). To see how RTD quantifies the difference between two observed systems and then between randomized versions of these systems, we construct a set of pairs of randomized systems, measuring rank-turbulence divergence for each. We do this by randomly permuting the species names within each system while leaving species counts fixed, thereby keeping the size-rank distributions the same. We can perform such a randomization for any system-system comparison (and we do so below again for baby names).
We denote the average randomized divergence for two rankings $R_1$ and $R_2$ as $D^{\mathrm{R}}_{\alpha;\mathrm{rand}}(R_1 \,\|\, R_2)$. For the BCI example, we find that the score $D^{\mathrm{R}}_{0}(R_1 \,\|\, R_2) = 0.077$ is well short of the randomized equivalent of $D^{\mathrm{R}}_{0;\mathrm{rand}}(R_1 \,\|\, R_2) = 0.376$ (average of 100 randomizations; standard deviation σ = 0.012).
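The randomization procedure can be sketched in Python as follows. The sketch shuffles type names within each system while holding the sizes fixed, then averages a divergence over many shuffled pairs. The divergence is passed in as a function; `toy_divergence` is only a stand-in used to exercise the code and is not rank-turbulence divergence. All names and counts are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics


def shuffle_labels(counts, rng):
    """Randomly reassign type names to sizes. The multiset of sizes, and hence
    the size-rank distribution, is unchanged."""
    names = list(counts)
    rng.shuffle(names)
    return dict(zip(names, counts.values()))


def randomized_baseline(counts_1, counts_2, divergence, n_shuffles=100, seed=0):
    """Mean and sample standard deviation of `divergence` over shuffled pairs."""
    rng = random.Random(seed)
    scores = [
        divergence(shuffle_labels(counts_1, rng), shuffle_labels(counts_2, rng))
        for _ in range(n_shuffles)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


def toy_divergence(counts_1, counts_2):
    """Stand-in score (NOT rank-turbulence divergence): summed absolute count
    differences, divided by the combined total size of both systems."""
    types = set(counts_1) | set(counts_2)
    moved = sum(abs(counts_1.get(t, 0) - counts_2.get(t, 0)) for t in types)
    return moved / (sum(counts_1.values()) + sum(counts_2.values()))


# Made-up census counts for two dates.
census_a = {"sp1": 50, "sp2": 20, "sp3": 5, "sp4": 1}
census_b = {"sp1": 45, "sp2": 25, "sp3": 4, "sp4": 2}
observed = toy_divergence(census_a, census_b)
mean_rand, sd_rand = randomized_baseline(census_a, census_b, toy_divergence)
```

Comparing `observed` with `mean_rand` mirrors the comparison of the observed score with its randomized equivalent in the text; any divergence with the same two-argument signature could be substituted.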
Per the balance indicators, we see that the total number of individuals in each year's census is roughly the same (51.5% and 48.5%), that most types for both years appear in each system (95.6% and 92.5%), and that relatively few types are exclusive to each year (7.8% and 4.7%). Only two year-exclusive species make the top 40 for $\delta D^{\mathrm{R}}_{0,\tau}$ contributions: Bactris coloradonis (1985 only) and Trema integerrima (2015 only). Regarding changes in overall diversity, we see that the loss of Piper cordulatum has not been to the gain of a single species: there is no one species on the right of the histogram with a distinctly high $\delta D^{\mathrm{R}}_{0,\tau}$. Of the top 10 species ranked by $\delta D^{\mathrm{R}}_{0,\tau}$, 7 are species that have become relatively more abundant. For the top 40, the balance is 20 down and 20 up. Overall, our instrument's dashboard makes clear that there is a singular drop in Piper cordulatum's ecological role amid incremental (and possibly also important) changes for other species, straightforwardly directing future research attention.

Baby names: example rank-turbulence divergence allotaxonograph for the limit of α = ∞
For an example of where tuning rank-turbulence divergence's parameter α to the limit of ∞ is helpful, we explore the temporal evolution of US baby name popularity [63,64]. Because of the richness of baby name trends, we will also show how the full range of α can be used to uncover cultural changes.
The dataset we use tabulates annual name frequencies running from 1880 to 2021.The dataset is derived from Social Security card applications which means it is (unsurprisingly) not an exact measurement of baby name frequencies, particularly for retroactive registrations for those born in the years before Social Security was enacted in 1935.
For privacy, there is a truncation instituted in the dataset, and only baby names for which there are 5 or more instances in a year are included. Our discussion and analysis below therefore carry the caveat that rare names are occluded from our view (for further details and limitations see Sect. 5.1).
Because we will favor brevity in our discussion, when we write, for example, "baby girl names for 1968", we will mean "US baby girl names registered at least 5 times with Social Security in the year 1968."
In Fig. 4, we use a rank-turbulence divergence graph with α = ∞ to compare changes in baby name frequencies for girls born in the US in 1968 and girls born in the US in 2018, a 50 year gap. In Fig. 5, we present the corresponding allotaxonomic graph for boy names. In the Ancillary files, we provide Flipbooks with α = ∞ showing half century changes for both girl and boy names starting in 1880 and moving forward in 5 year increments (Flipbooks S5 and S6), as well as Flipbooks for the same 1968-2018 comparison with α varying from 0 to ∞ (Flipbooks S7 and S8). For baby names, an interactive version of the instrument would allow tunable α and the choice of years to be readily explorable.
Figure 4 … $D^{\mathrm{R}}_{\alpha}$ scores verging on that of the random equivalent. The asymmetry of the separated 2018-exclusive names and the balance score of 80.3% of all names in 2018 being new relative to 1968 show that while there is much social imitation (see 1970s, 'Jennifer'), baby names are highly innovative collectively. Note that at the bottom of the histogram, the annotated name is a 2018-exclusive word but it is oriented towards the left per our annotation method (with each run of the allotaxonometer script, the name is randomly chosen from all names in the specific histogram square) (see also Fig. 1 and Sect. 2.2). See Fig. 5 for the boy name version. For 1968-2018, Flipbook S5 shows how the list of contributions to rank-turbulence divergence changes as α varies from 0 to ∞. Flipbook S7 provides a sweep of α = ∞ allotaxonometric graphs for girl names over time, for 50 year gap comparisons starting with 1880-1930 and moving forward in 5 year steps
In contrast to the lexical turbulence of Twitter and the largely vertical form we saw for forest species counts, the histograms in Figs. 4 and 5 bear strong signatures of randomness and innovation.
First, as we saw in Fig. 1I, a random shuffling of ranked lists results in histograms predominantly weighted in the lower triangle of the plot. We see a strong imprint of this limiting case in Figs. 4 and 5, reflective of a great deal of cultural and societal change.
Second, we see dense exclusive-type lines at the base of both sides of the histograms in Figs. 4 and 5, the stamp of disjoint systems (Fig. 1J). The asymmetry of the histograms, with the separated exclusive-type line on the lower right, reflects the strong innovation of 2018 names relative to 1968. We note that the skew does not come from changes in system sizes as total numbers of births for the two years are comparable for girls and boys.
Overall, the turnover in baby names is stronger for girl names than boy names. We can gain a sense of this visually by observing that there is less flare to the left of the histogram for boy names relative to the histogram for girl names.
For girls, ranging from common 2018 names ('Harper', 'Madison', and 'Addison') down to rare names (e.g., 'Kaisa', 'Akhari', and 'Hadly'), the 2018-exclusive names comprise 80.4% of all 2018 girl names.
While not separated because of the histogram's cell sizes, the 1968-exclusive-type line is dense relative to the histogram body in both Figs. 4 and 5. We find 56.7% of all girl names (4,643 of 8,195) and 36.4% of all boy names (1,726 of 4,743) are 1968-exclusive names relative to 2018. A wide range of girl names that were popular in 1968 ('Tammie', 'Ronda', and 'Patty') as well as rare ('Anmarie' and 'Adine') have fallen out of favor by 2018. For boys, once-common 'Bart' and 'Tod' have dropped off the ledger. We also see apparent errors along the exclusive-type line for boy names in 1968 with 'Gina' (20 counts) and 'Alicia' (9 counts).
We emphasize that the balance indicators are for baby names appearing at least five times. For our present work, and in attempting to maintain uniformity across allotaxonographs, we do not attempt to adjust for names appearing fewer than 5 times, though this would be possible for the topmost balance for total counts given we have that information separately. Clearly the balance values would shift if we had complete data sets for baby names; quantifying the errors in these balances would be meaningful future work.
We note that the asymmetries of both histograms (their apparent right-side 'heaviness') are not due even in part to changes in overall numbers. Using total birth numbers (see Sect. 5.1), the total number of girl names recorded in 1968 and 2018 are comparable at 1,709,551 and 1,846,101 (7.99% increase); for boys, these numbers are 1,775,997 and 1,928,871 (8.61% increase). The numbers of distinct names in 1968 and 2018 are strikingly different however: 8,195 and 18,115 for girls (121% increase), and 4,743 and 14,081 for boys (197% increase). Two of the likely major factors which have led to this explosion in name-space are immigration and a cultural shift towards parents creating novel names.
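The percentages quoted above can be reproduced with a few lines of Python. The sketch treats 8,195 and 18,115 (girls) and 4,743 and 14,081 (boys) as the numbers of distinct names in 1968 and 2018, which is consistent with the "4,643 of 8,195" and "1,726 of 4,743" breakdowns of 1968-exclusive names given earlier. The variable names are hypothetical.

```python
def percent_increase(old, new):
    """Relative growth from `old` to `new`, in percent."""
    return 100 * (new - old) / old


# Figures as quoted in the text: (1968 value, 2018 value).
births = {"girls": (1_709_551, 1_846_101), "boys": (1_775_997, 1_928_871)}
distinct_names = {"girls": (8_195, 18_115), "boys": (4_743, 14_081)}
exclusive_1968 = {"girls": 4_643, "boys": 1_726}

for group in ("girls", "boys"):
    birth_growth = percent_increase(*births[group])
    name_growth = percent_increase(*distinct_names[group])
    # Share of 1968's distinct names that do not appear in 2018.
    exclusive_share = 100 * exclusive_1968[group] / distinct_names[group][0]
    print(f"{group}: births +{birth_growth:.2f}%, names +{name_growth:.0f}%, "
          f"1968-exclusive {exclusive_share:.1f}%")
```

The contrast between single-digit growth in births and a doubling or tripling of distinct names is what underlies the right-side heaviness of the histograms.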
Using the overall birth numbers, we can also estimate the percentage of names absent from our dataset, those with fewer than 5 instances: 4.05% for 1968 and 8.08% for 2018 for girls, and 2.11% for 1968 and 6.07% for 2018 for boys. The 2018 size-rank distributions thus have heavier tails, pointing once again to strong innovation.
The turnover in girl names results in a high rank-turbulence divergence value of $D^{\mathrm{R}}_{\infty}(R_1 \,\|\, R_2) = 0.926$. For the same time frame comparison, boy names have a lesser but still high value of $D^{\mathrm{R}}_{\infty}(R_1 \,\|\, R_2) = 0.850$. Both values are below but not far from the randomized equivalents with size-rank distributions held constant (as described in Sect. 3.3 for the BCI case): $D^{\mathrm{R}}_{\infty;\mathrm{rand}}(R_1 \,\|\, R_2) = 0.973$ and 0.966. We turn to the overall orderings of $\delta D^{\mathrm{R}}_{\infty,\tau}$ contributions for girls and boys, the ordered lists of Figs. 4 and 5.
In general, in the limit of α = ∞, the contribution ordering will be an interleaving of types from both distributions. The ordering of types on each side of the list will match those of the separate size-rank distributions with the exception that all types that do not change rank will be absent. The interleaving is generally a simple back and forth sequence between the two systems but breaks whenever a rank is reached that is the lowest rank (largest value of r) for a specific type.
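The interleaving described above can be seen in a small Python sketch. It assumes that, at α = ∞, each type with a changed rank contributes in proportion to the inverse of its better rank, $\max\{1/r_{\tau,1}, 1/r_{\tau,2}\}$, and is assigned to the side of the system where it ranks higher. The function name and the five generic types are hypothetical.

```python
def alpha_inf_ordering(ranks_1, ranks_2):
    """Order types by their alpha = infinity contribution (unnormalized).

    Returns (type, side, contribution) tuples, where side is 1 or 2 for the
    system in which the type has the better (smaller) rank. Types whose rank
    is unchanged contribute nothing and are left out.
    """
    rows = []
    for t in ranks_1:
        r1, r2 = ranks_1[t], ranks_2[t]
        if r1 == r2:
            continue
        side = 1 if r1 < r2 else 2
        rows.append((t, side, max(1 / r1, 1 / r2)))
    # Largest contribution first; ties broken alphabetically for determinism.
    return sorted(rows, key=lambda row: (-row[2], row[0]))


# Made-up ranks for five generic types; type "c" keeps rank 3 in both systems.
ranks_1 = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
ranks_2 = {"d": 1, "e": 2, "c": 3, "a": 4, "b": 5}
ordering = alpha_inf_ordering(ranks_1, ranks_2)
# ordering alternates between the two systems: a (1), d (2), b (1), e (2)
```

Each side of the list follows its own system's ranking, and the unchanged type drops out, as described in the text.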
For girls in 1968 relative to 2018, we see the three medal places go to 'Lisa', 'Michelle', and 'Kimberly'. In fourth, we have 'Jennifer', a name that would go on to be the most popular girl name in the US throughout the entire 1970s. In fifth is the once dominant 'Mary' which had held the number one position from 1880 almost entirely through to 1961 ('Mary' was second to 'Linda' for the years 1947-1952). The dominance of the most popular girl name in 1968, 'Lisa', relative to 2018 is remarkable, carrying the top overall 1968 $\delta D^{\mathrm{R}}_{\infty,\tau}$ contribution for all values of α. In Flipbook S7, we see that in dropping from r = 1 to r = 888, 'Lisa' is second in contribution for both 1968 and 2018 only for α = 0 (first page) when we see 'Harper' take the top position. At this limit, order is by rank ratio and the above-the-rim elevation for 'Harper' from r = 15,437 to r = 9 is more than enough for the win.
On the other side, for 2018 relative to 1968, 'Emma' is the new 'Lisa', with 'Olivia' and 'Ava' in second and third for $\delta D^{\mathrm{R}}_{\infty,\tau}$ contribution. In dialing α, Flipbook S7 shows that like 'Lisa', 'Emma' prevails above all other names except 'Harper' when α = 0.
Of special note is the name 'Elizabeth' which stands out on the rank-rank histogram, well isolated in the upper triangle. We see that of all the top girl names in 1968, 'Elizabeth' alone has held its popularity. Flipbook S5 further shows that 'Elizabeth' maintains this isolated stability over decades. No standard divergence measure will highlight 'Elizabeth', inviting the development of a different class of measures that find anomalous rank-rank pairs.
While not to the degree of 'Elizabeth', there are two boy names that occupy a small hollowed-out region of rank-rank space in the histogram of Fig. 5: 'James' (steady at r = 4) and 'William' (up from r = 6 to r = 3). As 'Liam' is an Irish variant on 'William', the latter effectively held the 1st and 3rd position in 2018.
For girl names compared with α set to 0, the first page of Flipbook S5 shows that 1968 and 2018-exclusive names dominate the overall list. While 'Lisa' remains at the top, we then have 'Tammy', 'Michele', 'Rhonda', 'Michelle' and 'Tammie' as the 6 names from 1968 in the top 40 for $\delta D^{\mathrm{R}}_{0,\tau}$ contributions. After 'Harper', the top 2018 names are 'Madison', 'Isabella', 'Luna', and 'Layla'.
Our allotaxonomic instrument also has the ability to uncover subsets of related types behaving in similar ways. For example, when tuning to α = 0 (Flipbook S5), we see a raft of 2018-exclusive boy names ending in '-aden', '-aiden', and '-ayden'. These small interrogations of the data lead to larger questions which are beyond the scope of our work here. Are girl and boy names differently diverse? And how has the phonetic spread of names changed over time? A complete analysis could be performed by matching and grouping names based on spelling, syllables, and known variations.
To close out our study of baby names, we add two more allotaxonographs whose primary purpose is to show how our instrument performs when system sizes differ strongly. In Figs. 6 and 7, we compare US baby girl and boy names in 1880 and 2020, a 140 year gap.
We make some observations about balances, the rank-turbulence divergence scores, the rank-rank histograms, and the changes in naming from 1880 to 2020.
For the preceding allotaxonographs (Figs. 1-5), the largest difference for system sizes has been for Twitter in Fig. 1. The date 2016/11/09 carried 59.9% of all tweets from the two dates combined, with the other 40.1% on 2017/08/13 (top balance bar, bottom right of the histogram).
Figure 6 Allotaxonograph comparing US baby girl names for the years 1880 and 2020. This figure is in part a demonstration of how allotaxonographs competently perform when the sizes of two systems differ strongly. For 1968 and 2018 in Fig. 4, the balance of total baby girl names is an almost even 49.2% and 50.8%. By contrast, for 1880 and 2020, these percentages are 5.4% versus 94.6%. The choice of α = ∞ again means that the top names for each year will dominate $D^{\mathrm{R}}_{\infty}$, regardless of their rank in the comparison year (unless a name has equal rank in both years)
Figure 7 Allotaxonograph comparing US baby boy names for the years 1880 and 2020, companion to Fig. 6. The year 1880 has 6.0% of the total baby boy names of both years combined, while 2020 has 94.0%. For 1968 and 2018 in Fig. 5, the equivalent numbers are 49.0% and 51.0%
By contrast, of the total number of baby girls born in 1880 and 2020, the years separately account for 5.4% and 94.6% respectively, a factor of roughly 17-fold (Fig. 6). For boys, these weights are similar at 6.0% and 94.0%, around 16-fold (Fig. 7).
Because of the large increase in registered babies being born, the two kinds of type balances are consequently more extreme. For example, of the combined types for the distinct baby girl names in 1880 and 2020, only 5.3% were used in 1880, while 98.7% were used in 2020. For exclusive types, 25.3% of 1880's distinct baby girl names appeared only in 1880, while 96.0% of 2020's were not used in 1880.
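The three balances described above (total counts, share of combined types, and share of exclusive types) can be sketched in Python as follows. The function name `type_balances`, the dictionary layout, and the toy counts are assumptions made for illustration; the paper's own MATLAB script may organize these quantities differently.

```python
def type_balances(counts_1, counts_2):
    """Compute three balance pairs for two systems given as {type: count} dictionaries.

    total:     each system's share of the combined total count
    types:     fraction of the combined set of types that appears in each system
    exclusive: fraction of each system's own types that is absent from the other
    """
    types_1 = {t for t, c in counts_1.items() if c > 0}
    types_2 = {t for t, c in counts_2.items() if c > 0}
    combined = types_1 | types_2
    total_1 = sum(counts_1.values())
    total_2 = sum(counts_2.values())
    grand_total = total_1 + total_2
    return {
        "total": (total_1 / grand_total, total_2 / grand_total),
        "types": (len(types_1) / len(combined), len(types_2) / len(combined)),
        "exclusive": (
            len(types_1 - types_2) / len(types_1),
            len(types_2 - types_1) / len(types_2),
        ),
    }


# Hypothetical toy counts (illustrative only).
counts_1880 = {"Mary": 70, "Anna": 20, "Minnie": 10}
counts_2020 = {"Olivia": 500, "Emma": 400, "Mary": 60, "Anna": 40}

balances = type_balances(counts_1880, counts_2020)
for label, pair in balances.items():
    print(label, [round(100 * x, 1) for x in pair])
```

Note that the two 'types' percentages need not sum to 100%, because shared types count towards both systems, as with the 5.3% and 98.7% quoted above.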
For baby girl names, the value of $D^{R}_{\infty} = 0.931$ for this 140 year comparison is slightly higher than that for the 50 year gap between 1968 and 2018, $D^{R}_{\infty} = 0.926$, while for boys the increase is from $D^{R}_{\infty} = 0.850$ to 0.900. In general, the rank-rank histograms of these disparately sized systems will show a strongly separated, highly dense line corresponding to exclusive types on the side of the larger system. For both baby girl and boy names in 1880 and 2020, the separated line is around an order of magnitude from the main body of the histogram, and the component cells are high count ones. While this separation could occur for equal-sized systems if the type counts differ enough, the count density of the separated line will not be as strong. With familiarity, a glance at the balance bars will clarify these details.
Rank-turbulence divergence with α = ∞ is a function only of the highest rank for each type (Eq. 11). As such, the main contributions for girls come from 'Mary' (r = 1 to 123) and 'Olivia' (r = 234.5 to 1), while for boys the leaders are 'John' (r = 1 to 27) and 'Liam' (r = 7643.5 to 1, not used in 1880).
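The half-integer ranks quoted above (for example, r = 234.5 and r = 7643.5) are what averaged, tied ranks would produce, with types absent from a system tying for last place. The Python sketch below illustrates that convention. The function name `tied_ranks` and the toy counts are hypothetical, and whether the paper's script computes ranks in exactly this way is an assumption.

```python
def tied_ranks(counts, all_types):
    """Assign average ('fractional') ranks by descending count.

    Types missing from `counts` are treated as having count 0,
    so they all tie for last place and share one averaged rank.
    """
    sizes = {t: counts.get(t, 0) for t in all_types}
    ordered = sorted(sizes, key=lambda t: -sizes[t])
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover the full run of types tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and sizes[ordered[j + 1]] == sizes[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2  # mean of positions i+1 .. j+1
        for t in ordered[i:j + 1]:
            ranks[t] = average_rank
        i = j + 1
    return ranks


# Hypothetical toy counts (illustrative only).
counts_a = {"Mary": 100, "Anna": 50, "Emma": 50}
counts_b = {"Olivia": 90, "Emma": 80, "Liam": 5}
all_types = set(counts_a) | set(counts_b)

print(tied_ranks(counts_a, all_types))
print(tied_ranks(counts_b, all_types))
```

In the first toy system, 'Anna' and 'Emma' share rank 2.5, while the absent 'Olivia' and 'Liam' share rank 4.5.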
As we have reiterated, for evolving complex systems, allotaxonographs can help lead us to examine time series for individual types that occupy interesting locations in the rank-rank histogram. For baby girl names, 'Emma' stands out as a name that was enormously popular in both 1880 (r = 3) and 2020 (r = 2). But the story for 'Emma' proves to be akin to the emotional arc of Vonnegut's 'man in a hole' [65-67]. Ranked third in 1880, 'Emma' dropped at a gradually increasing rate over the next 90 years to a stable set of low ranks in the 1970s (the decade of 'Jennifer'), bottoming out at r = 463 in 1976. After first starting to revive in 1983, 'Emma' rapidly rose back to 4th in 2002 and stayed in the top 3 from 2003-2021, reaching r = 1 six times.

Allotaxonometry of publicly traded US companies: stability, shocks, and errors
In Fig. 8, we show the rank-turbulence divergence graph comparing US companies by market cap in the final quarter of 2007 with the final quarter of 2018 (for dataset description, see Sect. 5.1). The allotaxonograph is a blend of the two limiting cases of stability and change: The vertical line of matching systems and the 'vee' of disjoint systems (Figs. 1B and 1D). We choose α = 1/3 for the rank-turbulence divergence instrument as the ordering of $\delta D^{R}_{1/3,\tau}$ values presents a mixture of high to low market cap (see below for more on this choice). In Flipbook S9, we show allotaxonographs for market cap comparisons for 6 year time gaps starting in 1995 and moving through to 2012.
Of the companies which existed and reported market cap in both 2007 and 2018, we see a great deal of durability to their rankings. Somewhat more than what we see for species abundance numbers in Sect. 3.3, there are some notable movements in ranks. At the top of the rank-losing side of the $\delta D^{R}_{1/3,\tau}$ list we see General Electric (r = 2 → 78), Exxon Mobil (1 → 9), and AT&T (4 → 19). Berkshire Hathaway's apparent drop stems from a dataset error which we discuss below. On the right side for companies in existence in both 2007 and 2018, technology companies dominate: Amazon (r = 86 → 3), Apple (11 → 2), Microsoft (3 → 1), and Netflix (1214 → 42).
Companies along the exclusive lines of the disjoint system 'vee' disappear and appear for a range of reasons. Mergers and acquisitions, companies being taken from public to private and vice versa, and outright failure all contribute to market cap comparisons having a disjoint aspect.
Looking through the 2007 exclusive companies on the histogram and the list (as indicated by the left triangle prefix), we see many companies that were acquired, with a few examples being Wachovia (bought by Wells Fargo in 2008), Genentech (bought by Roche in 2009), Time Warner (bought by Charter Communications in 2016), and Monsanto (bought by Bayer, 2018). We also find a few companies that failed with Lehman Brothers being a famous (or infamous) example from the 2007-2008 global financial crisis.
On the 2018 side, Visa and Facebook are the standout entrants. With respective initial public offerings (IPOs) in 2008 and 2012, they rank at r = 5 and 8 at the end of 2018. Visa's competitor Mastercard was already publicly traded in 2007, and ranks highly as well for α = 1/3 (r = 1214 → 24). AbbVie, spun off from Abbott Laboratories in 2013, ranks highest among pharmaceutical companies. The brewing company Anheuser-Busch InBev SA/NV formed in 2008 when Belgium's InBev purchased Anheuser-Busch.
The dataset for market caps does have some missing and erroneous data. DowDuPont's market cap for the last quarter of 2018 is absent and is consequently shown to have plummeted from a rank of r = 91 in 2007 to equal-to-last in 2018. Berkshire Hathaway's market cap is clearly misrecorded for the last three quarters of the dataset (apparently dropping from $528.33B to $0.34B at the end of 2018).
We take the opportunity to perform a small test of the sensitivity of rank-turbulence divergence by correcting the data for these two companies. For DowDuPont, with further sourcing, we find the year-end 2007 and 2018 market caps were reported as $37.06B and $121.34B, and for Berkshire Hathaway, $149.56B and $502.37B. Upon making these corrections, we find that the rank-turbulence divergence for α = 1/3 is again 0.411, unchanged to three decimal places. In the corrected allotaxonograph (Fig. 9), Berkshire Hathaway's location shifts to the right side of the histogram (r = 38 → 5) and is now listed as the 7th strongest contribution overall. DowDuPont no longer makes the top 40 of the list of contributions. While these two changes are dramatic, the remainder of the allotaxonograph remains essentially identical.
We have chosen to leave such errors in Fig. 8 to help (again) demonstrate the importance of using a rich, graphical allotaxonometric instrument. With a naive measurement of divergence, we would easily miss problematic data points. Evidently, and beyond our present paper's interests, for any further investigations, these two errors suggest that considerable effort should be made to further clean the market cap dataset.
More generally, the specific form of the market cap histogram in Fig. 8 shows how we must take care when measuring divergences of any kind. The histogram's structure is not as simple as those for Twitter, species abundance, and baby names, and it would be problematic to allow for an unexamined, automated fitting of α for rank-turbulence divergence (or parameters of any other divergence). Given the composite form of the allotaxonograph for market caps, an alternative treatment would be to separate out companies that appear in both systems from those companies that appear in only one year, the exclusive types. The enduring companies could be analyzed as a low-turbulence system on its own, and the companies exiting and entering as a disjoint system. A rank-based divergence instrument could be constructed that achieves this automatically, possibly returning a set of measurements that would capture the stable-shock balance we so clearly observe. Handling mergers, acquisitions, and partitionings of companies is also plausible and would require other kinds of elaboration of rank-turbulence divergence.

Truncation effects for rank-based allotaxonographs
Truncation of a system's size-rank distributions is a common if often overlooked problem [33,68]. Datasets may be curtailed for many reasons such as fundamental or cost-imposed measurement limits, data storage constraints, and privacy. Text corpora generate especially heavy-tailed distributions, with hapax legomena taking up roughly half of a text's lexicon [14]. The Google Books n-gram corpus only includes n-grams which have appeared 40 or more times [69], excluding a vast number of rare n-grams. In our present work, we have already seen that for Twitter, our sample is approximately 10% of all tweets (with Twitter itself being a rather small subsample of all forms of human expression), and that baby names with counts of 4 or fewer are not made public for any censused population within the US. Limits to sampling in ecological systems can be severe: The Barro Colorado Island data is evidently not inclusive of all plant matter.
To investigate the problem of truncation, we explore our four case studies of Twitter, tree species, names, and companies by systematically limiting the observable components of each system. For each pair of systems, we take the top $N = 10^{k}$ ranked components where k = 1.5, 2.0, 2.5, ..., stopping once we exceed the size of both systems. For each k, we generate the corresponding series of rank-turbulence divergence graphs, producing Flipbooks S10-S14. For a visual summary of these Flipbooks, we combine a subset of the (bare) rank-rank histograms to form Fig. 10.
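The truncation schedule can be sketched in Python as below. The sketch assumes that non-integer values of $10^{k}$ are rounded down (consistent with the '31 names' and '31 corporations' reported for k = 1.5), and that the sequence includes the first N exceeding both system sizes. The function names `truncation_sizes` and `truncate`, and the alphabetical tie-break, are hypothetical choices made for illustration.

```python
import math


def truncation_sizes(size_1, size_2, first_half_step=3):
    """Return N = floor(10**k) for k = 1.5, 2.0, 2.5, ...

    The list ends with the first N that exceeds both system sizes.
    Half-steps are indexed by an integer j with k = j / 2, so that
    floor(10**(j / 2)) can be computed exactly as isqrt(10**j).
    """
    largest = max(size_1, size_2)
    sizes = []
    j = first_half_step
    while True:
        n = math.isqrt(10 ** j)  # exact integer floor of 10**(j/2)
        sizes.append(n)
        if n > largest:
            break
        j += 1
    return sizes


def truncate(counts, n):
    """Keep the top-n types by count (ties broken alphabetically, an arbitrary choice)."""
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return dict(top)


print(truncation_sizes(500, 2000))
print(truncate({"a": 5, "b": 3, "c": 9}, 2))
```

Each truncated pair of systems would then be re-ranked and compared afresh, one allotaxonograph per value of N.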
The five rows of Fig. 10 correspond to our four case studies, with baby names contributing two rows. The first two examples of Twitter and tree species show a regular trend towards the full histogram. By contrast, baby names and market caps both appear to be disjoint when strong truncation is applied (small N). As N increases, the internal random structure for baby names and the stable vertical structure for market caps start to be revealed by N = 1000.
Figure 10 Exploration of the effect of subsampling data for allotaxonometric analyses. The rows correspond to the four case studies of Twitter, trees, baby names, and market caps (see Figs. 2-8). Each row shows abstracted rank-rank histograms for size-rank distribution truncations to the top N types, along with rank-turbulence divergence scores for the indicated values of α. For corresponding, complete allotaxonometric analyses, see Flipbooks S10-S14. All sequences follow steps of half an order of magnitude in the truncation number N. As N increases, the Twitter and tree species histograms are revealed in a clean fashion, while baby names and market caps begin with a disjoint system 'vee' that masks their large N forms. The paths of convergence towards the divergence score vary and may be uneven if usually monotonic, depending on the systems being compared and the choice of α. Rows extend to above the maximum system size for each comparison, and all colormaps and limits correspond to those used for the four case studies.
In general, as N is increased, we see the main stories and patterns emerge. For Twitter, the election's imprint is clear for low N (Flipbook S10) with the texture of Charlottesville requiring more words to be included. The most dramatic changes in the lists of rank-turbulence divergence occur for baby names and market caps, as the system-exclusive types of these comparisons are masked for low N.
As a rough rule of thumb, the appearance of separated system-exclusive lines suggests that the underlying datasets are sufficiently rich to allow for a substantive allotaxonometric comparison. For the example of Twitter, and understanding that cell size matters, we see the separation occurs when N is moved from 100,000 to 1,000,000. We see no such separation for tree species; however, the vertical form representing stability unveils itself with increasing N in clear fashion.
We see that the values of $D^{R}_{\alpha}$ for the truncation sequences approach the 'true' value in largely monotonic, if different, ways. For the Twitter study, the value of $D^{R}_{1/3}$ is approached from below, deceptively exhibiting a flat section up to $N = 10^{6}$. The ecology example starts above and moves down towards the overall score of $D^{R}_{0} = 0.077$. Baby names and market caps similarly both start above their respective overall scores for $D^{R}_{\infty}$ and $D^{R}_{1/3}$, and move downwards, though their scores for strong truncation are close to one as they appear to be disjoint systems. While the baby name scores drop slowly and not far (0.993 and 0.881 for 31 names for girls and boys down to 0.926 and 0.850 for all names), the market cap study only starts to gain more than the 'vee' shape when N is into the thousands. Because the market cap data comparison is a blend of large-scale turnovers around a relatively stable core, the drop is slow and then fast and further (0.931 for 31 corporations down to 0.441 for all corporations).
Our work aside, we expect any divergence measure will likely vary as orders of magnitude more data is included.And we add that in certain circumstances, choosing to truncate a data set may be a well justified treatment of data.
Finally, we note that while some form of truncation is a common measurement issue with real data for complex systems with many components, it is certainly not the only one. Exploring how other kinds of measurement errors affect rank-turbulence divergence would be a natural area of future work.

Guide to flipbooks
To help demonstrate rank-turbulence divergence as an allotaxonometric instrument, we have referenced a number of Flipbooks throughout the paper. We include these and other Flipbooks as supplementary information which can be found as part of our paper's online appendices at http://compstorylab.org/allotaxonometry/flipbooks.
Flipbooks are intended to be 'flipped through' back and forth using a PDF reader with the view set to 'single page' rather than continuous.
We list and briefly describe all Flipbooks here. Our flipbooks follow various formats which include: Comparisons of two systems with varying rank-turbulence divergence parameter α; Comparisons of a series of system pairs, often through time; and Comparisons of systems with truncation applied (Sect. 3.6).
the deeply influence of scientific literature and individual books in Ref. [70], rendering the Google Books project unreliable, as is. Nevertheless, the Version 2 n-grams dataset for English fiction is worth exploring [27] with different instruments, and we are endeavoring separately to provide corrective measures. For 1948, we see characters and place names dominate, and these come from a few books (e.g., 'Lanny Budd', 'Raintree County'). The 1987 side shows words that are not tied to specific books but rather cultural and temporal phenomena, as well as cruder language: 'KGB', 'CIA', 'Vietnam', 'lesbian', 'television', 'computer', and 'fucking'. Tuning α towards ∞, we can see pronouns changing slightly in rank with 'her' and 'she' elevating and 'he' and 'his' dropping.
Flipbook S18-Google Books, Fiction in 1948 versus 1987, 3-grams: For 3-grams, while we still see characters and place names for 1948, we now have what we call 'pathological hapax legomena', words (or trigrams in this case) that occur once in many books. The 3-grams are all from standardized, legal-speak front matter coming from outside of the story: 'change without notice', 'your local bookstore', and 'Cover art by'. A second kind of dominant trigram appears to be one that is part of a book's title printed on every page in the header or footer. As we increase α, we again see 'not' appearing in contributing 1987 trigrams. Because of the combinatorial explosion around words like 'computer' and 'phone', we no longer see them in the trigram lists. One upshot of this brief inspection of Google Books is to highlight the value of separately examining n-grams. We also note that the 3-gram example is our largest system-system comparison with system sizes on the order of $10^{9}$.
Flipbook S19-Harry Potter books, all 1-grams: Comparison of each Harry Potter book relative to all other books in the series combined, using α = 1/2 (the single book is the right hand system, the merged set of 6 books the left system).Character names and major objects and places dominate, and the first book is most different from the others combined.
Flipbook S20-Harry Potter books, uncapitalized 1-grams: The same comparison as the previous Flipbook but now with all capitalized words excluded, as an example attempt to use a different lens on our allotaxonometer. Hagrid's speech patterns in part separate Book 1 ('yer', 'ter'), Book 3 has 'rat', 'dementor', and a relative abundance of em dashes ('-'), Book 7 has 'sword', 'wand', and 'goblin'. The dominant elements are things, places, and repeated actions (e.g., spells) and descriptors. To examine changes in functional word usage, which may reveal changes in Rowling's writing, we would increase α as we did for Google Books. Again, we see the relative ease of taking subsets with ranks for allotaxonometry.
Flipbook S21-Causes of Death in Hong Kong: Five year gap comparison of causes of death reported per year in Hong Kong, starting with 2001 versus 2006 and moving through to 2012 versus 2017. Overall, pneumonia is the leading cause of death. In the second half of the time frame, 'kidney disease' and 'dementia' stand out as becoming more prevalent. Deaths listed as due to heroin drop off markedly in 2012 and 2013 relative to 5 years before. We note that changes in diagnoses, practices, and categorization are all confounding issues.

Datasets
Word usage on Twitter: Derived from an approximate 10% sample of Twitter collected by the Computational Story Lab from 2008 to 2020; English language detection performed per Ref. [45].
Species abundance on Barro Colorado Island: The dataset and its online repository for censuses taken over 35 years are described in Ref. [56].
Baby names: Data taken from Social Security Card applications as made public in 2022. (We caution that historical counts in this data set do change with each new release of baby name counts.) For each year from 1880-2021, the dataset includes all names which have 5 or more applications. Because Social Security Numbers were first issued at the end of 1936, there is a change in the dataset's nature as people moved from registering as adults to being solely registered at birth. While we use the dataset as is here, we note that there is a clear change in the male to female ratio with more boys being registered from 1940 onwards. Baby name dataset available here: https://catalog.data.gov/dataset?tags=baby-names. Separate dataset for total births available here: https://www.ssa.gov/oact/babynames/numberUSbirths.html.
Market cap data: The underlying dataset comprises 9322 US publicly traded companies that have been part of the S&P 500 at any point during the period of 1979-2018, or part of the Russell 3000 index from 1995 on.Data is available from Siblis Research here: http://siblisresearch.com/data/us-equity-returns/.
Google Books n-grams: Version 2, English Fiction. We filtered the database to collect only n-grams containing simple Latin characters. Dataset available here [69]: https://books.google.com/ngrams.
Job titles: Provided by Burning Glass, the dataset is derived from online postings (several million job openings per day, tens of thousands of sources).Raw listings are processed and categorized into two smaller taxonomies with natural-language algorithms.
For the present paper, we wrote the scripts to generate the allotaxonographs in MATLAB (Laboratory of the Matrix). We produced all figures and flipbooks using MATLAB Versions R2019b, R2020a, and R2021a. The core script is highly configurable and can be used to create a range of allotaxonographs as well as simple unlabeled rank-rank histograms. Instruments accommodated by the script include rank-turbulence divergence, probability-turbulence divergence [43], and generalized symmetric entropy divergence which includes Jensen-Shannon divergence as a special case.
Our goal has been to propose, advocate for, and contribute to a field of allotaxonometry: The measurement and visualization of detailed, type-level differences between complex systems. In the development of dynamic allotaxonometric dashboards, we have argued for a full embrace of complexity and stringent avoidance of falling into the trap of describing system differences solely by a single number. In Sect. 1.3, we observed numerous benefits for using ranks: Widespread applicability beyond systems with type frequencies, probabilities, or rates; a natural handling of system-exclusive types by ranking them last; robustness of rank-based statistics; and the straightforward interpretability of ranked lists.
Focusing on systems with many components which can be ranked by some kind of well-defined size, we have created, tested, and explored rank-based allotaxonographs built around our conception of a tunable rank-turbulence divergence. In Table 1, we collect a list of example system comparisons with $D^{R}_{\alpha}$ ranging from 0 to 1. At the core of rank-turbulence divergence in Eq. (7) is the interpretable difference of inverse powers of type ranks: As α → 0, the differences between ranks are contracted and low rank types become more salient. As α → ∞, rank discrepancies are exacerbated, and the highest rank types dominate. Narrowing our view to systems which afford frequencies of components, we find our directly tunable divergence appears to be far more general than many probability-based divergences, which are largely grouped around a few core structures. Per Ref. [31] and imposing the Zipf's law ideal of p = 1/r, we see that $|r_{\tau,1}^{-1} - r_{\tau,2}^{-1}|$ is an abundant form. There are a few other variations including $\min(r_{\tau,1}, r_{\tau,2})$, and the Hellinger-like distance |r
For the instrument's integrity and power, we assert that the map and list should be bound together. While our allotaxonomic histograms give immediate stories from the automatically labeled words along the fringes, overall ordering of these words by some measure of importance is unclear. And in choosing to map a two-dimensional rank-rank histogram onto a single dimension (another ranked list) we remain mindful that we are discarding information. We suggest that, analogously, all cartograms would benefit from an associated ordered list and vice versa [10].
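The 'list' half of the instrument, an ordering of types by their contribution, can be sketched in Python as follows. The sketch takes the 'difference of inverse powers of type ranks' to mean $|r_{\tau,1}^{-\alpha} - r_{\tau,2}^{-\alpha}|$, which is an assumption on our reading of the text; the normalization and any further monotonic transformation in Eq. (7) are deliberately omitted, since they would not change the ordering. The function name `contribution_order` is hypothetical, the rank pairs are those quoted in the market cap discussion, and the sketch applies only to finite α > 0.

```python
def contribution_order(rank_pairs, alpha):
    """Order types by |r1**(-alpha) - r2**(-alpha)|, largest first.

    rank_pairs maps each type to its (rank in system 1, rank in system 2).
    Only the unnormalized core term is used (an assumption; see lead-in),
    and alpha must be finite and positive.
    """
    def core_term(r1, r2):
        return abs(r1 ** (-alpha) - r2 ** (-alpha))

    return sorted(rank_pairs, key=lambda t: core_term(*rank_pairs[t]), reverse=True)


# Rank pairs (2007 -> 2018) quoted in the text for market caps.
market_cap_ranks = {
    "General Electric": (2, 78),
    "Exxon Mobil": (1, 9),
    "AT&T": (4, 19),
    "Amazon": (86, 3),
    "Apple": (11, 2),
    "Microsoft": (3, 1),
    "Netflix": (1214, 42),
}

for name in contribution_order(market_cap_ranks, alpha=1 / 3):
    print(name, market_cap_ranks[name])
```

Within each side of the comparison, the resulting order agrees with the order in which these companies are listed in the text for α = 1/3.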
As we have stated, there is a tendency across diverse fields towards creating single-number measurements of complex systems, and this is especially problematic when heavy-tailed size-rank distributions are in evidence (e.g., the Gini coefficient). We have shown that even when single-number measures match for two systems, allotaxonographs using rank-turbulence divergence are able to reveal and make sense of the full variation between systems.
The four main case studies of Twitter, tree species, baby names, and companies have all provided rich and diverse examples of allotaxonometric comparisons. Our ability to readily analyze the effects of partially sampled data in Sect. 3.6 further showed the value of a rank-based approach. Drawing on our paper's preprint, we and others have also used allotaxonographs in a number of other papers [80-85].
With our supplementary Flipbooks, we have attempted to show the prospect for the building of online, interactive allotaxonographs. Being linear in nature, Flipbooks allow us to explore one dimension of variation at a time, and by design are built to be fixed rather than flexible. For baby names, for example, we would like to be able to interactively vary the years being compared as well as rank-turbulence divergence's α. For temporally evolving systems, an interactive allotaxonograph could be set to track a particular cohort of types or to automatically highlight those which make a dynamical transition of some prescribed kind.
There are many future research possibilities, both theoretical and applied, suggested or opened up by what we have developed here for rank-turbulence divergence and, more generally, for allotaxonometry.

Theoretical foundation and other allotaxonometric instruments
We have been pragmatic in our construction of rank-turbulence divergence, striving to build a functional tool first and foremost. A rigorous theoretical foundation might be possible for either our tool or an adjacent rank-based divergence. Staying on the functional side, variations on our divergence might be of use for some comparisons where no value of α makes for a good fit. As we noted for the case of market caps, a composite instrument that separates stable, enduring companies from those that exit or enter could be devised.
For systems with documented component probabilities or rates, we have also constructed a related probability-turbulence divergence. We explore the allotaxonometry of this divergence in [43], showing the instrument to be a generalization of a suite of well known probability-based divergences.
As we saw for the unusually durable popular name 'Elizabeth' in Fig. 4, there are components whose locations on allotaxonographs are not highlighted by standard conceptions of divergences, rank-based or otherwise. A completely distinct measure of importance could favor largely isolated rank-rank pairings on the rank-rank histogram. Given that the measure would have to be sufficiently sophisticated to accommodate the possibility that a small cluster of related types might be near each other (e.g., 'Lady' and 'Gaga'), yet otherwise be distinct, the application of some basic kind of cluster analysis would offer a starting point.

Determination of α
In our initial work, we made the choice of the tuning parameter for rank-turbulence divergence, α, a visually guided one. The user gains much from inspecting the rank-rank histogram alone, and, in our experience, is then readily able to choose an α for which the allotaxonometric contour lines best match the form of the histogram. A visually guided choice will be sufficient in cases of comparing two or a small number of systems.
When rank turbulence presents as a scaling law, which is regularly the case for text corpora (e.g., Twitter, books), we would want to be able to determine an optimal α. While for generalized entropy approaches for single systems, the limit of linear scaling and Shannon's entropy demarcate the boundary between accentuating the common or the rare [8,34,54,55], we have found that for system comparisons, the optimal value of α, if it exists, is dependent on the pair of systems being compared; there is no universal value.
We have left open the possibility of an analytic connection between the rank-turbulence scaling described at the end of Sect. 1.2 and, to the extent that well-defined scaling is present, an optimal α for rank-turbulence divergence.
Even with an optimization method for determining α, we urge readers to always look at the visuals provided by our allotaxonographs (the maps) for confirmation of fit.

Rank energy
For another direction, we venture that a kind of 'rank energy' interpretation might be possible. Working from the idealized Zipf's law relationship of $p \sim r^{-1}$, we would have
$$p^{\alpha} \sim 1/r^{\alpha} = \exp\{-\alpha E/T\} = \exp\{-E/T'\}, \tag{15}$$
where $E = T \ln r$ is an energy associated with rank $r$ and temperature $T$, and $T' = T/\alpha$ an effective temperature. When $T' \to 0$, high ranked types prevail, while when $T' \to \infty$, all types move towards being weighted equally, independent of rank.
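The two limits can be checked numerically with a short Python sketch. It assumes the reading that rank weights are proportional to $r^{-\alpha}$, sets the reference temperature to T = 1 so that the energy is E = ln r, and takes the effective temperature to be 1/α. These normalization choices and the function name `boltzmann_rank_weights` are assumptions made for illustration only.

```python
import math


def boltzmann_rank_weights(num_ranks, alpha):
    """Normalized weights proportional to r**(-alpha), written as exp(-E / T_eff).

    Assumes E = ln(r) (reference temperature T = 1) and T_eff = 1 / alpha.
    """
    t_eff = 1.0 / alpha
    raw = [math.exp(-math.log(r) / t_eff) for r in range(1, num_ranks + 1)]
    total = sum(raw)
    return [w / total for w in raw]


cold = boltzmann_rank_weights(5, alpha=50.0)   # T_eff -> 0: top rank dominates
hot = boltzmann_rank_weights(5, alpha=1e-6)    # T_eff -> infinity: near-uniform weights

print([round(w, 4) for w in cold])
print([round(w, 4) for w in hot])
```

Large α (low effective temperature) concentrates essentially all weight on r = 1, while small α (high effective temperature) spreads weight almost evenly across ranks.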

Type calculus
Identifying and quantifying change is fundamental to any form of scientific analysis (and life itself). Allotaxonometry may be viewed as part of a larger analytic framework of 'lexical calculus' and, more generally, 'type calculus.' By lexical calculus, we mean the measurement of changes in properties of large-scale texts, and the demonstration of how individual word usage contributes to such changes through word shift graphs [86-90]. Expanding to complex systems comprising many types (which we would likely still denote by words), we would have a corresponding type calculus (e.g., baby names, companies, species). Simply measuring overall numeric changes in, say, entropy between two complex systems is grossly insufficient for understanding how systems may be differentially configured. We must always look at the words (or types).

Final remarks
We close with the observation that in terms of applications, any comparison of complex systems entailing a broad array of components would be fair game. A few examples would be sales of anything (e.g., Amazon's sales from week to week), crime rates, country exports, sites visited or searched for online, medical condition prevalences, rankings in sports, music popularity, and markets of all kinds. And while our focus has been on comparing sys-

Fig. 1D: While election and Charlottesville terms dominate the sides of the histogram,

Fig. 1F: The least important and least differentiating types appear at the bottom of the histogram. These types are low rank in both systems. The bottommost annotations of Fig. 1 ('suede-denim' and 'richava') appear once on the dates of their respective sides. These creatures of the lexical abyss are just two examples of on the order of $10^{6}$ words

Figure 2 Example allotaxonograph using rank-turbulence divergence (RTD) to compare word usage on two different days of English Twitter. We explain rank-turbulence divergence allotaxonographs in full in Sect. 3.1, providing a summary within this caption. To help with understanding, we again examine the same dates of the 2016 US Presidential Election and the Charlottesville Unite the Right rally as the rank-rank histogram of Fig. 1. We add rank-turbulence divergence to Fig. 1's histogram with A. A gauge for α and the expression for $D^{R}_{1/3}$ in the upper left corner; B. An overlay of contour lines; C. A scale for the contour lines in the upper right; and D. Based on contributions of each word to $D^{R}_{1/3}$, an ordered list on the right by descending values of δD R

Fig. 2D: We order the top 40 words by decreasing value of $\delta D^{R}_{1/3,\tau}$, as indicated by the underlying bars. We orient words to the left and right in accordance with the day of their higher rank; the bar colors of light gray and light blue match the histogram's format. Opposite each bar, we show the word's rank on each day.

Figure 4 Allotaxonograph comparing names of girls born in the US in 1968 and 2018. Only names appearing at least five times in a year are included in the data set. For dataset details, see Sect. 5.1. Of our four main case studies, baby name distributions show the strongest change with $D^{R}_{\alpha}$ scores verging on that of the random equivalent. The asymmetry of the separated 2018-exclusive names and the balance score of 80.3% of all names in 2018 being new relative to 1968 show that while there is much social imitation (see 1970s, 'Jennifer'), baby names are highly innovative collectively. Note that at the bottom of the histogram, the annotated name is a 2018-exclusive word but it is oriented towards the left per our annotation method (with each run of the allotaxonometer script, the name is randomly chosen from all names in the specific histogram square) (see also Fig. 1 and Sect. 2.2). See Fig. 5 for the boy name version. For 1968-2008, Flipbook S5 shows

Figure 5 Allotaxonograph comparing US baby boy names for the years 1968 and 2018.For dataset details, see Sect.5.1.The rare name at the bottom of the histogram is oriented to the left but is a 2018-exclusive word.As for girl names, we provide two Flipbooks showing 50 year gap comparisons moving through time (Flipbook S6) and the effects of varying α for the 1968-2018 comparison (Flipbook S8)

Figure 8 Allotaxonometric comparison of publicly traded US companies in 2007 and 2018 by fourth quarter market capitalization. The rank-rank histogram is a hybrid of a vertical structure we see for relatively stable systems (Fig. 1B), and a 'vee' of disjoint systems (Fig. 1D). The disjoint feature results from sharp transitions as companies fail, merge with or are acquired by others, or go public or return to private, but also from missing or erroneous data. Berkshire Hathaway's market cap, for example, was misrecorded as a thousandfold drop. We include the incorrect rankings for Berkshire Hathaway and DowDuPont Inc to help show how an allotaxonometric analysis can sharply reveal dataset problems. The corrected allotaxonograph follows in Fig. 9. See Sect. 3.6 for discussion, and Sect. 5.1 for dataset details

Figure 9 Allotaxonometric comparison of publicly traded US companies in 2007 and 2018 by fourth quarter market capitalization with corrections for Berkshire Hathaway and DowDuPont Inc. To be compared with Fig. 8. The 2018 market caps for both these companies were recorded incorrectly in the original data set. The revised allotaxonograph shows that Berkshire Hathaway, which in fact rose to r = 5 in 2018, now appears prominently on the right hand side of the histogram and the contribution list. DowDuPont Inc's corrected rank for 2018 means that it did not contribute as strongly, and it no longer appears in the top 40 contribution list. These changes aside, the allotaxonograph is largely identical to that of Fig. 8. The rank-turbulence divergence is unchanged at $D^{\mathrm{R}}_{1/3} = 0.411$
Investigating further, we find 175 names appearing 5 or more times in 2018 that are exclusive to 2018 relative to 1968 and matching the regular expression /[Aa][iy]*d+[aeoiuy]n+$/. A selection of examples ranging from common to rare, highlighting variations on Brayden,

For girl names, using a similar analysis for the ending -lyn, we find 535 names exclusive to 2018, the top four of which are:
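The regular expression quoted above for the Brayden-like boy names can be tried out with a minimal Python sketch such as the following. The function name `ayden_like` and the sample names are hypothetical and chosen only for illustration; they are not drawn from the data set, and the sketch does not reproduce the counts reported in the text.

```python
import re

# Pattern as quoted in the text; the surrounding slashes are delimiters
# and are not part of the expression itself.
AYDEN_LIKE = re.compile(r"[Aa][iy]*d+[aeoiuy]n+$")


def ayden_like(names):
    """Return the names whose ending matches the pattern, preserving order."""
    return [name for name in names if AYDEN_LIKE.search(name)]


# Hypothetical sample for illustration only (not the paper's data).
sample = ["Brayden", "Aiden", "Jaydenn", "Hayden", "Brandon", "Madison", "Liam"]
print(ayden_like(sample))
```

Using `search` rather than `match` lets the pattern anchor only at the end of the name, so any leading consonants (as in Brayden or Hayden) are allowed, while names such as Brandon fail because no "a" directly precedes the "d".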

Table 1 A selection of example system comparisons producing a range of $D^{\mathrm{R}}_{\alpha}(R_1 \,\|\, R_2)$ values

Our goal has been to propose, advocate for, and contribute to a field of allotaxonometry: