Figure 8 | EPJ Data Science

Figure 8

From: The growing amplification of social media: measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020

Language Zipf distributions. (A) Zipf distribution [143] of all languages captured by the FastText-LID model. (B) Zipf distribution for languages captured by the Twitter-LID algorithm(s). The vertical axis in both panels reports the rate of usage \(p_{t,\ell }\) of each language across all messages between 2014 and 2019, while the horizontal axis shows the corresponding rank of each language. FastText-LID recorded a total of 173 unique languages throughout that period, whereas Twitter-LID captured a total of 73 unique languages, some of which were experimental and are no longer available in recent years. (C) Joint distribution of all recorded languages. Languages located near the vertical dashed gray line signify agreement between FastText-LID and Twitter-LID, specifically that the two classifiers captured a similar number of messages between 2014 and the end of 2019. Languages found left of this line are more prominent using the FastText-LID model, whereas languages right of the line are identified more frequently by the Twitter-LID model. Languages found within the light-blue area are detectable by only one of the two classifiers, with FastText-LID colored in blue and Twitter-LID colored in red. The color of the points highlights the normalized ratio difference \(\delta D_{\ell }\) (i.e., divergence) between the two classifiers, which is computed from \(\mathcal{C}^{\mathrm{F}}_{\ell }\), the number of messages captured by FastText-LID for language \(\ell \), and \(\mathcal{C}^{\mathrm{T}}_{\ell }\), the number of messages captured by Twitter-LID for language \(\ell \). Hence, points with darker colors indicate greater divergence between the two classifiers. A lookup table for language labels can be found in Table 1, and an online appendix of all languages is also available here: http://compstorylab.org/storywrangler/papers/tlid/files/fasttext_twitter_timeseries.html
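As a rough illustration of how a normalized divergence between the two classifiers could be computed, the sketch below assumes the form \(\delta D_{\ell } = |\mathcal{C}^{\mathrm{F}}_{\ell } - \mathcal{C}^{\mathrm{T}}_{\ell }| / (\mathcal{C}^{\mathrm{F}}_{\ell } + \mathcal{C}^{\mathrm{T}}_{\ell })\). That form is an assumption made for illustration, since the caption as reproduced here does not spell out the expression. The function name and the message counts are hypothetical and are not taken from the paper.

```python
def normalized_divergence(count_fasttext: int, count_twitter: int) -> float:
    """Assumed normalized ratio difference between two classifiers' counts.

    Returns |C_F - C_T| / (C_F + C_T), which is 0 when both classifiers
    capture the same number of messages and 1 when only one of them
    captures the language at all. This form is an assumption.
    """
    total = count_fasttext + count_twitter
    if total == 0:
        raise ValueError("language was captured by neither classifier")
    return abs(count_fasttext - count_twitter) / total


# Hypothetical counts: language label -> (FastText-LID, Twitter-LID)
toy_counts = {
    "lang_a": (100, 100),  # full agreement
    "lang_b": (300, 100),  # FastText-LID captures more messages
    "lang_c": (50, 0),     # detectable by FastText-LID only
}

for label, (c_f, c_t) in toy_counts.items():
    print(label, normalized_divergence(c_f, c_t))
```

Under this assumed form, values near 0 would correspond to the lighter points near the dashed line in panel C, and values near 1 to languages that only one classifier detects.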
