Figure 3 | EPJ Data Science

Figure 3

From: Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts

(Left) Word shift graph of the sentiment difference between the first and second halves of Moby Dick by Herman Melville. A naive application of a dictionary-based sentiment lexicon to the two-segment emotional arc would inflate the negative trajectory of the novel unless words like ‘cried’ and ‘cry’, which more often mean ‘said’ in nineteenth-century English, and ‘coffin’, which is used as a surname about one third of the time, are preprocessed or removed. We find that \(\delta \Phi = 0.09\) when including those words, while \(\delta \Phi = 0.07\) when they are excluded. (Right) Word shift graph of the sentiment difference between in-park and out-of-park tweets across 25 cities in the US. A naive application of a dictionary-based sentiment lexicon would inflate the in-park tweet scores by including words like ‘park’, ‘beach’, ‘zoo’, ‘museum’, ‘music’, and ‘festival’, all of which represent physical locations and events within parks. We find that \(\delta \Phi = 0.12\) when including those words, while \(\delta \Phi = 0.10\) when they are excluded. For both word shift graphs, a reference value of 5 was used, and a stop lens was applied to all words with a sentiment score between 4 and 6. Both word shift graphs contain cumulative contribution and text size diagnostic plots in their bottom left and right corners, respectively. See the following case study and the Materials and methods for more details on their interpretation.
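
The effect of the reference value, the stop lens, and the exclusion of context-specific words can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed dictionary of word scores shared by both texts, treats the stop lens as removing words whose score lies strictly between 4 and 6, and renormalizes frequencies over the remaining words. The function names, the toy scores, and the toy token lists are all hypothetical.

```python
from collections import Counter


def relative_freqs(tokens, scores, stop_lens=(4.0, 6.0), exclude=()):
    """Relative frequencies over scored words that survive the stop lens.

    Assumption: the stop lens drops words with lo < score < hi, and
    frequencies are renormalized over the words that remain.
    """
    lo, hi = stop_lens
    counts = Counter(
        t for t in tokens
        if t in scores and not (lo < scores[t] < hi) and t not in exclude
    )
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def word_shift(tokens_1, tokens_2, scores, reference=5.0,
               stop_lens=(4.0, 6.0), exclude=()):
    """Return (delta_phi, per-word contributions) for text 2 relative to text 1.

    Each word contributes (score - reference) * (p2 - p1); because both
    frequency distributions sum to one, the contributions add up to the
    difference in average sentiment between the two texts.
    """
    p1 = relative_freqs(tokens_1, scores, stop_lens, exclude)
    p2 = relative_freqs(tokens_2, scores, stop_lens, exclude)
    contributions = {
        w: (scores[w] - reference) * (p2.get(w, 0.0) - p1.get(w, 0.0))
        for w in set(p1) | set(p2)
    }
    return sum(contributions.values()), contributions


# Hypothetical toy lexicon and texts, for illustration only.
toy_scores = {"happy": 8.0, "sad": 2.0, "the": 5.0, "coffin": 2.5}
first_half = ["happy", "happy", "sad", "the"]
second_half = ["happy", "sad", "sad", "coffin", "the"]

delta_all, contrib_all = word_shift(first_half, second_half, toy_scores)
delta_excl, _ = word_shift(first_half, second_half, toy_scores,
                           exclude={"coffin"})
print(delta_all, delta_excl)
```

In this toy case, excluding the hypothetical context-specific word changes the size of the measured shift, which is the kind of sensitivity the caption reports for ‘cried’, ‘cry’, and ‘coffin’ in the novel and for location words in the tweets.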
