Table 1 Contributions and scores of various text comparison measures according to the word shift framework. The word contribution \(\delta \Phi _{\tau }\) indicates how an individual word impacts a measure. Each contribution is expressed as a difference in weighted averages so that it can be easily identified with the components of the word shift framework.

From: Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts

| Measure | Notation | Word contribution \(\delta \Phi_{\tau} = p_{\tau}^{(2)} \phi_{\tau}^{(2)} - p_{\tau}^{(1)} \phi_{\tau}^{(1)}\) |
| --- | --- | --- |
| Relative Frequency | \(P^{(i)}\) | \(p_{\tau}^{(2)} - p_{\tau}^{(1)}\) |
| Shannon Entropy | \(H(P^{(i)})\) | \(-p_{\tau}^{(2)} \log p_{\tau}^{(2)} + p_{\tau}^{(1)} \log p_{\tau}^{(1)}\) |
| Generalized Entropy | \(H_{\alpha}(P^{(i)})\) | \(-p_{\tau}^{(2)} \left[ \frac{(p_{\tau}^{(2)})^{\alpha-1}}{\alpha-1} \right] + p_{\tau}^{(1)} \left[ \frac{(p_{\tau}^{(1)})^{\alpha-1}}{\alpha-1} \right]\) |
| Kullback–Leibler Divergence | \(D^{(\text{KL})}(P^{(2)} \parallel P^{(1)})\) | \(-p_{\tau}^{(2)} \log p_{\tau}^{(1)} + p_{\tau}^{(2)} \log p_{\tau}^{(2)}\) |
| Jensen–Shannon Divergence | \(D^{(\text{JS})}(P^{(1)} \parallel P^{(2)})\) | \(p_{\tau}^{(2)} \pi_{2} \log \frac{p_{\tau}^{(2)}}{m_{\tau}} - p_{\tau}^{(1)} \pi_{1} \log \frac{m_{\tau}}{p_{\tau}^{(1)}}\) |
| Generalized Jensen–Shannon Divergence | \(D_{\alpha}^{(\text{JS})}(P^{(1)} \parallel P^{(2)})\) | \(p_{\tau}^{(2)} \pi_{2} \left[ \frac{(p_{\tau}^{(2)})^{\alpha-1} - m_{\tau}^{\alpha-1}}{\alpha-1} \right] - p_{\tau}^{(1)} \pi_{1} \left[ \frac{m_{\tau}^{\alpha-1} - (p_{\tau}^{(1)})^{\alpha-1}}{\alpha-1} \right]\) |
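
To make the table concrete, the following is a minimal Python sketch of three of the word contributions listed above (Shannon entropy, Kullback–Leibler divergence, and Jensen–Shannon divergence). It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, each text is assumed to be a dict mapping words to relative frequencies, the natural logarithm is used, \(0 \log 0\) is taken to be 0, and \(m_{\tau}\) is assumed to be the mixture \(\pi_{1} p_{\tau}^{(1)} + \pi_{2} p_{\tau}^{(2)}\), which this table does not restate.

```python
import math


def _plog(p, q):
    """Return p * log(q), with the assumed convention that the term is 0 when p == 0."""
    return 0.0 if p == 0 else p * math.log(q)


def entropy_contributions(p1, p2):
    """Per-word contribution to H(P2) - H(P1): -p2 log p2 + p1 log p1."""
    out = {}
    for w in set(p1) | set(p2):
        a, b = p1.get(w, 0.0), p2.get(w, 0.0)
        out[w] = -_plog(b, b) + _plog(a, a)
    return out


def kl_contributions(p1, p2):
    """Per-word contribution to D_KL(P2 || P1): -p2 log p1 + p2 log p2.

    Assumes p1[w] > 0 wherever p2[w] > 0; otherwise the divergence is undefined
    and math.log raises ValueError.
    """
    out = {}
    for w in set(p1) | set(p2):
        a, b = p1.get(w, 0.0), p2.get(w, 0.0)
        out[w] = -_plog(b, a) + _plog(b, b)
    return out


def jsd_contributions(p1, p2, pi1=0.5, pi2=0.5):
    """Per-word contribution to the JSD: p2*pi2*log(p2/m) - p1*pi1*log(m/p1)."""
    out = {}
    for w in set(p1) | set(p2):
        a, b = p1.get(w, 0.0), p2.get(w, 0.0)
        m = pi1 * a + pi2 * b  # assumed mixture probability of word w
        term2 = 0.0 if b == 0 else pi2 * b * math.log(b / m)
        term1 = 0.0 if a == 0 else pi1 * a * math.log(m / a)
        out[w] = term2 - term1
    return out


if __name__ == "__main__":
    # Toy relative frequencies, invented purely for illustration.
    text1 = {"a": 0.5, "b": 0.5}
    text2 = {"a": 0.25, "b": 0.25, "c": 0.5}
    for word, delta in sorted(jsd_contributions(text1, text2).items()):
        print(word, round(delta, 4))
```

Summing the per-word contributions over all words recovers the overall score: the entropy difference \(H(P^{(2)}) - H(P^{(1)})\), the Kullback–Leibler divergence, or the Jensen–Shannon divergence, respectively. This is what lets each word's bar in a word shift graph be read as its share of the total.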