Professional tennis players accumulate score points based on their winning record during the previous year (Figure 1(A)). By ordering the players based on their score we obtain the official Association of Tennis Professionals (ATP) Rankings, \(r(t)\), an accurate measure of a tennis player’s performance relative to other players, having \(r=1\) for the top player while \(r\) is high for low-performing athletes.
We measure each player’s time-resolved visibility through the number of hourly visits to his Wikipedia page [42–44], a sensitive time-dependent measure of the collective interest in a player (Figure 1(B)). We define a player’s popularity or fame through his cumulative visibility, representing the average number of visits his Wikipedia page acquires during a year (red line in Figure 1(B)). Figure 1(C) indicates that top players can gather a total of \(10^{7}\) visits in a year, while those at the bottom of the rankings collect as few as 10, documenting popularity differences between players that span orders of magnitude. As Figure 1(D) indicates, a player’s rank (performance) and Wikipedia visits (popularity) are correlated: the better the ranking (i.e. the lower \(r\)), the higher the Wikipedia page-views. Yet, we also observe significant scatter: athletes ranked around \(r=1\mbox{,}000\) can gather anywhere from 10 to \(10^{5}\) visits per year, wide differences that support the impression that popularity is often divorced from performance.
To address the degree to which performance determines popularity, we reconstructed the number of Wikipedia visits \(W(t)\) for all ranked professional male tennis players between 2008 and 2015 (Figure 1(B)), together with their time-dependent history of achievements on the field (supplementary material S3 in Additional file 1). The two datasets allow us to identify several performance-related factors that influence a player’s visibility, and eventually his popularity:
(i) Rank \(r(t)\): Figure 2(A) shows the measured Wikipedia page-views \(W(t)\) vs. a player’s momentary rank \(r(t)\), indicating that the number of visits drops rapidly with the increasing rank of a player. It also shows that the variation in \(W(t)\) is higher for players with lower performance (i.e. higher rank) (Figure 1(B)).
(ii) Tournament value \(V(t)\): The more points \(V(t)\) a tournament offers to the winner, a measure of the tournament prestige, the more visibility it confers to its players, win or lose (Figure 2(B)). For example, the early peaks in Djokovic’s visibility correspond to his participation in the US and Australian Opens, two high-value tournaments (Figure 1(B)).
(iii) Number of matches \(n(t)\): The more matches an athlete plays within a tournament, reflecting his advance within the competition, the more exposure he receives (Figure 2(C)). As the losing player is eliminated, \(n(t)\) also determines the number of points the player collects within a tournament. Up to \(n(t)=5\) (where most tournaments end) we see a steady increase of \(W(t)\) with \(n(t)\). The jumps at \(n(t)=6\) and 7 occur only for the semi-final and final matches of the most watched tennis tournaments, which offer disproportionate visibility.
(iv) Rank of the best opponent: A match against a better ranked rival generates additional interest in a player. To capture this effect we measure the relative rank difference for each match \(\Delta r(t) H(\Delta r) /r(t)\), where \(\Delta r(t)=r(t)-r_{BR}(t)\) is the difference between the rank of the considered player and his best rival in the tournament; the Heaviside step function \(H(\Delta r)\) is one when the opponent has a better rank and zero otherwise. The increase of the average page-views with \(\Delta r(t) H(\Delta r) /r(t)\) quantifies the boost in visibility from playing against a better athlete (Figure 2(D)), like the visibility peak of Djokovic when he played against the then #1 Federer in the 2009 US Open (Figure 1(B)) (see S9 for the detailed opponent statistics).
(v) Career length \(Y(t)\): The longer a player has been an active professional, the more Wikipedia visits he collects (Figure 2(E)).
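The opponent term in (iv) can be illustrated with a minimal sketch. The function name `opponent_term` and the example ranks are hypothetical and chosen only for illustration; the Heaviside factor is implemented as a clamp at zero, which is equivalent to \(\Delta r\, H(\Delta r)\).

```python
def opponent_term(rank: int, best_rival_rank: int) -> float:
    """Relative rank difference dr * H(dr) / r, with dr = r - r_BR.

    max(dr, 0) equals dr * H(dr): the term is positive only when the
    best rival has a better (numerically smaller) rank than the player.
    """
    delta_r = rank - best_rival_rank
    return max(delta_r, 0) / rank


# Hypothetical example: a player ranked 50 facing the #1 player.
print(opponent_term(50, 1))
# The reverse situation gives no boost.
print(opponent_term(1, 50))
```

By construction the term lies between 0 and 1, and it approaches 1 when a low-ranked player meets a top-ranked rival.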
Figures 2(A)-(E) document clear correlations between the performance measures (i)-(v) and the momentary visibility of a player. Yet, a linear fit \(y(t)=Ax(t)+C\) of each individual performance measure to the observed Wikipedia page-views results in \(R^{2} < 0.1\), except for \(1/r(t)\), for which \(R^{2}=0.29\). Therefore, no individual performance measure can fully explain visibility, indicating that performance drives popularity through a combination of performance measures (see supplementary materials S4 in Additional file 1 for variable interdependencies). We therefore explored the predictive power of the sum of these variables with multipliers obtained via an ordinary least squares (OLS) fitting [45] process, resulting in \(R^{2}=0.31\), only slightly better than \(1/r(t)\) alone. We find, however, that a multiplicative process offers a much better predictive power, an OLS fitting leading to the formula
$$ W_{M}(t)=A \frac{1}{r(t)} V(t) n(t) e^{\frac{\Delta r(t) H(\Delta r)}{r(t)}} Y(t) + C, \tag{1} $$
which yields \(R^{2}=0.57\). This indicates that the influences of the performance measures (i)-(v) do not add up, but amplify each other in a multiplicative manner, allowing for the emergence of the extreme fluctuations in visibility, as observed in Figure 1(B) and Figures 2(A)-(E). By taking the logarithm of (1) and calculating the standardized β-coefficients for each term, we can evaluate how strongly each performance variable influences \(W(t)\) in units of standard deviation. Figure 2(F) shows the obtained standardized β-coefficients, indicating that rank is the strongest driving force of visibility, followed by the value (prestige) of the tournaments, the rank of the opponents and the number of matches the player played in a tournament. While career length contributes to a lesser degree, all terms are significant (\(p<0.001\)).
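One way to read (1) is as a straight-line fit of the page-views against a single product feature. The sketch below assumes this reading; the helper names (`product_feature`, `fit_line`) and the records are hypothetical, and the page-views are synthetic values generated from known coefficients, not data from the study.

```python
import math


def product_feature(r, V, n, opp, Y):
    """Product of the five performance measures in (1), without A and C.

    opp is the relative rank difference dr * H(dr) / r.
    """
    return (1.0 / r) * V * n * math.exp(opp) * Y


def fit_line(x, y):
    """Ordinary least squares for y = A*x + C; returns (A, C, R^2)."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    A = sxy / sxx
    C = mean_y - A * mean_x
    ss_res = sum((yi - (A * xi + C)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return A, C, 1.0 - ss_res / ss_tot


# Hypothetical records: (rank, tournament value, matches, opponent term, career years)
records = [(5, 1000, 4, 0.0, 8), (50, 250, 2, 0.5, 5),
           (20, 500, 3, 0.9, 6), (120, 250, 1, 0.2, 2)]
x = [product_feature(*rec) for rec in records]
# Synthetic page-views built from known A = 2 and C = 5, so the fit can be checked.
W = [2.0 * xi + 5.0 for xi in x]
A, C, r_squared = fit_line(x, W)
print(A, C, r_squared)
```

Because the synthetic data follow the model exactly, the fit recovers the chosen coefficients; with real page-views the same procedure would return the reported kind of \(R^{2}\) below one.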
These results lead to a popularity model (PROMO), which predicts the time dependent visibility of a player \(W_{M}(t)\) using the athlete’s performance as an input,
$$ W_{M}(t)=A \frac{Y(t)}{r(t)} V(t)n(t) e^{\Delta r(t) H(\Delta r)/r(t)} + C \frac{Y(t)}{r(t)}. \tag{2} $$
The last term accounts for periods when the player is not playing (between tournaments, periods of injury, etc.). An OLS fitting process of this two-parameter PROMO results in \(R^{2}=0.60\), offering the best predictive accuracy of the models we tested (an analysis of all sub-models is provided in the supplementary materials S5 in Additional file 1). Hence (2) represents our main result, linking an athlete’s visibility (\(W(t)\)) to his performance captured by \(r(t)\), \(V(t)\), \(n(t)\), \(\Delta r(t) H(\Delta r)/r(t)\) and \(Y(t)\). While it is possible to improve the accuracy further with additional terms, such as a multiplier on \(n(t)\) for \(n(t) \ge 6\), the increase in accuracy is not sufficient to justify the added complexity.
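The two-parameter form of (2) can be sketched as a least-squares problem with two regressors and no intercept. The code below is an illustration under that assumption, solving the 2×2 normal equations directly; the function names, the records and the coefficients used to generate the synthetic page-views are hypothetical. Periods without play are encoded here as \(V=n=0\), which switches off the first term.

```python
import math


def promo_terms(r, V, n, opp, Y):
    """Return the two regressors of (2): the tournament term and Y/r."""
    base = Y / r
    return base * V * n * math.exp(opp), base


def fit_promo(rows, W):
    """Least squares for W = A*x1 + C*x2 (no intercept) via normal equations."""
    x1, x2 = zip(*(promo_terms(*row) for row in rows))
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * w for a, w in zip(x1, W))
    t2 = sum(b * w for b, w in zip(x2, W))
    det = s11 * s22 - s12 * s12
    A = (t1 * s22 - t2 * s12) / det
    C = (t2 * s11 - t1 * s12) / det
    return A, C


# Hypothetical records: (rank, tournament value, matches, opponent term, career years)
rows = [(5, 1000, 4, 0.0, 8), (50, 250, 2, 0.5, 5), (200, 0, 0, 0.0, 3),
        (20, 500, 3, 0.9, 6), (10, 0, 0, 0.0, 10)]
# Synthetic page-views from known A = 3 and C = 100, so the fit can be checked.
W = [3.0 * x1 + 100.0 * x2 for x1, x2 in (promo_terms(*row) for row in rows)]
A, C = fit_promo(rows, W)
print(A, C)
```

The rows with \(V=n=0\) matter for the fit: they are the only observations that constrain \(C\) independently of \(A\).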
We designed several tests to validate PROMO’s predictive power:
(i) In a prospective study we fit the r.h.s. of (2) to \(W(t)\) for 2008 and 2009, obtaining the coefficients \(A=3.747\) and \(C=7\mbox{,}929\), the same for all athletes (see supplementary materials S3 in Additional file 1). We then use these parameters to predict the momentary visitations \(W_{M}(t)\) for the subsequent 5 years (Figure 3(A)). We find that the model accurately predicts the bulk of the real data (the Pearson correlation coefficient is 0.77, corresponding to \(R^{2}\approx 0.6\)). The deviation for very low Wikipedia page-views has a simple technical reason: beginning athletes often lack a dedicated Wikipedia page, hence their page-views are counted only indirectly and incompletely on Wikipedia (see supplementary materials S3 and S7 in Additional file 1). Once a player gets a Wikipedia page, his visibility approaches the model-predicted values.
(ii) Figure 3(B) compares the time dependent visibility \(W_{M}(t)\) to the real visitation \(W(t)\) for Novak Djokovic, indicating that the model (2) accurately captures not only the explosion of his overall popularity in 2011 following his exceptional performance on the field, but also the peaks in the visitation patterns. Comparing Figure 3(B) to Figure 1(B) helps clarify the lack of significant increase in popularity when Djokovic first reached #2 in the rankings: Rankings alone cannot explain visibility; we need the full predictive power of PROMO to unveil the underlying dynamics.
(iii) The model (2) allows us to separate the role of the different parameters: Rank and career length act as slow modes, driving the popularity of an athlete and capturing his overall fame or celebrity (Figure 2(D)). In contrast, the tournament value \(V(t)\), the number of matches played within a tournament \(n(t)\), and the rank of the best opponent represent fast modes that drive the momentary visibility, such as the timing and the height of the individual visibility peaks (Figure 3(D)). As the slow and fast modes are multiplied, together they can account for the major visibility peaks on a slowly varying background, which in turn determines an athlete’s overall fame.
(iv) To assess our ability to predict a player’s popularity or fame from his performance, we use the total page-views a player received across several years. Figure 3(C) shows a comparison between the predicted and observed popularity of each active player between 2008 and 2015, indicating that the observed popularity \(W(t)\) closely follows the performance-driven predicted popularity \(W_{M}(t)\). A color-coding by the player’s peak rank reveals that for players that reached top rankings, the accuracy of the prediction is remarkable; scattering is only seen for lower ranked players.
To understand if popularity in tennis can be induced by factors unrelated to performance, we inspected the outliers, athletes whose observed popularity is significantly higher than their performance-based popularity. We find that the outliers highlighted in Figure 3(C) (also listed in Table S3) are young players at the bottom of the rankings, who participated in only a few tournaments. An inspection of their careers reveals that their added popularity is also performance driven, rooted in outstanding results in junior or doubles tournaments, performance factors not considered in (2). For example, Quinzi, Peliwo, Zverev and Golding reached number one or two in junior rankings, and Broady and Edmund had considerable successes in doubles. Only Marco Djokovic’s visibility could not be explained by his performance. His higher than expected fame is most likely related to the attention (and potential confusion) he earns as the brother of Novak Djokovic, one of the best active tennis players.
(v) Finally, we find that the Wikipedia pages of retired players continue to attract visitors (Figure 1(C)), prompting us to ask if a player’s enduring popularity can be explained by his past performance. Given that for retired players \(r(t)\), \(V(t)\), \(n(t)\) and \(\Delta r(t) H(\Delta r)/r(t)\) are not recorded, their visibility can be determined only by the second term in (2). By using the median for \(r(t)\), reflecting a player’s overall performance during his active career, and \(Y_{T}\) for the number of active years \(Y(t)\), we predict an athlete’s popularity during retirement as
$$ W^{\mathrm{Inact}}_{M}(t)=C \frac{Y_{T}}{r_{\mathrm{med}}}, \tag{3} $$
using the same fitting parameter we had before (\(C=7\mbox{,}929\)). As Figure 4(A) shows, the predicted popularity of inactive players is in excellent agreement with the measured popularity, indicating that past performance is the main source of their enduring fame, at least for several years following the player’s retirement (see supplementary materials S8 in Additional file 1 for outliers).
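As a worked illustration of (3), the sketch below evaluates the retirement prediction for a hypothetical player. The rank history and career length are invented for the example; only the coefficient \(C=7\mbox{,}929\) comes from the text.

```python
from statistics import median

C = 7929.0  # fitting parameter reported in the text


def inactive_popularity(rank_history, total_years, c=C):
    """Predicted visibility of a retired player, C * Y_T / r_med, as in (3)."""
    r_med = median(rank_history)
    return c * total_years / r_med


# Hypothetical career: ranks recorded over time and 12 active years.
ranks = [120, 80, 40, 25, 30, 60]
w_inactive = inactive_popularity(ranks, total_years=12)
print(w_inactive)
```

For this invented career the median rank is 50, so the prediction is \(7929 \times 12 / 50\); a longer career or a better median rank raises the predicted value proportionally.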
Finally we can apply the tools and insights developed above to explore the emergence of fame. For this we group players into ten categories based on their peak ranking during their career. The change in the average ranking of each group over time (Figures 4(B)-(C)) indicates that players who reach the best ranks distinguish themselves in their first 20 tournaments, i.e. they climb in rank very fast early in their careers. This effect is particularly clear in Figure 4(C), which shows the rate of change in rankings during the first 20 tournaments, indicating that players that eventually reach the top rise much faster early on than the rest. Therefore, top ranks are not reached by a slow improvement in skill, but instead young players come in with a given skill set, some remarkable, others less so, and rapidly reach the vicinity of their skill-determined ranking level, where they fluctuate for most of their career (Figure 4(B)).
Figure 4(D) shows the projected daily average Wikipedia visitation of players grouped based on their best rank (supplementary materials S9 in Additional file 1). It allows us to uncover a highly nonlinear relationship between rank and popularity (Figure 4(E)): popularity rises fast immediately following a player’s entry into the professional field, but this growth rate slows down between ranks 1,000 and 400. This is followed by an exploding popularity for the elite players, those that reach rank 200 or better. In other words, elite players benefit from a disproportionate popularity bonus, not accessible to other ranked players. Overall, we find a robust relationship between a player’s average popularity and rank: At the beginning of a player’s career popularity increases as \(1/\langle r\rangle^{\alpha}\), with \(\alpha\approx2.0\pm 0.02\), indicating that in these early stages of an athlete’s career rank is the most important determinant of popularity (Figure 4(F)). After the first 100 tournaments, however, the influence of rank on popularity is less pronounced and average popularity fluctuates in the vicinity of its top value.
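An exponent such as \(\alpha\) in \(1/\langle r\rangle^{\alpha}\) is commonly estimated as the negative slope of a straight-line fit in log-log space. The text does not state which estimation procedure was used, so the sketch below is an assumption; the function name and the synthetic data (generated with \(\alpha=2\)) are hypothetical.

```python
import math


def fit_power_law_exponent(mean_ranks, popularity):
    """Estimate alpha in popularity ~ 1 / rank**alpha by log-log least squares."""
    log_r = [math.log(r) for r in mean_ranks]
    log_w = [math.log(w) for w in popularity]
    m = len(log_r)
    mean_x = sum(log_r) / m
    mean_y = sum(log_w) / m
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(log_r, log_w))
             / sum((a - mean_x) ** 2 for a in log_r))
    return -slope  # popularity falls with rank, so alpha is minus the slope


# Synthetic early-career averages following an exact inverse-square law.
mean_ranks = [1000, 800, 600, 400]
popularity = [1e8 / r ** 2 for r in mean_ranks]
alpha = fit_power_law_exponent(mean_ranks, popularity)
print(alpha)
```

On real data the residual scatter around the fitted line would determine the uncertainty of the exponent.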