How does the scientific interest of researchers change over time? To answer this question, let us first measure the similarity of scientific production at different career stages. For simplicity, we consider the first (*f*) and the last (*l*) year of activity in our dataset. Then, for each career stage *S*, \(S\in \{f,l\}\), and author *i* we build a vector \(\mathbf{x}^{i,S}\) of size equal to the number of PACS at the classification level under consideration, i.e., 10 at the first level, 68 at the second, etc. The vectors are constructed so that the generic component, \(x^{i,S}_{\alpha }\), is the ratio between the number of times the PACS *α* has been used and the total number of PACS adopted. To better understand these vectors, consider an author *i* who, in the last year of her activity, wrote three papers using a set of five unique PACS. Now assume that one PACS, say *α*, has been used in all three papers. The component *α* in the vector will be \(x^{i,l}_{\alpha}=3/5\). Thus, the components quantify the share of interest, in a specific year, towards the various PACS. To determine the similarity between vectors we use the cosine similarity, \(\theta =\cos (\gamma )=\frac{ \mathbf{A} \cdot \mathbf{B}}{ \Vert \mathbf{A} \Vert _{2} \Vert \mathbf{B} \Vert _{2}}\), defined for each pair of vectors **A** and **B**. To get a first feeling for the distribution of the similarities, we consider all authors who published their first papers in 1980 and compare their first year of publication with their last, using the 68 second-level PACS. As is clearly seen in Fig. 2A, two tendencies are followed by the largest number of authors: \(\theta >0.9\) and \(\theta <0.1\). Thus, authors were more likely either to keep working on the same topics, potentially exploring a few others, or to change the subject of investigation almost completely. 
It is important to notice that the tendency towards a substantial change in research interests is embraced by a higher number of authors, while the second, third and fourth most likely values are concentrated at high values of *θ*, which describe authors covering similar topics during their career. To better understand this result, in Fig. 2B we compare the distribution of *θ* with a null model obtained by considering the first and a random year of activity from the career of each author. We repeat this process 1000 times to obtain the confidence intervals shown in the figure. The plot clearly shows that the tendency toward exploration (small values of *θ*) is much more prominent when comparing the first and last year of activity than when comparing the first with another year extracted at random. This observation provides the first hint that exploration is a gradual process. In Fig. 2C we repeat the same analysis with a different metric: the Jaccard index. This serves as a test of the robustness of the results and avoids possible spurious effects induced by sparse vectors in the cosine similarity. The figure clearly confirms the picture emerging from the other two panels.
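The construction of the share vectors and the two similarity measures can be sketched in a few lines of Python. This is only an illustrative sketch: the input format (a list of papers, each a list of PACS codes), the function names and the toy codes are assumptions made here, and the vector is normalised by the total number of PACS occurrences, which is one possible reading of the definition above. The cosine similarity is unaffected by the choice of normalisation.

```python
from collections import Counter
from math import sqrt


def share_vector(papers):
    """Share of each PACS code among all PACS occurrences in one career stage.

    `papers` is a list of papers, each given as a list of PACS codes
    (hypothetical input format, used only for illustration).
    """
    counts = Counter(code for paper in papers for code in paper)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_index(a, b):
    """Jaccard index between the sets of PACS codes used in two stages."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Toy author: two papers in the first year, one paper in the last year.
first_year = [["05", "89"], ["05"]]
last_year = [["89", "87"]]

x_f = share_vector(first_year)
x_l = share_vector(last_year)
theta = cosine_similarity(x_f, x_l)
jaccard = jaccard_index(x_f, x_l)
print(theta, jaccard)
```

Storing the vectors as dictionaries keeps only the non-zero components, which is convenient because most authors use a small number of PACS in any given year.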

These first results suggest that exploration is the preferred strategy. Does this also apply to authors who started their career in different years? And how does *θ* depend on the career duration? In Fig. 3 we answer these questions. In particular, in Fig. 3A we show the similarity as a function of the starting year for the second-level PACS. Interestingly, we see a similar trend. Strong exploration (cosine similarity <0.1) seems to be the preferred strategy, with strong exploitation (cosine similarity >0.9) the second most abundant trend. The only exception is younger scientists (those who published their first paper in the 2000s), who seem to prefer exploitation. A possible reason is that younger scientists are usually pursuing their mentors' research lines and have not yet outlined their own research agenda. Moreover, our dataset ends in 2006, so for authors who started working in the early 2000s we have access only to the initial phase of their careers. To test this hypothesis, in Fig. 3B we show the similarity as a function of the career duration. The plot shows an interesting trend. Short careers (less than 4 years) show a higher propensity to exploitation, while longer careers usually show a tendency to exploration. This reinforces our idea that younger scientists tend to follow the research interests of their mentors and that the shift in the research line occurs after the Ph.D.: the crossover in Fig. 3B takes place at around 4 or 5 years of career, the usual duration of Ph.D. studies in many countries. This finding is in line with the analyses of Battiston et al. [38], who showed that the average time of the first transition between fields is around 3–7 years, depending on the field. 
However, we also note that an alternative and plausible hypothesis is that this result reflects a change in the way science is done: the culture of “publish or perish” indeed encourages incremental publications at the cost of undermining exploration or riskier career paths. In the future, when more data about the evolution of younger authors become available, we shall be in a better position to discriminate between these two scenarios. As done above, to better understand the picture emerging from the data, we compare the tendencies towards exploration and exploitation with a null model, in which the first year of publication is compared with another year extracted at random from the career of each author. We show the results in Fig. 3C–D. The colors reflect the relative variation between the values from panels A–B and the values obtained in the null model. The two figures confirm that the tendency towards exploration is much more marked when the first year of activity is compared with the last than we would expect when picking the second vector at random from the career of each author. Furthermore, the plots show that high (low) values of exploitation are over(under)-represented in the null model. Indeed, across different years of first publication and career durations, green cells are concentrated at high values of *θ* and red cells at small values. This confirms that exploration is, on average, a gradual process.
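The comparison with the null model can be illustrated with a small sketch. Several choices here are assumptions rather than details taken from the text: each career is represented as a list of yearly share vectors ordered in time, the random year is drawn uniformly among the years after the first, and the quantity tracked across the 1000 realisations is the fraction of authors with \(\theta <0.1\), summarised by a percentile interval.

```python
import random
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def low_similarity_fraction(thetas, threshold=0.1):
    """Fraction of authors showing strong exploration (theta below threshold)."""
    return sum(t < threshold for t in thetas) / len(thetas)


def null_interval(careers, n_runs=1000, seed=0):
    """95% percentile interval of the strong-exploration fraction in the null model.

    In each run the first year of every author is compared with a year drawn
    at random among the later years (an assumption about the sampling scheme).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_runs):
        thetas = [cosine(c[0], rng.choice(c[1:])) for c in careers]
        fractions.append(low_similarity_fraction(thetas))
    fractions.sort()
    return fractions[int(0.025 * n_runs)], fractions[int(0.975 * n_runs) - 1]


# Toy careers: interests drift gradually and the last year is fully new.
career = [{"05": 1.0}, {"05": 0.5, "89": 0.5}, {"05": 0.2, "89": 0.8}, {"87": 1.0}]
careers = [career, career]

observed = low_similarity_fraction([cosine(c[0], c[-1]) for c in careers])
lo, hi = null_interval(careers)
print(observed, lo, hi)
```

In this toy example the first and last years share no PACS, so the observed fraction of strong exploration lies at the upper edge of what the null model produces, mirroring qualitatively the gradual drift discussed above.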

To consolidate all the previous observations, in Fig. 4 we plot the average similarity as a function of the first year of publication and the career duration. Interestingly, we do not see any clear dependence on the starting year. The crucial difference lies instead in the career duration. Indeed, the largest values of similarity are concentrated in the region of short careers, whereas authors with long careers are more prone to exploration. This result raises another interesting question: do authors with longer careers tend to explore more because they have more time, or do researchers with a higher propensity to exploration usually stay in academia for longer? To answer this question, in Fig. 5 we compare the cosine similarity between the first and the fifth year of career for authors with a career duration of exactly 5 years (Fig. 5A) and of 10 or more years (Fig. 5B). The relative change between the similarity profiles (Fig. 5C) shows that for strong exploration there is no difference between the two groups, and that scientists with short careers have only a milder tendency to strong exploitation. This confirms that exploration is more a product of time than a discriminant of scientific careers.

Having confirmed that exploration is the preferred strategy for the majority of authors, we can measure, using the same vectors, the share of interest kept towards a set of PACS previously used (exploitation) and towards a set of new PACS (exploration). For each author we quantify the fraction of new and old PACS by comparing the different career stages. In particular, we define the exploration share (ES) of author *i* at stage *l* of her career as:

$$ ES_{i}^{l}=\sum_{\alpha }x^{i,l}_{\alpha } \bigl( 1- H\bigl[x^{i,f}_{\alpha }\bigr]\bigr), \tag{1} $$

where \(H[n]\) is a step function such that \(H[n]=1\) for \(n>0\) and \(H[n]=0\) otherwise. In words, \(ES_{i}^{l}\) is the sum of the components of \(\mathbf{x}^{i,l}\) that were zero in \(\mathbf{x}^{i,f}\), and thus the share of research activity towards new PACS. As vectors are normalised, the exploitation share is instead \(1-ES_{i}^{l}\). By studying the exploration share of each author we can go a step further in our analysis and explore differences between subfields. In Fig. 6 we plot the average exploration value as a function of the first topic used by each author. In other words, we observe the tendency towards exploration while distinguishing between authors starting in different fields and sub-fields. We note that Particle Physics, Nuclear Physics, and Geophysics, Astronomy, and Astrophysics are less prone, on average, to explore different topics, while the two Condensed Matter fields and Atomic and Molecular Physics are the ones with the highest exploration. We can speculate that this is because Particle, Nuclear and Astro Physics are very specialized and usually require large infrastructures, while the methods employed in other areas are more general. Looking inside each area, in some cases we see a large variability, e.g. in General Physics. Some sub-topics have a high ES, like *Mathematical methods in Physics* (id. 02) or *Metrology, measurements, and laboratory procedures* (id. 06), while *General relativity and gravitation* shows one of the lowest propensities to exploration in the entire dataset. Along this line, an interesting example is topic id. 35, *Experimentally derived information on atoms and molecules; instrumentation and techniques*, which, despite a large number of papers (more than 800), also presents the largest ES. This is a spurious result: id. 35 was deleted from the 1995 edition of the classification [41] and its topics were split among other PACS. Thus, all the scientists working on the topic seemed to suddenly move to other PACS.
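As a minimal illustration of Eq. (1), the exploration share can be computed directly from the two share vectors of an author. The dictionary representation, the function name and the toy values below are assumptions made for this sketch only.

```python
def exploration_share(x_first, x_last):
    """Sum of the last-year shares of PACS that were unused in the first year."""
    return sum(
        share
        for code, share in x_last.items()
        if x_first.get(code, 0.0) == 0.0  # H[x_first] = 0: a new PACS
    )


# Toy author: one PACS is kept ("89") and two are new ("87" and "02").
x_first = {"05": 0.5, "89": 0.5}
x_last = {"89": 0.25, "87": 0.5, "02": 0.25}

es = exploration_share(x_first, x_last)
exploitation = 1.0 - es  # valid because x_last sums to one
print(es, exploitation)  # 0.75 0.25
```

Here three quarters of the last-year activity goes to PACS that the author had not used in the first year, so the exploitation share is the remaining quarter.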

So far we have quantified the tendency of authors towards exploration and exploitation. However, when authors explore new topics which ones do they consider? Are there exploration patterns more likely than others? How do these depend on the starting set of interests? To answer these questions, we first build origin-destination matrices by considering the flow of researchers from PACS to PACS comparing the first and last year of activity. Clearly, this analysis neglects trajectories between the two periods, but it offers a first indication of the general trends in scientific interest contrasting two distinct career phases. Let’s define the flow from PACS *α* to PACS *β* as:

$$ M_{\alpha,\beta }=\sum_{i} \biggl( H\bigl[x^{i,l}_{\alpha }\bigr]H\bigl[x^{i,f}_{\beta }\bigr]\delta _{\alpha,\beta }+ (1-\delta _{\alpha,\beta })\frac{H[x^{i,l}_{\beta }]H[x^{i,f}_{\alpha }](1-H[x^{i,f}_{\beta }])}{ \sum_{\gamma }H[x_{\gamma }^{i,f}]} \biggr). \tag{2} $$

Each element of the matrix considers all the authors (hence the sum over *i*). Furthermore, we have two types of elements: on and off the diagonal. The first term contributes to the diagonal elements (\(\delta _{\alpha,\beta }\) is the Kronecker delta) and takes the value 1 for every author who worked on the PACS *α* in both the first (*f*) and last (*l*) year of career. Thus, the term counts how many authors kept an interest in the same PACS. The second term instead contributes to the off-diagonal elements. The numerator is equal to 1 for all the \(\alpha -\beta \) pairs that satisfy the following conditions: the author *i* (i) did not use *β* in the first year, (ii) used *β* in the last year, (iii) used *α* in the first year. The denominator is equal to the number of different PACS used in the first year. Thus, we connect each PACS used in the first year with those used only in the last year, as a way to map the evolution of interest and the transition from one set of topics to another. In Fig. 7 we report the results for the first level of the classification. The first panel is obtained considering all the authors in the dataset. The other three are obtained by distinguishing the researchers by the year of first activity. Some important observations are in order. In general, the diagonal contains the largest values for all the years. This result, combined with Figs. 2, 3 and 4, highlights an interesting phenomenon. While most authors almost totally change their interests after 4 or 5 years of career, they usually remain in the larger area of Physics where they started. In a sense, each author has a strong tendency to explore, but only within sight of their initial topic. This latter result is the empirical confirmation of the “essential tension” between risky and conservative strategies.
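The construction of the origin-destination matrix in Eq. (2) can be sketched as follows. The representation of each author as a pair of sets (PACS used in the first and in the last year), the function name and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def flow_matrix(authors):
    """Origin-destination flows between PACS, following the two terms of Eq. (2).

    `authors` is an iterable of (first, last) pairs, where each element is the
    set of PACS codes used in the first and last year of activity.
    """
    flows = defaultdict(float)
    for first, last in authors:
        if not first:
            continue
        # Diagonal term: PACS used in both the first and the last year.
        for alpha in first & last:
            flows[(alpha, alpha)] += 1.0
        # Off-diagonal term: each first-year PACS is linked to every PACS that
        # appears only in the last year, weighted by 1 / (number of first-year PACS).
        new_codes = last - first
        for alpha in first:
            for beta in new_codes:
                flows[(alpha, beta)] += 1.0 / len(first)
    return dict(flows)


# Toy dataset at the first level of the classification.
authors = [
    ({"0", "6"}, {"6", "8"}),  # keeps 6, moves towards 8
    ({"1"}, {"1", "2"}),       # keeps 1, moves towards 2
]
M = flow_matrix(authors)
print(M)
```

With this weighting, each author contributes a total off-diagonal flow equal to the number of newly adopted PACS, split evenly among the PACS of the first year.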

Other interesting trends emerge when looking at how physicists move outside their original area. One is that the tendency towards exploitation is particularly strong for scientists starting their career in Physics of Elementary Particles, Nuclear Physics, and Condensed Matter (Electronic Structure, Electrical, Magnetic, and Optical Properties). Another concerns the field of Physics of Gases, Plasmas and Electric Discharges (id 5). Indeed, across years we observe that, with respect to all the other topics, this is the one that is least likely to “attract” researchers from other areas. A similar, although more nuanced, result holds for the field of Geophysics, Astrophysics, and Astronomy. On the other hand, as far as exploration is concerned, the field that attracts the most authors who initiated their publication record in other subjects is General Physics, which is by construction one of the most interdisciplinary fields. Moreover, two clusters are clearly visible in the matrices. The first is formed by Particle and Nuclear Physics. The second is formed by the two fields of Condensed Matter and Interdisciplinary Physics. The presence of such clusters implies that, for example, authors starting in Particle Physics who explore new topics are more likely to move towards Nuclear Physics. Finally, it is interesting to note that these patterns are preserved across different generations of researchers who started publishing in different decades.

Overall, the results shown so far can be summarised as follows: (i) even if exploration is the preferred strategy, it is usually confined within the first level of the classification, probably offering the right mix between exploration and exploitation; (ii) exploration is a gradual process that takes place during the career of each author; (iii) exploration outside the first level is not random, as the transition from some fields to others is more likely. These observations are in line with previous work done with different measures and metrics [37, 38, 42]. However, they are in contrast with the work of Foster et al. [2] and Jia et al. [5]. The first group focused on a different research area (Biomedical Chemistry) and studied 133 awardees of scientific prizes. In that field, scientists seem to prefer exploitation to exploration. This opposite trend highlights how the essential tension might be a function of the area of study. The second group studied, as we do here, the APS dataset. However, they considered a subset of authors who published at least 16 papers (their results do not change when considering 12 or 20). Furthermore, they considered event time (i.e. publications) rather than real time (i.e. years). Thus the sequence of publications of each author does not have gaps (years of inactivity are not accounted for). While this approach is quite useful to eliminate possible issues associated with burstiness, it mixes individuals with very different publication rates and at different career stages. The last point is particularly relevant, as the scientific maturity and independence often necessary for exploration are not necessarily a function of the number of papers published (especially in some disciplines that feature large collaborations). Indeed, our results, as well as those by Battiston et al. [38], show that periods before and after the typical PhD duration (3–7 years) are characterized by very different tendencies toward exploration. 
The contrast between the two results highlights a very important point: the *inclusion* principle used to select the sample of scientists under study, and the approach used to account for time, might influence the results. It is important to notice that each methodology has different advantages and drawbacks and effectively selects a different sample (with possible overlaps). Clearly, more work needs to be done to explore the effects of different approaches aimed at defining which publication record should be considered the signature of a professional scientist.

Up to now we have mapped the transitions, that is, the flows between topics, by comparing the first and last year of activity in our database. Next, we deepen our investigation by mapping the flows as a function of time. To this end, we consider all authors who published a paper in year *t* and/or \(t+1\). Note that we adopted a two-year window to increase the statistics. Then, we consider the fraction of such authors who also published a paper in year \(t+2\) and/or \(t+3\). For each two-year time window, we arrange the PACS in a circle and connect them with links proportional to how many authors used PACS *α* and then PACS *β*. However, instead of plotting all links, we show only the most significant ones. To this end, we compare the flows from the data with those we would expect by random chance. In particular, we randomize the flows between fields using the classic configuration model, which preserves the degree and strength distributions [43]. We create 1000 randomized configurations and compare them with the flows measured in the data. In Fig. 8 we show, at the first level of the classification, the flows with a Z-score equal to or larger than two. Several observations are in order. In each time window, the majority of significant links are those within a particular field (i.e. self-links). This observation highlights once more that exploration is a gradual process. In the short term, exploitation is more prominent. However, a clear temporal trend is evident: self-links are much heavier in the early times, and during the first years we do not see much flow between fields. The authors who published in contiguous time windows did not change topics as much as in later times. In the period 1984–1986, instead, we start seeing an increase in connectivity between fields, signaling either the publication of multidisciplinary papers (articles containing PACS from different fields) and/or authors exploring different fields. 
We see clearly that self-links in the two branches of Condensed Matter (6 and 7), as well as in Elementary Particle and Nuclear Physics (1 and 2), become less prominent over time. Interestingly, the mixing between Elementary Particle and Nuclear Physics (1 and 2) starts in 1982–1984 and becomes more evident from 1990–1992. Across all time windows, the two branches of Condensed Matter (6 and 7) and Elementary Particles (1) are the fields with the largest out-flow towards others. They are followed by General Physics (0) and Geophysics, Astronomy, and Astrophysics (9), among others. Furthermore, we observe a rise in popularity (i.e. the length of each arc) of General (0), Interdisciplinary (8), and Geophysics, Astronomy, and Astrophysics (9). This increase is balanced by a decrease in popularity of Physics of Gases and Plasmas (5) and of Elementary Particles and Nuclear Physics (1 and 2). It is important to note that, by definition, the popularity is not simply a measure of the number of papers written each year in each field. Indeed, it is modulated by the number of authors who wrote papers in two consecutive time windows. Other significant flows are the exchange of authors between Condensed Matter: Structural, Mechanical and Thermal Properties (6) and Interdisciplinary Physics (8), as well as between the Physics of Elementary Particles and Fields (1) and Geophysics, Astronomy, and Astrophysics (9), which increase as a function of time. Our results are in line with the Physics “census” recently conducted by Battiston et al. [38] with a much larger sample of publication venues. We also mention that our dataset does not allow us to see later trends that Battiston et al. [38] observed, such as spikes of productivity in 2010 in Elementary Particle Physics or the relative reduction of Condensed Matter in the last years.
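The selection of significant flows can be illustrated with a simplified sketch. The randomization used here is an assumption and a stand-in for the configuration model cited above: every unit of flow is treated as an (origin, destination) pair and the destinations are shuffled, which preserves the total out-flow and in-flow of each field. The function names and the toy flows are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def randomized_flows(transitions, rng):
    """Shuffle destinations among unit flows, preserving in- and out-strengths."""
    origins = [origin for origin, _ in transitions]
    destinations = [destination for _, destination in transitions]
    rng.shuffle(destinations)
    return Counter(zip(origins, destinations))


def z_scores(transitions, n_runs=1000, seed=0):
    """Z-score of each observed flow against the randomized configurations."""
    rng = random.Random(seed)
    observed = Counter(transitions)
    samples = {pair: [] for pair in observed}
    for _ in range(n_runs):
        shuffled = randomized_flows(transitions, rng)
        for pair in observed:
            samples[pair].append(shuffled.get(pair, 0))
    scores = {}
    for pair, values in samples.items():
        sigma = pstdev(values)
        if sigma > 0:  # skip flows that never vary under randomization
            scores[pair] = (observed[pair] - mean(values)) / sigma
    return scores


# Toy flows between two fields, dominated by self-links.
transitions = (
    [("1", "1")] * 20 + [("6", "6")] * 20 + [("1", "6")] * 2 + [("6", "1")] * 2
)
scores = z_scores(transitions)
significant = {pair for pair, z in scores.items() if z >= 2}
print(sorted(significant))
```

In this toy example only the self-links exceed the threshold of two, since the few flows between the two fields are far below what random mixing would produce.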