
The structural evolution of temporal hypergraphs through the lens of hyper-cores

Abstract

The richness of many complex systems stems from the interactions among their components. The higher-order nature of these interactions, involving many units at once, and their temporal dynamics constitute crucial properties that shape the behaviour of the system itself. An adequate description of these systems is offered by temporal hypergraphs, which integrate these features within the same framework. However, tools for their temporal and topological characterization are still scarce. Here we develop a series of methods specifically designed to analyse the structural properties of temporal hypergraphs at multiple scales. Leveraging the hyper-core decomposition of hypergraphs, we follow the evolution of the hyper-cores through time, characterizing the hypergraph structure and its temporal dynamics at different topological scales, and quantifying the multi-scale structural stability of the system. We also define two static hypercoreness centrality measures that provide an overall description of the nodes' aggregated structural behaviour. We apply the characterization methods to several data sets, establishing connections between structural properties and specific activities within the systems. Finally, we show how the proposed method can be used as a model-validation tool for synthetic temporal hypergraphs, distinguishing the higher-order structures and dynamics generated by different models from the empirical ones, and thus identifying the model mechanisms that are essential to reproduce the empirical hypergraph structure and evolution. Our work opens several research directions, from the understanding of dynamic processes on temporal higher-order networks to the design of new models of time-varying hypergraphs.

1 Introduction

Many complex systems composed of interacting elements can be effectively described within the theory of static networks [1–3]. This powerful framework provides a wide set of techniques and tools to characterize the interactions at different topological scales, through global graph properties (e.g. density), possibly focusing on specific groups of relevant nodes (e.g. k-cores) and providing various measures of node centralities. Furthermore, this multi-scale characterization helps identify nodes and mesostructures with relevant roles in dynamical processes, since the interaction structure deeply impacts processes unfolding on networks [3, 4]. Despite the power of network theory, a growing body of empirical evidence has recently exposed the limits of this framework, which by definition is restricted to a static description of systems involving only binary interactions.

On the one hand, several systems present time-varying interactions, which follow specific dynamics and temporal patterns [5–7]: for example, human social interactions [8], scientific collaborations [9] and neural systems [5, 10]. These systems are represented using temporal networks, a generalization of static networks in which nodes interact via links with specific activation and deactivation times [5, 6]. Several structural characterization tools for static networks have been generalized to time-varying graphs, showing the non-trivialities emerging from the introduction of the temporal dimension [5–7]: for instance, span-cores can decompose a temporal graph into subgraphs of controlled duration and increasing connectivity [11, 12]. Moreover, dynamic processes on temporal networks are also impacted by the network dynamics, especially when the dynamics of and on the network have comparable time scales [5, 6, 13, 14].

On the other hand, many complex systems also feature interactions between groups of agents, not reducible to sets of pairs [15, 16]: this is the case for example of human social interactions [17], scientific collaborations [18] and species interactions in ecosystems [19]. An adequate description of these systems involves hypergraphs, a generalization of networks in which nodes can interact in groups of arbitrary size, i.e., hyperedges [15]. Taking into account such higher-order nature of interactions leads to the definition of new structures and concepts and to new dynamical phenomena [15, 16, 20–22]. Indeed, several dynamical processes, including contagion dynamics, synchronization phenomena and consensus formation, exhibit richer and more complex dynamics when defined on higher-order networks, with important differences with respect to the dynamics occurring on pairwise networks, such as changes in the nature of the phase transitions observed [15, 20, 21, 23]. Despite the relevance of such higher-order effects, tools to characterize hypergraphs at various scales have only recently been proposed: for example, efforts have been devoted to defining explicitly higher-order centrality measures, accounting for information otherwise impossible to retrieve by pairwise measures [15, 24]; moreover, a few techniques and methods have been developed to identify relevant higher-order substructures in hypergraphs [15, 25–27]. Among them, the hyper-core decomposition [26, 27] identifies a doubly nested hierarchy of mesoscopic subhypergraphs, the hyper-cores, composed of nodes progressively more densely connected to each other through interactions of increasing size. This technique provides a global fingerprint of systems described using hypergraphs and identifies structurally central mesostructures that play an important role in higher-order dynamical processes [26]. This decomposition also comes with an associated centrality measure for nodes, the hypercoreness, which is based on the node structural position at the various interaction orders [26].

The increasing attention to the development of frameworks to handle time-varying and non-pairwise structures speaks for the need of using both the temporal and the higher-order nature of interactions to adequately describe and model several complex systems and dynamical processes. The integration of these two features has occurred relatively recently within temporal hypergraphs, where hyperedges present specific activation times and duration, describing evolving group interactions [15]. Some works focused on defining procedures to construct temporal hypergraphs from data [28, 29], others on the impact of the hypergraph dynamics on dynamic processes [30, 31]. Only a few attempts have been made to investigate the temporal-topological properties of temporal hypergraphs [29, 32–36], and a complete structural characterization is still missing. Moreover, synthetic models of temporal hypergraphs have been proposed to identify and replicate the mechanisms that govern the evolution of empirical systems [9, 35–38], but model-validation tools are still scarce. Therefore, it becomes necessary to develop dedicated multi-scale characterization methods tailored for temporal hypergraphs. These techniques are essential to accurately describe empirical systems, construct and validate synthetic models, and ultimately identify crucial temporal structures for higher-order dynamic processes: how does the higher-order structure evolve at different scales over time? Are there persistent groups of nodes exhibiting dense connections at different interaction orders, or do these configurations change dynamically? Are the most structurally central nodes always the same, or do they undergo changes over time?

Here, we tackle such issues by proposing a multi-scale method for the characterization of temporal hypergraphs at different topological scales. By applying the hyper-core decomposition to successive snapshots of a temporal hypergraph, and by following the evolution of the resulting hierarchical structure, we are able to characterize the structure and its evolution at different scales: macroscopically, following the evolution of the relative sizes of the hyper-cores; mesoscopically, focusing on the dynamics of specific hyper-cores; microscopically, following the position of single nodes in the hyper-core structure over time. Measuring the similarity between the hyper-core structure at different times enables the quantification of the structural stability of the system at different topological scales. We also define two time-aggregated hypercoreness centralities for nodes, based on the node instantaneous hypercoreness and its evolution, which together provide an overall description of its structural behavior. We apply the proposed approach to several data sets representing systems of diverse nature. This enables us to identify differences and similarities in their structure and evolution, unveiling temporal patterns, and to establish connections between structural properties and specific activities within the systems. Finally, we illustrate how the proposed method provides a model-validation tool for synthetic models of temporal hypergraphs. To this aim, we propose several models of activity-driven temporal hypergraphs [9, 13, 39, 40] which progressively implement mechanisms for the formation of group interactions of increasing complexity. We tune these models to mimic the activity patterns of the interaction data sets and show how, following the hyper-core decomposition over time, we are able to distinguish between the hyper-core structures and dynamics generated by the models at different topological scales, providing a quantitative comparison between synthetic models and empirical hypergraphs.

The paper is organized in the following way: in Sect. 2.1 we describe the hyper-core decomposition and how it provides a multi-scale method for the characterization of temporal hypergraphs; in Sect. 2.2 we define two time-aggregated centrality measures for nodes; in Sect. 2.3 we present the empirical data sets considered, and in Sects. 2.4, 2.5 we apply the proposed method to different data sets; in Sect. 2.6 we show how our method can be used as a model-validation tool, considering different hypergraph models; in Sect. 3 we summarize the main results, discuss their implications and outline some future perspectives. In order to avoid accumulating too many technical details in the previous sections, we leave the detailed presentation of several aspects of our methodology to Sect. 4-Methods (on the hyper-core decomposition in Sect. 4.1, on the data preprocessing in Sect. 4.2, on reshuffling procedures in Sect. 4.3 and on the temporal hypergraph models in Sect. 4.4).

2 Results

2.1 Following the hyper-core decomposition of temporal hypergraphs

Let us consider a time-varying hypergraph \(\mathcal{H}\) observed over the time interval \((0,t_{max}]\). We consider a snapshot representation of \(\mathcal{H}\) with temporal resolution τ [28], i.e., the interval \((0,t_{max}]\) is divided into \(n=t_{max}/\tau \) time windows of length τ: \(\mathcal{H} = \{\mathcal{H}_{t} \}_{t=1}^{n}\), where in each time window t the instantaneous hypergraph \(\mathcal{H}_{t}=(\mathcal{V}_{t},\mathcal{E}_{t})\) is an unweighted static hypergraph formed by the set \(\mathcal{V}_{t}\) of nodes active at least once in \(((t-1)\tau ,t \tau ]\) and by the set \(\mathcal{E}_{t}\) of hyperedges active at least once in \(((t-1)\tau ,t \tau ]\) (with \(N_{t}=|\mathcal{V}_{t}|\) and \(E_{t}=|\mathcal{E}_{t}|\)). A hyperedge \(e=\{i_{1},i_{2},\ldots,i_{m}\} \in \mathcal{E}_{t}\) represents a group interaction between nodes \(i_{k} \in \mathcal{V}_{t}\) \(\forall k=1,\ldots,m\): it consists of a set of m nodes, with \(m \in [2,M_{t}]\), where \(M_{t} = \max _{e \in \mathcal{E}_{t}} |e|\). We denote with \(\Psi _{t}(m)\) the hyperedge size distribution in the time window t (see Footnote 1).
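The snapshot representation above can be illustrated with a minimal Python sketch. It assumes the raw data are available as a list of timestamped group interactions `(time, nodes)`; the function name `build_snapshots` and this input format are hypothetical, and the sketch ignores the data-set-specific preprocessing described in Methods.

```python
from math import ceil


def build_snapshots(events, tau, t_max):
    """Split timestamped group interactions into unweighted snapshots.

    events: iterable of (time, nodes) pairs with 0 < time <= t_max
            (hypothetical input format, not the paper's data format).
    Returns a list of n = ceil(t_max / tau) sets of hyperedges; the entry
    at index t-1 collects the hyperedges active in ((t-1)*tau, t*tau].
    """
    n = ceil(t_max / tau)
    snapshots = [set() for _ in range(n)]
    for time, nodes in events:
        edge = frozenset(nodes)
        # A hyperedge needs at least two nodes; skip out-of-range times.
        if len(edge) < 2 or not (0 < time <= t_max):
            continue
        index = ceil(time / tau) - 1  # half-open window on the left
        # Storing edges in a set keeps each snapshot unweighted:
        # repeated activations within a window count once.
        snapshots[index].add(edge)
    return snapshots


def active_nodes(snapshot):
    """Nodes involved in at least one hyperedge of the snapshot."""
    return set().union(*snapshot) if snapshot else set()
```

With `tau=5` and `t_max=10`, an interaction recorded at time 5 falls in the first window and one recorded at time 6 in the second, matching the half-open intervals used in the text.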

We propose to characterize the structural evolution of the temporal hypergraph \(\mathcal{H}\) by applying the hyper-core decomposition procedure to each snapshot \(\mathcal{H}_{t}\) [26]. The hyper-core decomposition decomposes static hypergraphs into series of subhypergraphs of increasing connectivity, ensured by hyperedges of increasing sizes. Specifically, the \((k,m)\)-hyper-core of the snapshot \(\mathcal{H}_{t}=(\mathcal{V}_{t},\mathcal{E}_{t})\) is defined as the maximum subhypergraph that contains all the nodes \(i \in \mathcal{V}_{t}\) involved in at least k distinct hyperedges of size at least m within the subhypergraph itself (see Methods and [26]).
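As an illustration of this definition, the following Python sketch extracts the \((k,m)\)-hyper-core of a single snapshot by iterative pruning. It is not the reference implementation of [26]: it assumes that removing a node shrinks the hyperedges it belongs to, that hyperedges falling below size m are discarded, and that hyperedges becoming identical after shrinking count once. The names `hyper_core` and `shell_indices` are hypothetical.

```python
def hyper_core(edges, k, m):
    """Return (nodes, hyperedges) of the (k, m)-hyper-core of a static hypergraph.

    Assumption: a removed node is deleted from its hyperedges, and a
    hyperedge is dropped once its size falls below m.
    """
    # Start from the hyperedges of size at least m.
    alive = {frozenset(e) for e in edges if len(e) >= m}
    while True:
        degree = {}
        for e in alive:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        weak = {v for v, d in degree.items() if d < k}
        if not weak:
            return set(degree), alive
        # Prune weak nodes; a set of frozensets merges duplicate hyperedges.
        alive = {e - weak for e in alive if len(e - weak) >= m}


def shell_indices(edges, m):
    """m-shell index of each node: the largest k whose (k, m)-core contains it."""
    index = {v: 0 for e in edges for v in e}
    k = 1
    while True:
        core_nodes, _ = hyper_core(edges, k, m)
        if not core_nodes:
            return index  # the (k, m)-core is empty: k - 1 was k_max^m
        for v in core_nodes:
            index[v] = k
        k += 1
```

For the toy hypergraph with hyperedges {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4} and {4,5}, node 5 only survives in the (1,2)-core, while nodes 1 to 4 form both the (3,2)-core and the (3,3)-core.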

The set of nodes belonging to the \((k,m)\)-core but not to the \((k+1,m)\)-core forms the \((k,m)\)-shell. Each node i in the temporal hypergraph can thus be assigned a time-varying m-shell index \(C_{m}(i,t)\), which defines the maximum k such that i belongs to the \((k,m)\)-hyper-core but not to the \((k+1,m)\)-hyper-core at time t. This leads to the definition of the hypercoreness \(R(i,t)\) of node i in \(\mathcal{H}_{t}\) by [26]:

$$ R(i,t)=\sum _{m=2}^{M_{t}} C_{m}(i,t)/k_{max}^{m}(t) \ , $$
(1)

where \(k_{max}^{m}(t)\) is the maximum connectivity at order m for the snapshot t, such that the \((k_{max}^{m}(t),m)\)-core is not empty, but the \((k_{max}^{m}(t)+1,m)\)-core is empty. \(R(i,t) \in [0,M_{t}-1]\) summarizes the centrality properties of i with respect to the hyper-core decomposition at time t by taking into account its relative depth in the \((k,m)\)-core structure at all interaction orders [26] (see Footnote 2).
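Equation (1) can be made concrete with a short Python sketch. It assumes the m-shell indices of one snapshot have already been computed and are stored as a nested dictionary `shell_index[m][i]`; this layout and the function name `hypercoreness` are illustrative choices, not part of the original method description.

```python
def hypercoreness(shell_index):
    """Hypercoreness R(i) of each node in one snapshot, following Eq. (1).

    shell_index: dict mapping each order m to a dict {node: C_m(node)}
                 (hypothetical layout). k_max^m is taken as the largest
                 m-shell index observed at order m.
    """
    R = {}
    for m, shells in shell_index.items():
        k_max = max(shells.values(), default=0)
        if k_max == 0:
            continue  # no non-empty core at this order: no contribution
        for node, c in shells.items():
            R[node] = R.get(node, 0.0) + c / k_max
    return R
```

For example, with shell indices {a: 3, b: 3, c: 1} at m = 2 and {a: 2, b: 1, c: 0} at m = 3, node a reaches the maximum value \(M_{t}-1=2\), while node b obtains 3/3 + 1/2 = 1.5.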

By considering the hyper-core decomposition of the successive snapshots forming the temporal hypergraph, we can thus follow the temporal evolution of its higher-order hierarchical structure, and obtain a characterization of the higher-order dynamics at several scales, as we now discuss.

Macroscopic scale. The fraction of nodes within the \((k,m)\)-hyper-cores, \(n_{(k,m)}\), as a function of k and m constitutes the filling profile of the hyper-cores, and provides information on the distribution of nodes in the various cores and shells. Following its evolution across successive snapshots yields information on how the overall system’s cohesiveness changes over time. The filling profile can indeed detect changes in the underlying higher-order hierarchical structure, since different distributions of nodes in the hyper-cores reflect different configurations of interactions in the nested hierarchy [26]: for instance, a smooth decay of \(n_{(k,m)}\) with k and m suggests the presence of nodes progressively more densely connected with each other through interactions of larger sizes (homogeneously populated shells), while the alternation of plateaus and abrupt drops reveals the presence of a non-trivial structure, with nodes poorly or densely connected with each other, without intermediate behaviours (unevenly filled shells). Thus, the similarity between the hyper-core filling profiles of two different snapshots, \(n_{(k,m)}(t)\) and \(n_{(k,m)}(t')\), provides a quantitative estimate of the stability of the macroscopic hyper-core structure over time. While several similarity measures can be defined between the filling profiles of two hypergraphs, we consider here the root-mean-square deviation similarity, defined as follows for the filling profiles \(a_{(k,m)}\) and \(b_{(k,m)}\) of two static hypergraphs \(\mathcal{A}\) and \(\mathcal{B}\) with respective maximum connectivities \(k_{max}^{m}(\mathcal{A})\) and \(k_{max}^{m}(\mathcal{B})\) \(\forall m\), and respective maximum hyperedge sizes \(M_{\mathcal{A}}\) and \(M_{\mathcal{B}}\):

$$ \Sigma (\mathcal{A},\mathcal{B}) = 1- \sqrt{ \frac{\sum \limits _{k=1}^{\overline{K}} \sum \limits _{m=2}^{\overline{M}} \left (a_{(k,m)} - b_{(k,m)}\right )^{2}}{\overline{K} \, (\overline{M}-1)-1}}, $$
(2)

with \(\overline{K}= \max \limits _{m}\{ \max \{k_{max}^{m}(\mathcal{A}),k_{max}^{m}( \mathcal{B})\} \}\) and \(\overline{M}=\max \{M_{\mathcal{A}},M_{\mathcal{B}}\}\) (in this way \(\Sigma \in [0,1]\); see Footnotes 3 and 4). The temporal similarity matrix \(\Sigma (t,t') = \Sigma (\mathcal{H}_{t}, \mathcal{H}_{t'})\) then provides a way to explore the existence of various temporal patterns in the hyper-core decomposition of the system at different times, and to unveil the presence of stable periods, recurrences or sudden changes [7, 10, 41, 42] (see Footnote 5).
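A direct transcription of Eq. (2) into Python might look as follows. The sketch assumes each filling profile is stored as a dictionary `{(k, m): fraction}` in which empty cores are simply absent and count as 0; the function name `rmsd_similarity` and the value returned when the normalization \(\overline{K}(\overline{M}-1)-1\) vanishes are assumptions of this sketch, not statements from the text.

```python
from math import sqrt


def rmsd_similarity(a, b):
    """Root-mean-square deviation similarity of two filling profiles, Eq. (2).

    a, b: dicts {(k, m): fraction of nodes in the (k, m)-hyper-core};
          missing keys are treated as empty cores (fraction 0).
    """
    filled = [key for key in set(a) | set(b)
              if a.get(key, 0.0) > 0 or b.get(key, 0.0) > 0]
    if not filled:
        return 1.0  # assumption: two empty profiles are identical
    K = max(k for k, _ in filled)  # K-bar: largest connectivity over both
    M = max(m for _, m in filled)  # M-bar: largest hyperedge size over both
    norm = K * (M - 1) - 1
    if norm <= 0:
        return 1.0  # assumption: degenerate case with a single (1, 2) entry
    total = sum((a.get((k, m), 0.0) - b.get((k, m), 0.0)) ** 2
                for k in range(1, K + 1)
                for m in range(2, M + 1))
    return 1.0 - sqrt(total / norm)
```

Evaluating this function on every pair of snapshots yields the temporal similarity matrix \(\Sigma (t,t')\).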

Mesoscopic scale. By following the hyper-core decomposition over time, it is moreover possible to study the temporal stability and changes occurring in subhypergraphs with specific structural roles. To this aim, we can consider a given set of shells or cores, and compare their sets of nodes A in two different snapshots t and \(t'\) through the Jaccard similarity \(J(t,t')=|A_{t} \cap A_{t'}|/|A_{t} \cup A_{t'}|\). The matrix \(J(t,t')\) quantifies the stability over time of the set of nodes forming the cores under scrutiny. In particular, we will here focus on the set of the most central hyper-cores of each snapshot, i.e. the \((k_{max}^{m},m)\)-hyper-cores \(\forall m\). We can then determine whether these cores are stable, involving always the same nodes across snapshots, or whether their composition evolves, due to changes of connectivity of individual nodes: this can happen even when the macroscopic structure remains similar (as found in temporal networks where the most connected nodes can vary with time [43], or a core-periphery structure can be stable even when the composition of the core strongly fluctuates [10]).
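The Jaccard matrix can be sketched in a few lines of Python. The input is assumed to be one node set per snapshot (for instance the union of the most central hyper-cores); the function names and the convention \(J=1\) for two empty sets are choices made for this sketch only.

```python
def jaccard(A, B):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two node sets."""
    A, B = set(A), set(B)
    union = A | B
    # Assumption: two empty sets are considered identical.
    return len(A & B) / len(union) if union else 1.0


def jaccard_matrix(node_sets):
    """Matrix J(t, t') for a list of node sets, one per snapshot."""
    return [[jaccard(A, B) for B in node_sets] for A in node_sets]
```

The same function applied to the full sets of active nodes gives a baseline for how much of the turnover is due to nodes entering or leaving the system.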

Moreover, empirical data sometimes include meta-data (see Methods) describing properties or attributes of the nodes or hyperedges, and dividing them into classes based on their specific function or context. For instance, data describing social interactions can be enriched by information on the individuals involved (e.g., to which class they belong in a school environment, or to which department they belong and which role they have in a work environment). Such information makes it possible to study whether different groups or classes of nodes have different higher-order structural properties, and whether specific hyper-cores are preferentially composed of specific nodes or specific types of hyperedges. For instance, one can identify the most represented class in each hyper-core at each time, and follow over time which types of nodes or hyperedges are dominant in the most central hyper-cores.

Microscopic scale. At the node level, the hypercoreness \(R(i,t)\) gives an instantaneous measure of the centrality of a node in each snapshot. It is thus possible, for each node of interest, to follow its trajectory in the hyper-core structure through the evolution of its hypercoreness. More precisely, in order to make the hypercoreness values comparable across different snapshots, we consider the temporal evolution of the relative position of each node i in the hypercoreness ranking:

$$ r(i,t) = \frac{R(i,t)}{\max \limits _{j \in \mathcal{V}_{t}} \{R(j,t)\}}. $$
(3)

The evolution of \(r(i,t)\) with t indeed reflects the movements that node i undergoes within the hierarchical structure, potentially navigating towards more central or more superficial cores.

The set of all \(R(i,t)\) moreover provides an instantaneous node hierarchy within the time window t. Such a hierarchy might fluctuate from one snapshot to the next [43], and the Pearson correlation coefficient \(\varrho (t,t')=\varrho (R(i,t),R(i,t'))\) of the nodes' hypercoreness values between two snapshots t and \(t'\) provides information on the stability of the node ranking over time, i.e., on how the nodes change their respective structural positions over time. Just as \(\Sigma (t,t')\) for the global scale and \(J(t,t')\) for intermediate scales, this measure can unveil correlation patterns at various time-scales: for example, a high and constant \(\varrho (t,t')\) indicates that nodes tend to keep their relative structural positions over time, while constantly low values correspond to an unstable situation with nodes continuously changing place in the hierarchy.

Note that, as not all nodes are active in each snapshot, we can compute \(\varrho (t,t')\) in two ways: (i) \(\rho ^{*}(t,t')\) takes into account only the nodes that are active in both t and \(t'\), while (ii) \(\rho (t,t')\) is computed considering all nodes active in at least one of them (setting the hypercoreness of inactive nodes to 0). The difference between \(\rho (t,t+1)\) and \(\rho ^{*}(t,t+1)\) provides information on the structural properties of nodes just after entering the system or right before leaving it: \(\rho \lesssim \rho ^{*}\) indicates that nodes have mainly low hypercoreness when joining/leaving the system, while \(\rho \ll \rho ^{*}\) indicates that nodes joining/leaving the system tend to be central.
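The two variants of the correlation can be sketched in plain Python as follows. The sketch assumes the hypercoreness values of each snapshot are stored as a dictionary `{node: R}` containing only the active nodes; the function names are hypothetical, and returning NaN when a correlation is undefined is a choice made for this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    vx = sum((u - mx) ** 2 for u in x)
    vy = sum((v - my) ** 2 for v in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def hypercoreness_correlations(R_t, R_s):
    """Return (rho, rho_star) between two snapshots.

    rho      uses all nodes active in at least one snapshot, with the
             hypercoreness of inactive nodes set to 0;
    rho_star uses only the nodes active in both snapshots.
    """
    both = list(set(R_t) & set(R_s))
    either = list(set(R_t) | set(R_s))
    rho_star = pearson([R_t[i] for i in both], [R_s[i] for i in both])
    rho = pearson([R_t.get(i, 0.0) for i in either],
                  [R_s.get(i, 0.0) for i in either])
    return rho, rho_star
```

When a central node is active in only one of the two snapshots, it contributes a (high, 0) pair to \(\rho \) but not to \(\rho ^{*}\), which is what drives \(\rho \) well below \(\rho ^{*}\).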

2.2 Time-aggregated hypercoreness centralities

The hypercoreness centrality of nodes in static hypergraphs has been shown to provide information on their importance for dynamic processes involving higher-order interactions unfolding on such hypergraphs [26]. Many processes however unfold on time-varying hypergraphs [30, 31], hence a time-aggregated ranking of nodes summarizing the evolution of their instantaneous coreness could prove useful.

We first define the snapshot activity \(a_{w}(i) \in [0,n]\), given by the number of time windows in which node i is active, and the average number of interactions when active \(\overline{h}(i)=D(i)/a_{w}(i)\), where \(D(i)\) is the total number of hyperedges in which i is involved in the temporal hypergraph. We then introduce two time-aggregated centrality measures that summarize the positions of the nodes in the hyper-core structure over time:

  • the aggregated hypercoreness W:

    $$ W(i)=\sum _{t=1}^{n} \frac{R(i,t)}{\max \limits _{j \in \mathcal{V}_{t}} \{R(j,t)\}}=\sum _{t=1}^{n} r(i,t) , $$
    (4)

    takes into account how deep i is in the hyper-core structure at the various interaction orders in each time window, and simply aggregates this information over time.

  • the activity-averaged hypercoreness \(\overline{W}\):

    $$ \overline{W}(i)=\sum _{t=1}^{n} \frac{r(i,t)}{a_{w}(i)} = \frac{W(i)}{a_{w}(i)} , $$
    (5)

    averages W over the activity of the nodes.

W and \(\overline{W}\) provide complementary information. Indeed, a high W can be obtained either for a node i that is very active (high \(a_{w}(i)\)) but not very central (small \(r(i,t)\)) or for a node j that is not very active (low \(a_{w}(j)\)) but central when active (high \(r(j,t)\)). These two situations are distinguished when taking into account also \(\overline{W}\), as \(\overline{W}(i)\) will then be small while \(\overline{W}(j)\) will be large. Together, the time-aggregated hypercoreness measures \(W(i)\) and \(\overline{W}(i)\) thus provide a two-dimensional picture taking into account both the activity of nodes and the evolution of their relative centralities over time.
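Equations (3)-(5) can be combined in a short Python sketch. It assumes the hypercoreness of the active nodes of each snapshot is available as a dictionary `{node: R}`, one per time window; the function name `aggregated_hypercoreness` and this data layout are illustrative.

```python
def aggregated_hypercoreness(R_by_snapshot):
    """Time-aggregated centralities from per-snapshot hypercoreness values.

    R_by_snapshot: list of dicts {node: R(node, t)}, one per time window,
                   containing only the nodes active in that window
                   (hypothetical layout).
    Returns (W, W_bar, a_w) following Eqs. (4) and (5).
    """
    W, a_w = {}, {}
    for R in R_by_snapshot:
        if not R:
            continue  # empty window: no active node
        r_max = max(R.values())
        for node, value in R.items():
            a_w[node] = a_w.get(node, 0) + 1         # snapshot activity
            r = value / r_max if r_max > 0 else 0.0  # relative rank, Eq. (3)
            W[node] = W.get(node, 0.0) + r           # Eq. (4)
    W_bar = {node: W[node] / a_w[node] for node in W}  # Eq. (5)
    return W, W_bar, a_w
```

As a usage example, take a node i with \(r=0.25\) in four windows and a node j active in a single window with \(r=1\): both obtain \(W=1\), but \(\overline{W}(i)=0.25\) while \(\overline{W}(j)=1\), which is the distinction discussed above.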

2.3 Empirical temporal hypergraphs

The approach outlined is general and can be applied to empirical data of higher-order interactions evolving over time describing a variety of systems. In the following, we showcase its interest using: a data set of scientific collaborations [44, 45], several data sets of physical proximity interactions between individuals in various environments [46–54] (a hospital [50], a conference [48], three schools [51–53], a university [54] and a workplace [47, 48]), and a data set of email communications [55–57]. These data sets present different statistical, topological and temporal properties (e.g., interaction size distribution, temporal patterns due to system-specific activities). Full details on all the data sets and on the preprocessing procedures are available in the Methods Section (Sect. 4) and in the Supplementary Material (SM) (see Additional file 1). In the main text we specifically analyse data describing three different systems, while results for the other data sets are reported in the SM. In particular, here we consider:

  • the scientific collaborations data set of the American Physical Society (APS), which provides the list of papers published in APS journals from 1893 to 2021 [44, 45]. We build a temporal hypergraph (see Methods) in which each node corresponds to an author, each hyperedge represents a paper connecting its co-authors and is endowed with a label indicating the journal in which the paper was published.

  • a data set of face-to-face human interactions in a hospital (LH10), collected within the SocioPatterns collaboration [46, 50]. The data set has a temporal resolution of 20 seconds and covers a period of 96 hours. We build a temporal hypergraph in which each node corresponds to an individual and each hyperedge represents a group interaction, defined with a temporal resolution of 5 minutes (see Methods) [20]. Each node is assigned a label indicating its social role: Med for doctors, Param for nurses, Admin for administrative staff, and Patient for patients.

  • a data set of proximity human interactions in a university, collected within the Copenhagen Network Study (CopNS) [54]. The data set has a temporal resolution of 5 minutes and covers a period of 4 weeks. We build a temporal hypergraph from the data by considering each individual as a node, and each hyperedge as a group interaction with a temporal resolution of 5 minutes (see Methods) [36].

For the university data set, we also show how the analysis of the hyper-core structure over time can contribute to the validation of models of time-varying hypergraphs.

2.4 Dynamics of the higher-order structure of scientific collaborations

We represent the APS scientific collaborations data set through a time-varying hypergraph in which each node corresponds to an author and each hyperedge represents a paper connecting its co-authors (see Methods). We consider a 5-year temporal resolution, i.e., each temporal snapshot is formed by all papers published in a 5-year time window (see SM for a different temporal resolution), and we consider the period 1962-2021 (earlier years having much smaller numbers of nodes and hyperedges).

Figure 1a shows the evolution of the global hyper-core structure as given by the filling profiles, which do not simply expand monotonically as the numbers of nodes and hyperedges increase over the years. Initially the system presents only \((k,m)\)-hyper-cores with low connectivity k, especially for large hyperedge sizes m; then, the filling profile undergoes an expansion towards higher k and higher m values. At first, \(k_{max}^{m}\) increases for high interaction orders m and only later at low orders. Furthermore, the increase in \(k_{max}^{m}\) is non-monotonic with respect to time, especially for low m: \(k_{max}^{m}\) for \(m \gtrsim 2\) grows up to a maximum in the 1997-2001 snapshot, and then decreases and stabilizes in the following years (as we will discuss below, this behavior can be traced back to a specific scientific community and its collaboration dynamics). Thus, the cohesiveness of the scientific community first increased through connected large collaborations, and then increased at all orders until 1997-2001. The cohesiveness of the community then relaxed to a lower but stationary level in the last 20 years.

Figure 1

Evolution of the hyper-core structure in APS scientific collaborations. a: fraction of nodes \(n_{(k,m)}\) in the \((k,m)\)-core as a function of k and m for each 5-year time window. The numbers of active nodes \(N_{t}\) and hyperedges \(E_{t}\) are also reported and the insets show \(n_{(k,m)}\) as a function of k for \(m=2\), \(m=6\) and \(m=10\). b: root-mean-square deviation similarity \(\Sigma (t,t')\) between \(n_{(k,m)}(t)\) and \(n_{(k,m)}(t')\) (grey diagonal: \(\Sigma (t,t)=1\)). c: Jaccard similarity \(J^{*}(t,t')\) between the sets of nodes belonging to the most central hyper-cores, i.e. to the \((k_{max}^{m},m)\)-cores \(\forall m\), at time t and \(t'\) (grey diagonal: \(J^{*}(t,t)=1\)). d: Pearson correlation coefficient \(\rho (t,t')\) between the nodes' hypercoreness at times t and \(t'\), considering all the nodes that are active in at least one of the snapshots (grey diagonal: \(\rho (t,t)=1\)). e: similarity \(\Sigma (t,t+1)\) vs. t. f: temporal evolution of \(J^{*}(t,t+1)\) and Jaccard similarity \(J_{N}(t,t+1)\) between the entire population in two consecutive time windows. g: temporal evolution of the correlation between the nodes' hypercoreness in consecutive snapshots, considering all the nodes that are active in at least one of the snapshots, \(\rho (t,t+1)\), or only those active in both, \(\rho ^{*}(t,t+1)\). Note that macroscopically the size and the density of the interactions evolve in a non-trivial way, but the overall filling of the hyper-cores remains quite similar over time; the composition of the most central hyper-cores is instead highly unstable, suggesting a high system instability at the mesoscopic and microscopic scales

Although the size of the interactions and the density of collaborations change over time, the overall structure of the filling profiles remains similar (Fig. 1a). In fact, the hyper-cores always present a rapid and progressive emptying of the cores as k and m increase: superficial shells (low k) are densely populated, and shells become gradually less populated with increasing k and m. The root-mean-square deviation similarity \(\Sigma (t,t')\) between the hyper-core filling profiles at time t and \(t'\) presents very high values for all pairs \((t,t')\) (Fig. 1b), indicating a stable structure: the similarity is particularly high between consecutive snapshots (Fig. 1e), and decreases monotonically when \(|t'-t|\) increases.

We investigate the mesostructural level through the similarity \(J^{*}(t,t')\) between the sets of nodes belonging to the most central cores, i.e. to the \((k_{max}^{m},m)\)-hyper-cores \(\forall m\), at different times. Figure 1c,f shows that the stability of the central cores is low, even between adjacent time windows. This is not only due to the fact that the set of authors changes over time, as \(J^{*}\) is much lower than the Jaccard coefficient \(J_{N}\) between the sets of authors in different time windows. \(J^{*}(t,t')\) moreover decreases to 0 as soon as the time difference \(|t'-t|\) exceeds 2-4 time windows, indicating a completely different composition of the central hyper-cores. Note that the stability of the central cores tends to increase until ≈2010 (Fig. 1c,f), although it decreases again afterwards. Overall the \(J^{*}\) values remain low, indicating that the nodes sitting in the most central hyper-cores change over time.

We further explore this instability using the correlation \(\rho (t,t')\) of the nodes' hypercoreness across different time windows, as shown in Fig. 1d,g. A positive correlation is observed between the hypercoreness values of nodes in successive snapshots, but the correlation \(\rho (t,t+1)\), computed using all nodes active in at least one of the snapshots t and \(t+1\), is lower than \(\rho ^{*}(t,t+1)\), which takes into account only nodes active in both snapshots (Fig. 1g). As discussed above, this indicates that some nodes with high centrality leave the system, and/or nodes enter the system and immediately gain a central position. As the temporal distance \(|t'-t|\) increases, the correlation \(\rho (t,t')\) progressively decreases. Moreover, the correlations tend to increase with t: \(\rho (t,t+1)\) increases with t and the decrease of \(\rho (t,t')\) with \(|t'-t|\) becomes slower (Fig. 1d,g), indicating an increased stability in centrality rankings as time evolves.

The correlation between hypercoreness values decays to zero in approximately 3-5 time windows and then reaches negative values: this suggests a progressive inversion of the rankings over time, with nodes successively increasing and decreasing their hypercoreness and rankings, as driven by the unfolding of their academic careers. Figure 2 indeed gives some examples of the evolution of individual nodes’ relative hypercoreness \(r(i,t)\), which reflect the academic trajectories of the corresponding scientists. Some nodes have a bell-shaped hypercoreness profile, entering the system with a low centrality, progressively moving towards the more central cores and then back to lower ranks. This can describe the academic trajectory of a young researcher, who enters the scientific community, becomes central and then progressively leaves the community due to retirement or a change in the topics or reference journals of their research. Other nodes present instead a rather stable ranking, and, for individuals having entered the system more recently, only the upward trend of increasing centrality is observed.

Figure 2

Hypercoreness evolution for selected nodes in the APS scientific collaborations. We show the temporal evolution of the hypercoreness \(r(i,t)\) for four authors and the mean \(\langle r \rangle (t)\) value (average on active nodes): we show the authors I.Y. Lee (\(\#_{W}1\)) and R.V.F. Janssens (\(\#_{W}2\)), who occupy respectively the first and second position in the ranking produced by the aggregated hypercoreness W over the period 1942-2021, and the authors Guang-Can Guo (\(\#_{\overline{h}}1\)) and Loren N. Pfeiffer (\(\#_{\overline{h}}5\)), who occupy respectively the first and fifth position in the ranking produced by the average number of interactions per active windows over the period 1942-2021. Nodes can have different behaviors, ranging from a stable to a bell-shaped temporal profile of the hypercoreness: these profiles mirror movements of the node in the hyper-cores structure towards more central or more superficial hyper-cores, and can reflect the authors’ academic trajectories

To characterize the nodes’ overall behaviours, we moreover compute their time-aggregated centrality measures, and show the results in Fig. 3. On average, the aggregated hypercoreness \(\langle W \rangle \) increases with the snapshot activity \(a_{w}\) (Fig. 3a), but a large variability in the values of W is observed at given \(a_{w}\). Some nodes can be very active but display a low centrality, while nodes with moderate activity can reach large values of W. The average number of interactions per active window \(\overline{h}\) is also only weakly correlated with W, and the nodes with highest W do not coincide with those with largest \(\overline{h}\) (see Fig. 3b). Finally, the aggregated and activity-averaged hypercoreness, W and \(\overline{W}\), also do not produce the same ranking (see Fig. 3c). Some nodes are not often active (low \(a_{w}\)) with medium-low W but high \(\overline{W}\): these authors appear in few windows but within very connected communities, therefore they are very central on average when active, but their low \(a_{w}\) makes them less relevant in aggregated terms. Other nodes are very active (high \(a_{w}\)) with medium-high W but relatively low \(\overline{W}\): such authors are often active either with a low centrality or with a non-monotonic hypercoreness profile (see Fig. 2). Overall, the combined information of W and \(\overline{W}\) provides a more complete description of the nodes’ structural behavior over the whole time span than either of these centralities alone.

Figure 3

Time-aggregated hypercoreness in APS scientific collaborations 1942-2021. a: scatter plot of the aggregated hypercoreness \(W(i)\) as a function of the snapshot activity \(a_{w}(i)\) for all nodes i, and average aggregated hypercoreness \(\langle W \rangle \) as a function of \(a_{w}\). b: aggregated hypercoreness \(W(i)\) vs. average number of interactions per active window \(\overline{h}(i)\) for all nodes i. c: aggregated hypercoreness \(W(i)\) as a function of the activity-averaged hypercoreness \(\overline{W}(i)\). In all panels the points are colored according to the activity \(a_{w}\) of the corresponding node. Note that the two time-aggregated hypercoreness measures provide complementary information and a complete description of the structural behavior of the nodes over the entire observation period; moreover, they distinguish different behaviors not identified by other centrality measures

We finally leverage the fact that each hyperedge representing a scientific article is labelled by the journal it was published in to examine the importance of the various APS journals in the hyper-core structure. The APS journals can be interdisciplinary (e.g. PRL) or specialized in a specific research field (e.g. PRC for nuclear physics, PRD for high-energy physics, PRB for condensed matter physics), thus representing a specific research area [45] (see SM).

For each \((k,m)\)-core we consider all the hyperedges it contains and their labels, and we identify the dominant journal (namely, the one whose frequency exceeds 0.5; if no journal is represented by more than half of the hyperedges, we consider that no journal dominates; see footnote 6). Figure 4a shows the resulting evolution of the hyper-cores’ dominant journal. Initially, PR and PRL dominate within all the hyper-cores, since they were the only available journals together with RMP (not shown in the figure, see SM). Then, the more superficial cores present a mixed composition, while the most central ones are first dominated by PRL in the period 1962-1981; subsequently, in 1982-1986, central cores are mostly formed by the high-energy physics community (PRD) for large collaboration sizes, while at low order the nuclear physics area dominates (PRC). Starting from 1992, PRC dominates the most central hyper-cores at all orders: the non-monotonic behavior observed in the core structure, with the maximum connectivity in 1997-2001, is predominantly due to interactions within the nuclear physics area. This could be due to several discoveries in the field occurring in the preceding years (e.g. the discovery of the W and Z bosons [58] or the discovery of top quarks [59, 60]), which boosted collaborations in the community, favouring and increasing cohesion. After this phase the nuclear physics area remains overall dominant. Moreover, this non-monotonic behavior can also be identified in the hyper-core decomposition of the hypergraphs obtained by considering only the papers published in PRC (see SM). Recently, the condensed matter physics community (PRB) has also been expanding its contribution to the central cores at low interaction orders. The relative contribution of the scientific communities to the set of the most central cores is summarized in Fig. 4b: PRL is the dominant journal in the first time windows, while the share of PRC increases rapidly starting in the 1980s; the share of PRB also becomes important from 2012-2016, and in 2017-2021 new journals start gaining relevance (e.g. PRX).
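The majority rule used to assign a dominant journal to a hyper-core can be written in a few lines of Python. This is a minimal sketch: the function name and the input format (a plain list with one label per hyperedge of the core) are illustrative assumptions.

```python
from collections import Counter

def dominant_label(labels, threshold=0.5):
    """Return the label whose relative frequency strictly exceeds `threshold`.

    labels: list with one label per hyperedge of a given (k,m)-core.
    Returns None for an empty core or when no label exceeds the threshold.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > threshold else None

# Hypothetical core with three hyperedges, two of them published in PRC.
dominant = dominant_label(["PRC", "PRC", "PRD"])
```

With a threshold of 0.5 at most one label can qualify, so the result does not depend on how ties between equally frequent labels are ordered.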

Figure 4

Prevalent APS scientific communities in hyper-cores. a: temporal evolution over 5-year time windows of the prevalent journal within each \((k,m)\)-hyper-core of the APS data set, defined as the most frequent hyperedge label in each core (we consider a journal dominant only if its frequency is larger than 0.5; white indicates hyper-cores which are empty or where a dominant journal cannot be defined). b: relative frequency P of the various journals within the most central hyper-cores, i.e. \((k_{max}^{m},m)\)-cores \(\forall m\), and its temporal evolution. c: same as b for the randomized data. We average the relative frequency over 50 randomized realizations of the hypergraph (see Methods). The error bars give the standard errors. We identify the scientific communities most densely connected at different orders of interaction: this pattern evolves over time, following specific trends of collaborations in the different research areas, and is significant when compared with appropriate randomized systems

As the numbers of scientists and articles in the various fields are neither homogeneous nor constant, we check whether such patterns are simply due to the relative abundance of authors and articles in the different journals. To this aim, we build a randomized version of the temporal hypergraph, which preserves in each time window the hypergraph structure and the total number of interactions of each order for each label, but destroys any correlation between the nodes and the labels of the hyperedges in which they participate (see the reshuffling procedure in Methods). We consider 50 randomized realizations and for each hyper-core we estimate the average frequency of each label. The pattern of topic dominance in the most central cores is significantly different in the reshuffled version compared to the empirical case (see Fig. 4b,c and SM). For example, in the reshuffled case PRA, PRB and PRE are significantly more represented in the central cores, while PRC is instead less represented than in the original data.
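One simple way to satisfy these constraints is to permute the labels among the hyperedges of the same size within each time window. The Python sketch below follows this idea; it is an assumption about how such a null model can be built, not necessarily the exact procedure described in Methods, and the function name and data layout are illustrative.

```python
import random
from collections import defaultdict

def reshuffle_labels(hyperedges, seed=None):
    """Shuffle labels among hyperedges of equal size in one time window.

    hyperedges: list of (frozenset_of_nodes, label) pairs.
    The node sets (hypergraph structure) are left untouched, and the number
    of hyperedges of each size carrying each label is preserved.
    """
    rng = random.Random(seed)
    by_size = defaultdict(list)
    for idx, (nodes, _) in enumerate(hyperedges):
        by_size[len(nodes)].append(idx)
    shuffled = list(hyperedges)
    for idxs in by_size.values():
        labels = [hyperedges[i][1] for i in idxs]
        rng.shuffle(labels)  # permute labels within one size class
        for i, lab in zip(idxs, labels):
            shuffled[i] = (hyperedges[i][0], lab)
    return shuffled

# Hypothetical window with three pairwise interactions and one triplet.
window = [(frozenset({1, 2}), "PRB"), (frozenset({2, 3}), "PRC"),
          (frozenset({1, 2, 3}), "PRD"), (frozenset({3, 4}), "PRB")]
randomized = reshuffle_labels(window, seed=1)
```

Repeating the call with different seeds gives independent realizations over which label frequencies in each hyper-core can be averaged.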

It is also possible to consider a different time resolution for building the temporal hypergraph, to investigate e.g. the dynamics at shorter time-scales, or to focus on one specific scientific community by considering the hypergraph formed by articles published in one specific journal. We refer to the SM for some results in such directions.

2.5 Higher-order structure dynamics of interactions in a hospital

We now consider the data set of face-to-face interactions in a hospital (LH10), represented through a time-varying hypergraph where nodes correspond to individuals and hyperedges to group interactions (see Methods). We first study differences in the daily aggregated hypergraph structures, i.e., we aggregate the temporal hypergraph over 24-hours time windows (thus obtaining \(n=4\) time windows).

The maximum size of interactions \(M_{t}\) and the maximum connectivity values \(k_{max}^{m}(t)\) \(\forall m\), i.e. the cohesiveness of the system, are rather stable over different days (Fig. 5a,b). However, nodes are differently distributed within the cores. On the first day, the population of the \((k,m)\)-cores features sharp drops when k increases, followed by plateaus: these correspond to densely populated shells at small k followed by almost empty shells. On the other days, the structure instead presents a more progressive emptying of the cores as k increases, hence shells are populated more homogeneously (even if some jumps and plateaus of reduced sizes are still present). The root-mean-square deviation similarity \(\Sigma (t,t')\) between the hyper-cores filling profiles at time t and \(t'\) still presents high values for all pairs \((t,t')\) (see Fig. 5c); however, the similarity is lower than the one observed for the APS data set. Moreover, the similarity Σ between consecutive snapshots increases over time (Fig. 5f).

Figure 5

Hyper-core structure evolution in daily interactions within a hospital (LH10). a: relative population \(n_{(k,m)}\) of the \((k,m)\)-core as a function of k and m for each time window. The number of active nodes \(N_{t}\) and hyperedges \(E_{t}\) is reported for each snapshot. b: \(n_{(k,m)}\) as a function of k for fixed values of m. c: root-mean-square deviation similarity \(\Sigma (t,t')\) between \(n_{(k,m)}(t)\) and \(n_{(k,m)}(t')\) – the grey diagonal corresponds to \(\Sigma (t,t)=1\); d: Jaccard similarity \(J^{*}(t,t')\) between the sets of nodes belonging to the most central hyper-cores, i.e. the \((k_{max}^{m},m)\)-cores \(\forall m\), at time t and \(t'\) – the grey diagonal corresponds to \(J^{*}(t,t)=1\). e: Pearson correlation coefficient \(\rho (t,t')\) between the nodes’ hypercoreness at time t and \(t'\), considering all the nodes that are active in at least one of the snapshots – the grey diagonal corresponds to \(\rho (t,t)=1\). f: similarity \(\Sigma (t,t+1)\) as a function of t. g: temporal evolution of both the similarity \(J^{*}(t,t+1)\) and the Jaccard similarity \(J_{N}(t,t+1)\) between the entire population in consecutive time windows. h: temporal evolution of the correlation between the nodes’ hypercoreness in consecutive snapshots, considering all the nodes that are active in at least one of the snapshots, \(\rho (t,t+1)\), or that are active in both, \(\rho ^{*}(t,t+1)\). Note that macroscopically the density and the size of the interactions are quite stable, even if the overall filling of the hyper-cores changes over time; the composition of the most central hyper-cores is highly stable, suggesting a high system stability at the mesoscopic and microscopic scales

Mesoscopically the system is quite stable (see Fig. 5d,g): the similarity \(J^{*}(t,t')\) between the nodes in the most central cores at time t and \(t'\) presents medium-high values, \(J^{*}(t,t')\) slightly decreases when increasing \(|t'-t|\) and in consecutive time windows it still assumes values close to the similarity of the entire population \(J_{N}\), even if decreasing over time. The composition of the most central cores is thus quite stable, therefore in general the nodes maintain the same position in the core structure. This is confirmed by the correlation \(\rho (t,t')\) in the nodes hypercoreness between two snapshots (see Fig. 5e,h). The correlation \(\rho (t,t')\) presents high values. As we will explore further below, this stability in the composition of central cores and in the behavior of the nodes is due to the difference in the roles played by the different individuals in the hospital, which limits the mobility of the nodes in the hyper-core structure.
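As an illustration of the mesoscopic comparison, the Jaccard similarity between the node sets of the most central cores in two windows can be sketched in Python as follows. Taking the union over all orders m of the nodes in the \((k_{max}^{m},m)\)-cores, and returning 1 when both sets are empty, are assumptions made for this sketch; the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two node sets.

    Assumption: two empty sets are considered identical (similarity 1).
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def central_nodes(cores_by_order):
    """Collect the nodes of the most central cores of one window.

    cores_by_order: dict m -> set of nodes in the (k_max^m, m)-core.
    Assumption: the sets are merged by taking their union over m.
    """
    return set().union(*cores_by_order.values())

# Hypothetical most central cores at orders m = 2 and m = 3 in two windows.
window_t = central_nodes({2: {1, 2}, 3: {2, 3}})
window_u = central_nodes({2: {2, 3}, 3: {3, 4}})
j_star = jaccard(window_t, window_u)
```

The same `jaccard` function applied to the full sets of active nodes gives the population similarity \(J_{N}\) used as a reference.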

Note that, even if the position of the nodes in the hyper-core structure is fairly stable over time, the evolution of the hypercoreness \(r(i,t)\) for single nodes can show different trajectories. This is evident when disaggregating by social role, as for the examples in Fig. 6: the nodes can present a stable dynamic with a constant position in the core structure, as shown by the patient and the paramedic cases, or a non-monotonic dynamic, with movements from more central cores towards more superficial ones and vice-versa, as for the doctor and the administrative staff member.

Figure 6

Hypercoreness evolution in the temporal hypergraph of daily interactions within a hospital (LH10). We show the temporal evolution of the hypercoreness \(r(i,t)\) for four agents with different social role: a paramedic (id = 1210), a medic (id = 1144), a member of the administrative staff (id = 1098) and a patient (id = 1383). The dashed line shows the mean \(\langle r \rangle (t)\) (averaged only on active nodes). Nodes can have different behaviors, ranging from a stable to a non-monotonous temporal profile of hypercoreness. This profile reflects changes in an individual’s interaction patterns, corresponding to the node’s movements within the hyper-cores structure, either towards more central or more superficial hyper-cores. Note how the patient’s hypercoreness is always lower than the average, while the paramedic’s hypercoreness is always maximal

These different behaviours are summarized by the time-aggregated centrality measures. In general the aggregated hypercoreness W increases with the snapshot activity \(a_{w}\) (see Fig. 7a); however, nodes with the same \(a_{w}\) can have very different W. Analogously, W and the average number of interactions when active \(\overline{h}\) are positively correlated, but there are outliers, which produce different top positions in the corresponding rankings (see Fig. 7b). By taking the structure into account, the aggregated hypercoreness can thus provide different and more detailed information than the activity or the average number of interactions. The aggregated and activity-averaged hypercoreness show that the nodes that are globally relevant are also relevant, on average, when active (see Fig. 7c). Nevertheless, the produced rankings are still different since some nodes are relevant when active (high \(\overline{W}\)), but not globally (low W). By combining the two time-aggregated hypercoreness measures we obtain information on the different overall behaviors of the nodes (see Fig. 6).

Figure 7

Time-aggregated hypercoreness in a hospital (LH10). a: scatter plot of the aggregated hypercoreness \(W(i)\) as a function of the snapshot activity \(a_{w}(i)\) for all nodes i, and averaged aggregated hypercoreness \(\langle W \rangle \) as a function of \(a_{w}\). b: aggregated hypercoreness \(W(i)\) vs. average number of interactions per active window \(\overline{h}(i)\) for all nodes i. c: aggregated hypercoreness \(W(i)\) as a function of the activity-averaged hypercoreness \(\overline{W}(i)\). In all panels points are colored according to the node’s social role. Note that the two time-aggregated hypercoreness provide a complete and complementary description of the structural behavior of the nodes over the full observation period. Different social roles present different behaviors, e.g., patients present low values of all centrality measures, doctors and administrative staff have heterogeneous behaviors, while nurses feature high values of all centralities

We finally expose strong differences in the temporal and structural properties of specific roles in the hospital. Figure 7 shows that the activity \(a_{w}\) is quite independent of the social role; however, the patients have a homogeneous behavior occupying always the lower positions in all the rankings produced by the other time-aggregated centrality measures; on the contrary the nurses, doctors and administrative staff present a more heterogeneous behaviour, presenting a wide range of centrality values. Nurses constitute the most structurally and temporally relevant group according to all the time-aggregated centrality measures, always occupying the top positions of the rankings (see Fig. 7).

The nurses have a key role also mesoscopically: in each \((k,m)\)-hyper-core indeed, we identify the dominant social role when possible by checking whether more than half of the nodes of a core belong to one category. In the superficial cores it is not possible to identify a dominant role, however in the most central cores the nurses dominate in all time windows and at all interaction orders (see Fig. 8a). Nurses thus constitute the most densely connected social group at all the orders of interaction, thus the interactions structure in the most central cores is attributable to their activities.

Figure 8

Prevalent social role in hyper-cores of a hospital (LH10). a: temporal evolution over 24-hour time windows of the prevalent social role in each \((k,m)\)-hyper-core of the LH10 data set, defined as the most frequent label in the core: we use a color code for identifying social roles and we consider a role dominant only if its frequency is larger than 0.5. Hyper-cores which are empty or where no dominant role can be identified are shown in white. b: temporal evolution of the hypercoreness \(r(i,t)\) averaged over all nodes (dashed black line) and averaged over each distinct class. c: temporal evolution of the relative frequency P of the various social roles within the top 15% positions of the node ranking given by the hypercoreness \(r(i,t)\). d: same as c, but in this case we consider the relative frequency P averaged over 50 randomized realizations of the hypergraph (see Methods). In this case, we also show error bars corresponding to the standard errors. We identify the social roles most densely connected at different orders of interaction. This pattern is very stable, with nurses being the most densely connected at all interaction orders. Nurses present higher hypercoreness than other social roles, while patients have values lower than the average. This pattern is significant when compared to appropriate randomized systems

The dominant role of nurses is further highlighted microscopically by considering the evolution of the average hypercoreness \(r(i,t)\) within each specific class (see Fig. 8b). All roles present a quite stable average hypercoreness: patients and nurses present a hypercoreness notably lower and higher than the average, respectively, while doctors and administrative staff are close to the average behavior. Moreover, if we consider the instantaneous ranking produced by the hypercoreness and estimate the frequency of each role, we find that nurses always dominate the top positions (see Fig. 8c). This pattern is not due to a difference in the numbers of nodes or hyperedges, as we check by comparing the results with a reshuffled data set in Fig. 8d: we generate 50 random realizations of the hypergraph, which completely preserve in each time window the structure of the hypergraph and the total number of nodes with each label, but destroy correlations between the labels of interacting nodes (see the reshuffling procedure in Methods). The frequencies of the different social roles in the top positions of the hypercoreness ranking, averaged over all the realizations, show strong differences compared to the original case.
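The role frequencies in the top positions of the instantaneous ranking can be computed along the following lines. This Python sketch assumes that the top fraction is rounded up to an integer number of nodes and that ties in hypercoreness are broken by node identifier; both conventions, like the function name, are illustrative assumptions.

```python
import math
from collections import Counter

def top_role_frequencies(r, roles, fraction=0.15):
    """Relative frequency of each role among the top-ranked nodes.

    r: dict node -> hypercoreness in one time window (active nodes only).
    roles: dict node -> role label.
    Assumptions: the top group has ceil(fraction * N) nodes (at least one),
    and ties are broken by the string form of the node identifier.
    """
    ranked = sorted(r, key=lambda i: (-r[i], str(i)))
    n_top = max(1, math.ceil(fraction * len(ranked))) if ranked else 0
    counts = Counter(roles[i] for i in ranked[:n_top])
    return {role: c / n_top for role, c in counts.items()}

# Hypothetical window with ten nodes, the two most central being nurses.
r = {i: i / 10 for i in range(10)}
roles = {i: ("nurse" if i >= 8 else "patient") for i in range(10)}
freqs = top_role_frequencies(r, roles)
```

Applying the same function to each randomized realization and averaging the resulting frequencies gives the reference values against which the empirical ones are compared.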

While we have focused here on the changes occurring between different days, it is possible to consider a different temporal resolution to focus e.g. on specific activities in the system occurring at different time scales: in the SM we consider as an example the evolution occurring within a single day with 2-hour time windows.

In the SM we also apply the proposed analysis to data sets describing interactions between individuals in different contexts (see Methods). In some contexts, the composition of the hyper-cores presents a strong structural variability and instability: this corresponds e.g. to conferences or workplaces where different days can bring very different patterns of connections. A more stable structure is obtained in others, with high stability of the core composition, e.g. systems in which patterns of interactions are repeated over time due to role and activity constraints, such as in schools and hospitals (see SM). Such differences in the results highlight and confirm the interest of following the hyper-core decomposition over time as a characterization tool for temporal hypergraphs.

2.6 A validation tool for time-varying hypergraph models

We now illustrate how the hyper-core decomposition can also help with the validation of synthetic models of temporal hypergraphs. More precisely, it can serve as a tool to quantitatively validate whether a model reproduces given hierarchical structures and structural dynamics of interest, such as those of an empirical temporal hypergraph, at several topological and temporal scales. To showcase the potential of this as a tool, we consider several models of temporal hypergraphs of increasing complexity, and tune them to reproduce the activity patterns of a data set. We then apply the previously described approach to each model and to the original data set, identifying differences among the models, and ultimately investigating which model ingredients make it possible to generate a non-trivial hierarchical structure that resembles the one found in the data.

For simplicity, we consider models within the class of activity-driven (AD) networks: these models are based on simple mechanisms for the formation of interactions [13], and can be refined to include increasingly complex realistic features and tuned to reproduce many properties of empirical data sets [40, 61–64]. We consider here several generalizations taking into account higher-order interactions, in a similar spirit as [9, 37]. In each model, we consider a population of N nodes: each node is assigned at each time t an activity parameter \(a_{t}(i)\), which represents the node’s propensity to generate interactions and sets its activation rate (Poissonian activation dynamics). When a node is active, it generates a hyperedge of size m, drawn from a distribution \(\Psi _{t}(m)\) (which potentially depends on the time step t). The remaining \((m-1)\) nodes are selected in the population with mechanisms depending on the specific AD model. We consider the following models:

  • Higher-order activity-driven model (HAD). This model is the hypergraph generalization of the standard AD network [13] and of the simplicial activity-driven model (SAD) [9]. Each active node creating a hyperedge of size m chooses the \(m-1\) nodes to interact with uniformly at random from the whole population. This model takes into account only the heterogeneity of the agents’ behaviour, through their activities, and that of the group sizes. Interactions are instantaneous and there is no memory between successive time steps.

  • HAD model with attractiveness (HADA). This model corresponds to the hypergraph generalization of the AD network with attractiveness [39, 40, 63, 64]. Each node is also assigned an attractiveness parameter \(b_{t}(i)\), which defines the intensity with which the node attracts interactions from active nodes. Each active node, to create an interaction of size m, selects the \(m-1\) other nodes in the population randomly with probability proportional to their attractiveness b. The interactions are instantaneous and there is no memory. We consider \(b_{t}(i)=a_{t}(i)\) \(\forall i\) at each time, i.e. the most (least) active nodes are also the most (least) attractive ones, as observed in empirical systems [63, 64].

  • HAD model with memory (HADAM). This model is the HADA with the introduction of an additional memory mechanism, similar to that proposed in the AD networks with memory [61, 62]. For each active node i, we denote by \(l_{t}(i)\) the number of other nodes with which it has already interacted in previous time steps. The active node i, to create an interaction of size m, selects the \(m-1\) other nodes (i) with probability \(p_{t}(i)=1/(1+l_{t}(i))\), among those not yet encountered, (ii) with probability \((1-p_{t}(i))\) among those already met. These nodes are selected: in the former case, with probability proportional to their attractiveness \(b(j)\); in the latter case, with probability proportional to their attractiveness \(b(j)\) and to the number of times they have already met with the active node \(\omega _{ij}\).
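The three partner-selection mechanisms listed above can be compared in a short Python sketch. Several choices here are assumptions made for illustration and are not specified in the text: the new-versus-known decision is taken independently for each of the \(m-1\) partners, partners are drawn without replacement, attractiveness values are strictly positive, and when one of the two pools is empty the other one is used. The function and argument names are hypothetical.

```python
import random

def weighted_pick(candidates, weights, rng):
    """Pick one candidate with probability proportional to its weight."""
    return rng.choices(candidates, weights=weights, k=1)[0]

def select_partners(i, m, nodes, b, omega, model, rng):
    """Choose the m-1 partners of the active node i.

    nodes: list of node ids; b: dict node -> attractiveness (assumed > 0);
    omega: dict node -> number of past meetings with i (memory of node i);
    model: "HAD", "HADA" or "HADAM".
    """
    chosen = []
    for _ in range(m - 1):
        free = [j for j in nodes if j != i and j not in chosen]
        if not free:
            break  # population too small for a hyperedge of size m
        if model == "HAD":
            pick = rng.choice(free)  # uniform choice
        elif model == "HADA":
            pick = weighted_pick(free, [b[j] for j in free], rng)
        else:  # HADAM: attractiveness plus memory
            known = [j for j in free if omega.get(j, 0) > 0]
            new = [j for j in free if omega.get(j, 0) == 0]
            l_i = sum(1 for c in omega.values() if c > 0)  # nodes already met
            p_new = 1.0 / (1.0 + l_i)
            if new and (not known or rng.random() < p_new):
                pick = weighted_pick(new, [b[j] for j in new], rng)
            else:
                pick = weighted_pick(
                    known, [b[j] * omega[j] for j in known], rng)
        chosen.append(pick)
    return chosen

# Hypothetical example: node 0 has already met node 1 three times.
rng = random.Random(0)
population = list(range(6))
attractiveness = {j: 1.0 for j in population}
partners = select_partners(0, 4, population, attractiveness, {1: 3},
                           "HADAM", rng)
```

A full simulation would additionally update the memory counts after each hyperedge and draw activations and sizes from \(a_{t}(i)\) and \(\Psi _{t}(m)\); those steps are omitted here.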

Each model can be fed by empirical data in the following manner. Given an empirical temporal hypergraph \(\mathcal{H}\) and its snapshot representation \(\{\mathcal{H}_{t}\}_{t=1}^{n}\), for each model we consider the same population size as the empirical hypergraph; moreover, we use the empirically observed hyperedges size distribution \(\Psi _{t}(m)\) at each time step, and we tune the activities \(a_{t}(i)\) so that the total number of interactions at each time, \(n_{t}^{tot}\), and the total number of interactions in which each node is involved, \(n_{t}(i)\), replicate the empirical ones (see Methods for more details on the hypergraphs generation).

Here specifically, we consider the data set of human interactions in a university (CopNS), represented through a temporal hypergraph where nodes correspond to individuals and hyperedges to group interactions (in the SM we also apply the same analysis to the hospital data set). Once we have generated the three synthetic temporal hypergraphs, we aggregate both data and models on 1-day time windows (see Methods). We then apply the hyper-core decomposition to each time window and compare the resulting structures and their temporal evolution at this time scale. We mainly focus here on the first working days of the first week of the data, and we show in the SM that similar temporal and structural patterns are obtained also for other days and weeks.

The original data set presents a non-trivial filling of the cores, with significant differences over time (see Fig. 9a): on Monday the \((k,m)\)-cores present a rapid emptying for all orders when k increases, with a rapid drop in the population (densely populated shells), followed by an extended plateau (empty shells); a similar structure is obtained on Wednesday and Thursday, but with some differences in the drops widths, in the plateaus extensions and in the maximum connectivity values; on Tuesday instead, the structure is very different, the maximum connectivity values are much lower and the plateau observed in the other time windows is almost absent. These filling profiles suggest the presence of a rich hierarchical structure in the hypergraph that changes over time.

Figure 9

Hyper-cores structure in time-varying hypergraphs models. We consider the CopNS data set as well as the HAD, HADA and HADAM models adjusted to the CopNS node activities and hyperedge size distributions, and aggregated over 1-day time windows. a: relative population \(n_{(k,m)}\) of the \((k,m)\)-core as a function of k and m from Monday to Thursday of the first week; the number of active nodes \(N_{t}\) and hyperedges \(E_{t}\) are also reported. The insets show \(n_{(k,m)}\) as a function of k for fixed values of m. The first row corresponds to the empirical data; the second, third and fourth rows correspond to the hypergraphs generated respectively with the HAD, the HADA and the HADAM models. b: similarity Σ between the hyper-cores filling profiles of the empirical hypergraph \(\mathcal{H}_{t}\) and each of the synthetic models \(\mathcal{H}_{t}'\) in the same time window t. c: similarity \(J^{*}(t,t+1)\) between the most central hyper-cores, i.e. \((k_{max}^{m},m)\)-cores \(\forall m\), in two consecutive snapshots, and Jaccard similarity \(J_{N}(t,t+1)\) between the entire population of the data set in consecutive time windows. d: Pearson correlation coefficient \(\rho ^{*}(t,t+1)\) between the nodes’ hypercoreness in two consecutive snapshots, considering all the nodes that appear in both snapshots. In panels c-d we consider both the data set and the corresponding synthetic models. The results presented here show that the hyper-core decomposition provides a tool for the validation of temporal hypergraph models: the HADAM model reproduces quite well the empirical hierarchical structure and its evolution at all the topological scales, while the HADA and HAD models fail to reproduce it at all scales

The HAD model, despite replicating the activity and hyperedge size distributions of the data, has a very different hyper-core decomposition, which does not display any hierarchical structure (Fig. 9a): all \((k,m)\)-cores are equally populated by the whole population until \(k \sim k_{max}^{m}\), then \(n_{(k,m)}\) quickly collapses to zero; all the shells are empty apart from those with \(k \sim k_{max}^{m}\), which contain the entire population. The model thus replicates neither the empirical hierarchical structure nor its evolution, either mesoscopically, since all cores coincide with the entire population, or microscopically, since all the nodes have the same position in the core structure. This is expected due to the interaction mechanism of the model, which generates a completely mean-field structure.

By contrast, the temporal hypergraph obtained from the HADA model does present a hierarchical structure: the population of the \((k,m)\)-cores decreases progressively and smoothly with k at all orders m, indicating the presence of uniformly populated shells. The system presents a hierarchy both mesoscopically, since there are groups of nodes more densely connected, and microscopically, since the nodes are distributed on the various shells. The model partially replicates the changes in the maximum connectivity, but it does not completely reproduce the empirical hierarchical structure, as the shapes of \(n_{(k,m)}\) vs. k are rather different from the empirical ones (insets of Fig. 9a).

Finally, the synthetic hypergraphs generated using the HADAM model present a rich hierarchical structure that reproduces quite well the empirical one and its evolution, both in the maximum connectivity and in the filling profiles. Indeed, the memory effect drives the creation of interactions between nodes that have already met several times in the past, thus favoring non-trivial patterns with densely connected groups of nodes. Some quantitative differences with the empirical structure are still observed, such as a more progressive emptying of the cores with k, and slightly different \(k_{max}^{m}\) values.

Figure 9b provides a quantitative comparison of the hyper-core structures generated by each model with the empirical one, through the root-mean-square deviation similarity Σ between the respective hyper-cores filling profiles in each time window. As expected from the above considerations, the hyper-core structure of the HADAM model is the most similar to the empirical one with \(\Sigma \sim 0.95\), followed by the HADA model (\(\Sigma \sim 0.80\)), and by the HAD model (\(\Sigma \sim 0.60\)). Similar results are also obtained with other similarity measures (see SM).

At the mesoscopic scale, the empirical data present a strong instability in the most central cores (see Fig. 9c), with a very low similarity \(J^{*}(t,t+1)\) between consecutive snapshots. The HAD model, on the contrary, presents a very high stability in the deepest cores, reproducing the empirical similarity \(J_{N}\) of the entire population, as expected since the whole population composes the most central cores (see Fig. 9a). The HADA and the HADAM models yield a lower stability of the central cores: the variations in activity and memory effects are enough to generate changes in the mesoscopic hierarchical structure and similarities closer to the empirical case, even if still higher. At the microscopic level, the empirical data set alternates phases with low and high hypercoreness correlations in consecutive snapshots \(\rho ^{*}(t,t+1)\) (see Fig. 9d): during the weekdays the structural positions of nodes change a lot across days (low \(\rho ^{*}\)), because of varying activities, while during the weekends they are quite stable (high \(\rho ^{*}\)). On the contrary, the three models present approximately constant correlation values: the HAD model trivially does not present any correlation, \(\rho ^{*} \sim 0\), since the model does not generate any hierarchy of nodes in any time window; the HADA model instead presents higher correlations, \(\rho ^{*} \sim 0.30\), as the system generates a hierarchical structure with high-activity nodes being the most central over time; finally, the HADAM model presents the highest correlations, \(\rho ^{*} \sim 0.60\), since the memory forces the creation of correlations in the nodes’ behavior over time, which could be balanced only by strong changes in the nodes’ activity.

These results are further confirmed by comparing the entire similarity matrices of the models with the ones of the empirical hypergraph at different scales (see SM for the matrices \(\Sigma (t,t')\), \(J^{*}(t,t')\), \(\rho (t,t')\) and \(\rho ^{*}(t,t')\)): the HADAM model better reproduces the evolution and temporal stability of the empirical system at all the temporal and structural scales, while the HADA and HAD models feature larger differences, with the HAD model leading to the widest discrepancy.

We finally compare in Fig. 10 the behaviour of the time-aggregated centrality measures in the data and models. The original data set presents a wide variability. In fact, even if the aggregated hypercoreness W and the activity-averaged hypercoreness are positively correlated, there are nodes that are very central on average when active (high activity-averaged hypercoreness) but globally not relevant (low W), and vice versa. This suggests different node hypercoreness trajectories and node movements across the core structure (see SM). The system also presents a heterogeneous distribution of the aggregated hypercoreness W, \(P(W)\), which provides a clear ranking of nodes. Moreover, nodes with the same snapshot activity \(a_{w}\) can present very different structural behaviors: the activity is unevenly distributed in the W classes, since the nodes with a relevant structural role (high W) are frequently active (high \(a_{w}\)), whereas nodes with little structural relevance (low W) can have very different activity values.

Figure 10

Time-aggregated hypercoreness in time-varying hypergraph models. We consider the CopNS data set with 1-day time windows over four weeks, as well as the three synthetic models. a: scatter plots of the aggregated hypercoreness W as a function of the activity-averaged hypercoreness for each node: the points are colored according to the snapshot activity \(a_{w}\) of the corresponding node. b: histograms giving the number of nodes \(P(W)\) with aggregated hypercoreness W: within each bar we distinguish the relative frequency of nodes belonging to each class \(a_{w}\), through stacked bars. In all panels, we consider both the empirical hypergraphs (first column) and the corresponding synthetic temporal hypergraphs (second column: HAD, third column: HADA, fourth column: HADAM). Note that the two time-aggregated hypercoreness measures provide a description of the structural behavior of the nodes. The distributions of these measures and their correlations help validate synthetic models concerning the structural and temporal properties of single nodes. The HADAM model reproduces the empirical distributions and correlations quite well, while the HADA and HAD models fail to do so.

In the HAD model all nodes have approximately the same activity-averaged hypercoreness but different values of the aggregated one, W (see Fig. 10): the HAD model does not produce any hypercoreness hierarchy of nodes in any time window, therefore on average when a node is active it has the same centrality as the others. The aggregated hypercoreness W differentiates among the nodes only through their temporal persistence in the system, i.e. through \(a_{w}\). The distribution of W appears homogeneous and peaked.

The HADA model creates a hierarchy of nodes both in terms of W and of the activity-averaged hypercoreness (see Fig. 10): in this case, the most globally central nodes are also relevant on average when active, while nodes that are less central globally can feature different behaviours when active, either being very central or not. The distribution \(P(W)\) appears homogeneous and peaked, with a gradual increase in the activity \(a_{w}\) of the more relevant nodes. Even if it features a hypercoreness hierarchy, the model does not reproduce the empirical distribution of the aggregated hypercoreness \(P(W)\), and yields a stronger correlation between W and \(a_{w}\) than in the empirical data.

The HADAM model yields a hierarchy both in terms of W and of the activity-averaged hypercoreness (see Fig. 10), replicating the empirical patterns quite well, even if there are nodes whose two time-aggregated hypercoreness values are higher than those empirically observed. The distribution \(P(W)\) is heterogeneous, with few nodes with very high W, and the heterogeneity in the nodes' structural and temporal behaviours is also well reproduced, since the distribution of \(a_{w}\) in the W classes replicates the empirical case well.

Overall, these results show how the hyper-core decomposition makes it possible to validate hypergraph models structurally and temporally at different scales. The three temporal models are generated starting from the same amount of information extracted from the empirical data set and are tuned to replicate the same statistical and temporal properties. The HAD model fails to produce and replicate the hierarchical structure at any of the scales considered, as the model generates a mean-field structure without hierarchy. The introduction of attractiveness in the HADA model generates a hierarchical structure that however still strongly differs from the empirical one, as the model generates a more progressive core-periphery structure. The memory effect introduced in the HADAM model makes it possible to obtain a hierarchical structure that resembles the empirical one quite well at all scales, except for a stronger correlation between the nodes' hypercoreness rankings. Note that analogous results can also be obtained with other data sets (see the SM).

3 Discussion

Recently, there has been a recognition of the importance of going beyond pairwise and static representations for complex systems [5, 15]. In this article, we have put forward a method for the structural and dynamic characterization of temporal hypergraphs, which represent time-varying systems involving higher-order interactions. The approach is based on decomposing the hypergraph into hyper-cores over time, and it provides a multi-scale characterization: macroscopically, it follows the higher-order hierarchical structure over time, monitoring the stability of the overall hyper-core structure; mesoscopically, it follows the evolution of specific hyper-cores, observing whether stable groups of nodes are densely connected to each other or whether they change over time; microscopically, it follows the structural behavior of single nodes, monitoring their movements across the hierarchical structure, towards more superficial or more central hyper-cores. The approach provides several similarity measures that quantitatively estimate the higher-order structural stability of the system at different topological scales, also identifying temporal patterns in the structure evolution. We moreover introduced two time-aggregated centrality measures of nodes, by aggregating the instantaneous hypercoreness or by averaging it over the node’s activity. These last measures provide additional information on the behavior of the nodes, as opposed to other centrality measures that do not account for higher-order structural properties.

We applied the method to a wide range of data sets describing different systems, characterizing each of them and identifying similarities and differences: for example, stronger instability characterizes systems where the nature of the interactions favors variability in the interaction patterns, such as scientific collaborations, conferences, universities and workplaces; a more stable structure is observed instead in systems with patterns of repeated interactions due to functional roles, such as schools and hospitals. We also linked structural properties of nodes to specific roles and activities in the systems, thus identifying relevant functions and their evolution over time.

The proposed method also represents an effective model-validation tool, since it makes it possible to quantitatively estimate whether a synthetic temporal hypergraph can replicate the structure of an empirical hypergraph and its evolution at different topological scales, and to compare several candidate models. In this direction, we proposed several models of activity-driven hypergraphs with increasing complexity in the mechanisms that drive the formation of hyperedges, and we estimated their structural-temporal differences and similarities with respect to the empirical systems. We have shown that models taking into account solely the node activities and the hyperedge size distribution over time cannot reproduce the empirical higher-order structure and its evolution. By contrast, introducing attractiveness and memory, while keeping the model simple, yields non-trivial hyper-core structures and a behaviour closer to the one empirically measured.

Our work opens several research directions and future perspectives. It lays the foundations for the development of new characterization techniques for time-varying hypergraphs [15]: for example, it represents a first step for the definition of a core decomposition of temporal hypergraphs, which is a highly challenging task because of the difficulties in defining a procedure taking into account both non-dyadic interactions and the temporal dimension to generalize, e.g., the span-core decomposition of temporal networks [11, 12]. Our work also provides insights for the understanding of higher-order dynamic processes on temporal hypergraphs, since hyper-cores play an important role in dynamic processes [26]: understanding how the multi-scale evolution of the underlying hypergraph affects dynamic processes is of great interest, in order to fully assess the coupling between the dynamics of and on the hypergraph. This is also crucial for the planning of adaptive measures and interventions, e.g. to maximize or prevent the spread of information on a time-varying hypergraph. Finally, our approach provides tools to guide the design of new models for temporal hypergraphs capable of reproducing higher-order structural properties of empirical systems at different topological scales. Here we have proposed examples of activity-driven hypergraphs featuring different interesting properties [9, 13, 39]; however, more complex models could be devised [35–38, 40], for example by introducing correlations between the activity of nodes and the size of the hyperedges to which they belong, or by considering memory and attractiveness mechanisms involving groups of nodes.

4 Methods

4.1 Hyper-core decomposition

Let us consider an unweighted static hypergraph \(\mathcal{H}_{t}=(\mathcal{V}_{t},\mathcal{E}_{t})\), composed of the set of its nodes \(\mathcal{V}_{t}\) and the set of its hyperedges \(\mathcal{E}_{t}\). A hyperedge \(e=\{i_{1},i_{2},\ldots,i_{m}\} \in \mathcal{E}_{t}\) consists of a set of m nodes \(i_{k} \in \mathcal{V}_{t}\) \(\forall k=1,\ldots,m\), with \(m \in [2,M_{t}]\), where \(M_{t} = \max _{e \in \mathcal{E}_{t}} |e|\).

The hyper-core decomposition is a procedure that decomposes the hypergraph \(\mathcal{H}_{t}\) into \((k,m)\)-hyper-cores, i.e., a double hierarchy of nested subhypergraphs of increasing connectivity, provided by hyperedges of increasing size. Specifically, the \((k,m)\)-hyper-core of \(\mathcal{H}_{t}\), denoted as \(\mathcal{F}_{t}^{(k,m)}=(\mathcal{A}_{t}^{(k,m)},\mathcal{S}_{t}^{(k,m)})\), is defined as the maximum subhypergraph that contains all the nodes \(i \in \mathcal{V}_{t}\) involved in at least k distinct hyperedges of size at least m within the subhypergraph itself. It contains all the hyperedges that are subsets of interactions in the original hypergraph \(\mathcal{H}_{t}\), of size at least m and that contain only nodes of \(\mathcal{A}_{t}^{(k,m)}\). Therefore, \(\mathcal{A}_{t}^{(k,m)}=\{i \in \mathcal{V}_{t} \, \text{s.t.} \, D_{m}^{ \mathcal{F}_{t}^{(k,m)}}(i) \geq k\}\) and \(\mathcal{S}_{t}^{(k,m)}=\{e \cap \mathcal{A}_{t}^{(k,m)} \, \text{s.t.} \, e \in \mathcal{E}_{t} \wedge |e \cap \mathcal{A}_{t}^{(k,m)}| \geq m\}\), where \(D_{m}^{\mathcal{F}_{t}^{(k,m)}}(i)\) is the number of distinct interactions of size at least m in which the node i is involved in \(\mathcal{F}_{t}^{(k,m)}\). Note that the \((k,m)\)-hyper-core includes the \((k,m+1)\)- and \((k+1,m)\)-hyper-cores, producing a doubly nested hierarchical structure which, by increasing k and m, progressively identifies groups of nodes more densely connected with each other through interactions of increasing order [26]. The \((k,m)\)-hyper-core is obtained by removing progressively and iteratively all the nodes with \(D_{m}< k\) and all the hyperedges of size smaller than m [26].
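The iterative pruning described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `hyper_core` is hypothetical, hyperedges are passed as plain iterables of node identifiers, and we assume that hyperedges which become identical after being restricted to the surviving nodes count as a single distinct interaction.

```python
def hyper_core(hyperedges, k, m):
    """Return (nodes, hyperedges) of the (k, m)-hyper-core of an unweighted hypergraph."""
    # Keep only distinct hyperedges of size >= m, stored as frozensets.
    edges = {frozenset(e) for e in hyperedges if len(set(e)) >= m}
    while True:
        # D_m(i): number of distinct surviving hyperedges that contain node i.
        degree = {}
        for e in edges:
            for i in e:
                degree[i] = degree.get(i, 0) + 1
        keep = {i for i, d in degree.items() if d >= k}
        # Restrict each hyperedge to the surviving nodes and drop those now smaller than m.
        new_edges = {e & keep for e in edges}
        new_edges = {e for e in new_edges if len(e) >= m}
        if new_edges == edges:
            # Nothing was pruned in this pass: the hyper-core has been reached.
            return keep, edges
        edges = new_edges


if __name__ == "__main__":
    toy = [(1, 2, 3), (1, 2, 4), (1, 2), (3, 4), (4, 5)]
    print(hyper_core(toy, k=2, m=2))
```

Calling the function for every pair \((k,m)\) until the result is empty gives the whole doubly nested hierarchy; a dedicated implementation would reuse the result for \((k,m)\) when computing \((k+1,m)\) instead of restarting from the full hypergraph.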

4.2 Data description and preprocessing

We consider data sets covering a wide range of interaction systems and presenting different statistical, topological and temporal properties (see SM).

Scientific collaborations

The American Physical Society (APS) scientific collaborations data set [44, 45] consists of all the APS publications from 1893 to 2021: for each paper the date of publication, the journal and the list of authors are indicated.

We initially addressed some issues appearing in the data: (i) information is missing for some papers, for example on the author list: in these cases we removed the corresponding entries from the data set; (ii) the same author “Name Surname” can appear with the full extended name, as “N. Surname”, “N Surname” or “Na. Surname”; analogously with middle names “Name Second Surname” or “Name-Second Surname”. To minimize the impact of these inconsistencies, we: (a) identified all entries with the same “Surname”; (b) reassigned the papers associated with dotted names to the corresponding extended name, carrying out the reassignment only in case of uniqueness. Some dotted names have no extended counterpart, or have several, making a unique reassignment impossible: in these cases we consider the contracted name as if it were a unique additional author. See the SM for further details on the extent of the various issues. This approach reduces the problems related to author identification, but does not completely eliminate them: it is still possible that two authors have the same name, in which case their publications are attributed as if they were a single individual. Moreover, in the presence of large collaborations, not all authors are listed [65]. Such issues cannot be eliminated through preprocessing of the data without additional information sources to perform a cross-source analysis [65]. However, even without such additional information, the preprocessed data set gives a good enough picture of the scientific interactions, as our purpose here is demonstrative and we do not seek to give precise ranking indications concerning scientists, nor to follow specific careers in detail.

We thus use the data to build a hypergraph in which each node is an author and each hyperedge represents a paper connecting the co-authors and is assigned a label indicating the corresponding journal. Since we focus on the pattern of collaborations between authors, rather than on the absolute scientific production, we do not take into consideration papers with a single author. We obtain a temporal hypergraph with 1-day resolution, and we focus on 1942-2021. We consider adjacent 5-year time windows and aggregate the temporal hypergraph within each of them, obtaining a sequence of unweighted static hypergraphs. Each static hypergraph is composed of all the nodes and hyperedges active at least once in the considered time window. The same group of authors can have co-authored several papers in the same time window, producing fully overlapping hyperedges: in this case we consider only one hyperedge (unweighted hypergraph) and we assign a multiple label to it, including all the journals in which the same group of authors published.
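As an illustration of this aggregation step, the following sketch groups papers into adjacent windows and merges fully overlapping hyperedges into a single one carrying a multiple label. The function name `aggregate_papers`, the `(year, authors, journal)` record layout and the use of integer years are assumptions made for the example; the actual data have 1-day resolution.

```python
from collections import defaultdict


def aggregate_papers(papers, start_year, width=5):
    """Aggregate (year, authors, journal) records into unweighted snapshots.

    Returns {window index: {hyperedge (frozenset of authors): set of journals}}.
    Assumes every record has year >= start_year.
    """
    snapshots = defaultdict(lambda: defaultdict(set))
    for year, authors, journal in papers:
        team = frozenset(authors)
        if len(team) < 2:
            # Single-author papers are not considered.
            continue
        window = (year - start_year) // width
        # Identical author sets collapse into one hyperedge with a multiple label.
        snapshots[window][team].add(journal)
    return {w: dict(s) for w, s in snapshots.items()}


if __name__ == "__main__":
    records = [
        (1942, ["a", "b"], "PR"),
        (1944, ["b", "a"], "RMP"),
        (1945, ["a"], "PR"),
        (1947, ["a", "b", "c"], "PR"),
    ]
    print(aggregate_papers(records, start_year=1942))
```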

Physical proximity

We consider several data sets of human face-to-face interactions obtained through RFID wearable proximity sensors, made publicly available by the SocioPatterns collaboration [46, 48, 49] and by the Contacts among Utah’s School-age Population project [53]. These data sets describe interactions between individuals in several settings and cover different time periods: a workplace (InVS15 [47, 48] - 2 weeks), a conference (SFHH [48] - 2 days), a hospital (LH10 [50] - 4 days), two primary schools (LyonSchool [51], Utah_elem [53] - 2 days) and a high school (Thiers13 [52] - 1 week). The data consist in each case of lists of time-resolved pairwise interactions between individuals (nodes), i.e., temporal networks with a time resolution of 20 seconds. To identify group interactions and transform such temporal networks into temporal hypergraphs, we carried out the following procedure [23, 26]: (i) pairwise interactions are aggregated over 5-minute time intervals; (ii) cliques, i.e. fully connected clusters, are identified in each time interval; (iii) in each time interval the maximal cliques, i.e. cliques not fully contained in another clique, are identified and promoted to hyperedges. This procedure generates temporal hypergraphs with 5-minute resolution. Some data sets moreover have node labels providing information on the properties of single nodes, e.g. the class of each student for LyonSchool, Thiers13 and Utah_elem, the social role for LH10 and the working department for InVS15.
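Steps (i) to (iii) can be illustrated with a short standard-library sketch. It is a simplified stand-in for the procedure of [23, 26], under explicit assumptions: contacts are given as `(t, i, j)` tuples with `t` in seconds, windows are indexed by `floor(t / window)`, and maximal cliques are enumerated with a basic Bron-Kerbosch recursion without pivoting. The function names are hypothetical.

```python
from collections import defaultdict


def maximal_cliques(adj):
    """Yield the maximal cliques of an undirected graph given as {node: set of neighbours}."""
    def expand(r, p, x):
        if not p and not x:
            # r cannot be extended and is not contained in a clique found earlier.
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def contacts_to_hyperedges(contacts, window=300):
    """Map time-stamped pairs (t, i, j) to {window index: set of hyperedges}."""
    graphs = defaultdict(lambda: defaultdict(set))
    for t, i, j in contacts:
        if i == j:
            continue
        g = graphs[int(t // window)]  # (i) aggregate contacts within each window
        g[i].add(j)
        g[j].add(i)
    # (ii)-(iii) promote the maximal cliques of each aggregated graph to hyperedges
    return {w: set(maximal_cliques(g)) for w, g in graphs.items()}


if __name__ == "__main__":
    toy = [(0, 1, 2), (20, 2, 3), (40, 1, 3), (60, 3, 4), (320, 1, 2)]
    print(contacts_to_hyperedges(toy))
```

In the toy input, the triangle 1-2-3 in the first window becomes one hyperedge of size 3, while the pair 3-4 stays a hyperedge of size 2.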

We also consider time-resolved data describing physical proximity events between students in a university, collected through the Bluetooth signal of cellphones during 4 weeks within the Copenhagen Network Study [36, 54] (CopNS). The data set provides pairwise interactions between individuals (nodes) with a temporal resolution of 5 minutes and with information on the signal intensity: we perform the preprocessing procedures described in [36], obtaining a temporal hypergraph with 5-minute resolution.

Email

Finally, we consider a data set describing email communications within a European institution (email-EU [55–57] - 17 months). This data set is publicly available as a temporal hypergraph: each node represents a user, each hyperedge corresponds to an email and involves both the recipients and the sender of the message. The sending time is provided for each hyperedge with 1-second resolution and the information on the directionality of the email is discarded.

4.3 Labels reshuffling procedures

We implement two reshuffling procedures, one for systems with hyperedge labels (e.g. APS), and one for those with node labels (e.g. LH10).

Hyperedge labels reshuffling. We consider a temporal hypergraph \(\mathcal{H}=\{\mathcal{H}_{t}\}_{t=1}^{t=n}\), in which each hyperedge e is assigned one or multiple labels. We obtain a reshuffled realization of the temporal hypergraph \(\mathcal{H}'\) in the following way: for each static snapshot \(\mathcal{H}_{t}\), we randomly select two hyperedges e and f of the same size m and, if they have different labels \(l_{e}\) and \(l_{f}\), we perform a label swap so that e will have the new label \(l_{e}'=l_{f}\) and f will have the new label \(l_{f}'=l_{e}\). In the case of hyperedges e with multiple labels \([l_{e}^{1},l_{e}^{2},\ldots,l_{e}^{n},\ldots,l_{e}^{q}]\), one of the labels, \(l_{e}^{n}\), is randomly selected, and the label swap is performed only with it. The procedure is repeated \(10^{5}\) times for each size \(m \in [2,M_{t}]\) and for each static snapshot \(\mathcal{H}_{t}\) (if the number of hyperedges of size m is at least 4 and at least two different labels are available). The described procedure preserves in each temporal snapshot the hypergraph structure and the overall number of hyperedges with each label at each order of interaction, while it destroys the correlations between the nodes and the labels of the hyperedges in which they are involved.

Node labels reshuffling. We consider a temporal hypergraph \(\mathcal{H}=\{\mathcal{H}_{t}\}_{t=1}^{t=n}\), in which each node i is assigned a label \(l_{i}\). We obtain a reshuffled realization \(\mathcal{H}'\) of the temporal hypergraph in the following way: for each temporal snapshot \(\mathcal{H}_{t}\), we randomly select two nodes i and j and, if they have different labels \(l_{i}\) and \(l_{j}\), we perform a label swap so that i will have the new label \(l_{i}'=l_{j}\) and j will have the new label \(l_{j}'=l_{i}\). The procedure is repeated \(10^{4}\) times for each temporal snapshot. The described procedure preserves the hypergraph structure and the overall number of nodes with a specific label in each temporal snapshot, but it destroys the correlations between the labels of interacting nodes.
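The two swap procedures of this section can be sketched for a single snapshot as follows. The function names, the dictionary-based data layout and the `seed` argument are assumptions made for the example, and the number of swaps is left as a parameter. One detail is not specified in the text and is chosen here for illustration only: for multi-labelled hyperedges, a swap is skipped if it would put the same label twice on one hyperedge.

```python
import random
from collections import defaultdict


def reshuffle_hyperedge_labels(labels, n_swaps, seed=None):
    """labels: {hyperedge (frozenset): non-empty list of labels} for one snapshot."""
    rng = random.Random(seed)
    new = {e: list(ls) for e, ls in labels.items()}
    by_size = defaultdict(list)
    for e in new:
        by_size[len(e)].append(e)
    for edges in by_size.values():
        distinct = {l for e in edges for l in new[e]}
        # Swaps are attempted only with at least 4 hyperedges and 2 distinct labels.
        if len(edges) < 4 or len(distinct) < 2:
            continue
        for _ in range(n_swaps):
            e, f = rng.sample(edges, 2)
            a = rng.randrange(len(new[e]))  # one label of e chosen at random
            b = rng.randrange(len(new[f]))  # one label of f chosen at random
            le, lf = new[e][a], new[f][b]
            # Assumption: skip swaps that would duplicate a label on a hyperedge.
            if le != lf and lf not in new[e] and le not in new[f]:
                new[e][a], new[f][b] = lf, le
    return new


def reshuffle_node_labels(node_labels, n_swaps, seed=None):
    """node_labels: {node: label} for one snapshot."""
    rng = random.Random(seed)
    new = dict(node_labels)
    nodes = list(new)
    if len(nodes) < 2:
        return new
    for _ in range(n_swaps):
        i, j = rng.sample(nodes, 2)
        if new[i] != new[j]:
            new[i], new[j] = new[j], new[i]
    return new
```

Both functions leave the hypergraph structure untouched and preserve the number of hyperedges (per size) or nodes carrying each label, which is the invariant stated above.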

4.4 Temporal hypergraphs models

We generate different synthetic temporal hypergraphs starting from the properties of the empirical hypergraph we want to model. Let us consider an empirical temporal hypergraph \(\mathcal{H}\) observed over the time interval \((0,t_{max}]\). We consider \(n=t_{max}/\tau \) adjacent time windows \(((t-1)\tau ,t\tau ]\) with \(t \in [1,\ldots,n]\). Within each of them we extract the set of active nodes (of size \(N_{t}\)), the distribution of the hyperedge size \(\Psi _{t}(m)\), the total number of interactions \(n_{t}^{tot}\) and the total number of interactions in which each node is involved \(n_{t}(i)\). Then we generate synthetic temporal hypergraphs \(\mathcal{H}'\) with the same nodes as the empirical hypergraph, that within each temporal window t have the same set of available nodes (of size \(N_{t}\)) and the same distribution \(\Psi _{t}(m)\) of the hyperedge sizes as the empirical data and that, by an appropriate tuning of the model parameters, reproduce quite well \(n_{t}^{tot}\) and \(n_{t}(i)\) \(\forall i\). We consider three different models of temporal hypergraphs. Then, we can perform temporal aggregations for both the empirical \(\{\mathcal{H}_{t} \}_{t=1}^{t=n}\) and each synthetic \(\{\mathcal{H}_{t}' \}_{t=1}^{t=n}\) hypergraphs. For instance, starting from data having a 5-minute resolution, we generate synthetic hypergraphs with the same temporal resolution, and then we consider hypergraphs aggregated over 1-day time windows for the analysis.
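The per-window quantities used to calibrate the models can be extracted with a single pass over the data. The sketch below assumes that the temporal hypergraph is available as a list of `(time, hyperedge)` events with times in \((0,t_{max}]\) and that every event counts as one interaction; the function name `window_statistics` and the output layout are hypothetical.

```python
from collections import Counter, defaultdict


def window_statistics(events, tau):
    """Per-window statistics of a temporal hypergraph given as (time, hyperedge) events.

    Window t covers ((t-1)*tau, t*tau]. For each window the result stores the set of
    active nodes, the hyperedge size counts, n_tot and the per-node counts n_i.
    """
    stats = defaultdict(
        lambda: {"nodes": set(), "sizes": Counter(), "n_tot": 0, "n_i": Counter()}
    )
    for time, edge in events:
        t = -(-time // tau)  # ceiling division, so that time == t*tau falls in window t
        s = stats[t]
        s["nodes"].update(edge)
        s["sizes"][len(edge)] += 1  # unnormalized Psi_t(m)
        s["n_tot"] += 1
        s["n_i"].update(edge)       # each member node takes part in one more interaction
    return dict(stats)


if __name__ == "__main__":
    toy = [(10, (1, 2)), (300, (1, 2, 3)), (301, (2, 3))]
    for t, s in window_statistics(toy, tau=300).items():
        print(t, s)
```

Dividing the size counts by `n_tot` gives the empirical distribution \(\Psi _{t}(m)\), and the size of `nodes` gives \(N_{t}\).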

4.4.1 Activity-driven hypergraph (HAD)

The higher-order activity-driven model (HAD) is the hypergraph generalization of the AD network [13] and of the simplicial activity-driven model (SAD) [9]. In this model, given a population of N nodes, each node is assigned with an activity \(a(i)\). In the discrete-time version of this model, in each time-step Δt each node i can activate with probability \(a(i) \Delta t\). When a node activates, it generates a hyperedge of size m, drawn from the distribution \(\Psi (m)\). The remaining \((m-1)\) nodes participating in the interaction are selected uniformly at random from the entire remaining population, i.e. each node is selected with probability \(1/(N-1)\). At the following time-step all hyperedges are erased and the process continues iteratively. Here moreover, we take into account that the set of available nodes (of size \(N_{t}\)), the hyperedge size distribution \(\Psi _{t}(m)\) and the activity of a node \(a_{t}(i)\) can change over time.

The number of interactions in which a node is involved in the time window t of extension τ is:

$$ n_{t}(i)=a_{t}(i) \tau + \sum \limits _{j \neq i} a_{t}(j) \tau \frac{\langle m - 1 \rangle _{t}}{N_{t}-1}, $$
(6)

where the first term is due to the activation of the node i itself and the second term to the activation of another node j. Moreover, \(n_{t}^{tot}=\sum _{i} a_{t}(i) \tau \). Therefore, the HAD model replicates the \(n_{t}(i)\) \(\forall i\) and \(n_{t}^{tot}\) of the empirical data set by fixing the activity of each node as:

$$ a_{t}(i)= \frac{n_{t}(i)-\frac{\langle m-1 \rangle _{t}}{N_{t}-1} n_{t}^{tot}}{\tau \left (1- \frac{\langle m-1 \rangle _{t}}{N_{t}-1} \right )}, $$
(7)

where \(N_{t}\), \(\Psi _{t}(m)\), \(n_{t}(i)\) and \(n_{t}^{tot}\) are fixed as in the empirical dataset. We set the time-step Δt equal to the duration of the interactions in the empirical data set.
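As a numerical check of Eqs. (6) and (7), and as an illustration of the generative rule, the following sketch computes the activities from the empirical counts, plugs them back into Eq. (6), and performs one time step of the model. Function names and the dictionary layout are assumptions made for the example; `sizes` stands for a list of hyperedge sizes to sample from, used here as a stand-in for drawing m from \(\Psi _{t}(m)\).

```python
import random


def had_activities(n, n_tot, mean_size, tau):
    """Eq. (7): activities a_t(i) from the counts n_t(i), n_tot and the mean size <m>_t."""
    c = (mean_size - 1) / (len(n) - 1)  # <m-1>_t / (N_t - 1)
    return {i: (n_i - c * n_tot) / (tau * (1 - c)) for i, n_i in n.items()}


def had_expected_interactions(a, mean_size, tau):
    """Eq. (6): expected number of interactions of each node in a window of length tau."""
    c = (mean_size - 1) / (len(a) - 1)
    total = sum(a.values())
    return {i: a_i * tau + (total - a_i) * tau * c for i, a_i in a.items()}


def had_step(a, sizes, dt, rng):
    """One time step: node i activates with probability a(i)*dt and picks m-1 partners uniformly."""
    nodes = list(a)
    edges = []
    for i in nodes:
        if rng.random() < a[i] * dt:
            m = rng.choice(sizes)
            others = [j for j in nodes if j != i]
            edges.append(frozenset([i] + rng.sample(others, m - 1)))
    return edges


if __name__ == "__main__":
    # Toy window: four nodes, only pairwise interactions, 12 interactions in total.
    counts = {0: 10, 1: 6, 2: 4, 3: 4}
    act = had_activities(counts, n_tot=12, mean_size=2, tau=1.0)
    print(act)
    print(had_expected_interactions(act, mean_size=2, tau=1.0))
    print(had_step({i: 0.5 for i in range(5)}, sizes=[2, 3], dt=1.0, rng=random.Random(0)))
```

In the toy window the activities sum to \(n_{t}^{tot}/\tau \), and inserting them into Eq. (6) returns the original counts, as expected from the derivation.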

The model takes into account the hyperedge size distribution, the activity of each single node and their temporal evolution. The mechanism of hyperedges formation is uniform, random and without memory, therefore the generated temporal hypergraph structure is mean-field.

4.4.2 Activity-driven hypergraph with attractiveness (HADA)

The higher-order activity-driven model with attractiveness (HADA) is a generalization of the AD network with attractiveness [39, 63, 64], and it differs from the HAD model through the introduction of an attractiveness parameter which describes the propensity of nodes to attract active interactions. Given a population of N nodes, each node is assigned an activity \(a(i)\) and an attractiveness \(b(i)\): in each discrete time-step Δt each node i can activate with probability \(a(i) \Delta t\). When a node i activates, it generates a hyperedge of size m, drawn from the distribution \(\Psi (m)\). The remaining \((m-1)\) nodes participating in the interaction are randomly selected from the population with probability proportional to their attractiveness, i.e. each node j is selected with probability \(b(j)/\sum _{k \neq i} b(k)\). At the following time-step all the hyperedges are destroyed and the process is iterated. For simplicity, hereafter we will assume that \(b(i)= a(i)\) \(\forall i\), i.e. the most (least) active nodes are also the most (least) attractive ones, as observed in several real systems [63, 64]. The set of available nodes, the hyperedge size distribution and the activity of a node can change over time.

The number of interactions in which a node is involved in the time window t of extension τ is:

$$ n_{t}(i)=a_{t}(i) \tau + \sum \limits _{j\neq i} a_{t}(j) \tau \frac{\langle m-1 \rangle _{t} a_{t}(i)}{\sum \limits _{k \neq j} a_{t}(k)}, $$
(8)

where the first term is due to the activation of the node itself and the second term to the activation of another node. The HADA model reproduces the \(n_{t}(i)\) \(\forall i\) observed in the empirical data, if the activity is:

$$ a_{t}(i)= \frac{n_{t}(i)}{\tau \left ( 1+ \langle m-1 \rangle _{t} \sum \limits _{j \neq i} \frac{a_{t}(j)}{n_{t}^{tot}/\tau -a_{t}(j)} \right )}, $$
(9)

where \(N_{t}\), \(\Psi _{t}(m)\), \(n_{t}(i)\) and \(n_{t}^{tot}\) are fixed as in the empirical dataset. Note that Eq. (9) is implicit, since the activities also appear on its right-hand side. When \(N_{t} \gg 1\), we can approximate \(a_{t}(i) \sim n_{t}(i)/(\langle m \rangle _{t} \tau )\) since \(\sum _{j \neq i} a_{t}(j) \sim n_{t}^{tot}/\tau \): this holds for all the time windows of all the datasets considered. We set the time-step Δt equal to the duration of the interactions in the empirical data set.
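Since the activities appear on both sides of Eq. (9), one way to go beyond the large-\(N_{t}\) approximation is a plain fixed-point iteration started from \(n_{t}(i)/(\langle m \rangle _{t} \tau )\). The sketch below is only an illustration: the text does not state that the activities are obtained this way, the function names are hypothetical, and the iteration comes with no convergence guarantee. The forward function evaluates Eq. (8) with \(b=a\) and can be used to check any candidate solution.

```python
def hada_expected_interactions(a, mean_size, tau):
    """Eq. (8): expected n_t(i) in the HADA model with attractiveness b = a."""
    total = sum(a.values())
    # Sum over all j of a_j / sum_{k != j} a_k; the term j = i is subtracted below.
    attracted = sum(a_j / (total - a_j) for a_j in a.values())
    return {
        i: a_i * tau * (1 + (mean_size - 1) * (attracted - a_i / (total - a_i)))
        for i, a_i in a.items()
    }


def hada_activities(n, n_tot, mean_size, tau, n_iter=200, tol=1e-12):
    """Iterate Eq. (9), starting from the approximation a = n / (<m> tau)."""
    a = {i: n_i / (mean_size * tau) for i, n_i in n.items()}
    for _ in range(n_iter):
        ratios = {j: a_j / (n_tot / tau - a_j) for j, a_j in a.items()}
        s = sum(ratios.values())
        new = {
            i: n[i] / (tau * (1 + (mean_size - 1) * (s - ratios[i])))
            for i in n
        }
        delta = max(abs(new[i] - a[i]) for i in n)
        a = new
        if delta < tol:
            break
    return a


if __name__ == "__main__":
    # Homogeneous toy window: the approximation is exact and the iteration stops at once.
    counts = {i: 6 for i in range(4)}
    print(hada_activities(counts, n_tot=8, mean_size=3, tau=1.0))
```

A quick consistency check on Eq. (8) is that the expected counts sum to \(\langle m \rangle _{t} \, n_{t}^{tot}\), since every hyperedge contributes one interaction to each of its members.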

The model takes into account the hyperedge size distribution and the activity of each node, together with their temporal evolution; the hyperedges formation mechanism is still random and without memory, but favors interactions with high activity nodes. The generated temporal hypergraph has a progressive core-periphery structure: high-activity nodes compose the core, being densely connected to each other and to the rest of the population; nodes with progressively lower activity become gradually more peripheral, being increasingly less connected to each other and only connected to the nodes in the core.

4.4.3 Activity-driven hypergraph with memory (HADAM)

The higher-order activity-driven model with memory (HADAM) differs from the HADA model by the introduction of a memory mechanism, analogous to that introduced in the AD network with memory [61, 62]. Given a population of N nodes, each node is assigned an activity \(a(i)\) and an attractiveness \(b(i)\): in each discrete time-step Δt each node i can activate with probability \(a(i) \Delta t\). Here we consider activities and attractiveness depending on time. Moreover, at time t we define the aggregated neighbourhood \(\mathcal{N}_{t}(i)\) of i as the set of nodes i has interacted with in previous time steps. When a node i activates at time t, it generates a hyperedge of size m, drawn from the distribution \(\Psi _{t}(m)\):

  • with probability \(p_{t}(i)=1/(1+l_{t}(i))\), the \(m-1\) nodes i will interact with are selected among nodes that i has not yet encountered, i.e. that do not belong to its neighbourhood \(\mathcal{N}_{t}(i)\) at time t, where \(l_{t}(i)=|\mathcal{N}_{t}(i)|\). In this case each node \(j \notin \mathcal{N}_{t}(i)\) is selected with probability \(b(j)/ \sum _{k \notin \mathcal{N}_{t}(i)} b(k)\);

  • with probability \((1-p_{t}(i))\), they are selected among nodes that i has already met, i.e. that belong to its neighbourhood \(\mathcal{N}_{t}(i)\) at time t. In this case each node \(j \in \mathcal{N}_{t}(i)\) is contacted with probability \(\omega _{ij}^{t} b(j) / \sum _{k \in \mathcal{N}_{t}(i)} \omega _{ik}^{t} b(k)\), where \(\omega _{ij}^{t}\) is the number of times that i and j have participated together in a hyperedge up to time t.

At the following time-step all the hyperedges are erased, the process continues iteratively and correlations are generated over time by the memory. For simplicity, hereafter we use \(b_{t}(i)= a_{t}(i)\) \(\forall i, t\) [63, 64].
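The two-branch partner selection described above can be sketched as follows. This is an illustrative reading of the rule rather than the authors' code: the function name and arguments are hypothetical, attractiveness values are assumed to be strictly positive, partners are drawn without replacement, and if the selected pool contains fewer than \(m-1\) nodes the sketch simply returns the whole pool (the text does not say how this case is handled).

```python
import random


def hadam_partners(i, m, b, neighbours, weights, rng):
    """Choose the m-1 partners of the active node i.

    b: {node: attractiveness}; neighbours: set N_t(i) of nodes already met by i;
    weights: {j: omega_ij}, number of past co-participations of i and j.
    """
    l = len(neighbours)
    if rng.random() < 1.0 / (1.0 + l):
        # Exploration: only nodes never met before, weighted by attractiveness.
        pool = [j for j in b if j != i and j not in neighbours]
        w = [b[j] for j in pool]
    else:
        # Memory: only nodes already met, weighted by omega_ij * b(j).
        pool = list(neighbours)
        w = [weights[j] * b[j] for j in pool]
    chosen = []
    # Assumption: sampling without replacement, truncated if the pool is too small.
    while pool and len(chosen) < m - 1:
        idx = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(idx))
        w.pop(idx)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    attractiveness = {j: 1.0 for j in range(6)}
    print(hadam_partners(0, 3, attractiveness, {1, 2}, {1: 3, 2: 1}, rng))
```

After each generated hyperedge, the neighbourhoods and the counters \(\omega _{ij}^{t}\) of all its members would have to be updated before the next activation; that bookkeeping is omitted here.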

In the HADAM model, we cannot determine the activity of the nodes in order to reproduce \(n_{t}(i)\) as observed in the empirical data, since \(n_{t}(i)\) depends on the full detailed history of contacts of i up to time t. We fix the activities as in the HADA model, with Eq. (9), and we have checked that this ansatz reproduces well \(n_{t}^{tot}\) and the average total degree in the aggregate snapshots. We set the time-step Δt equal to the duration of the interactions in the empirical data set.

The model takes into account the hyperedge size distribution and the activity of each node, together with their temporal evolution. Initially, the hypergraph evolves as the HADA model since \(p(i) \sim 1\) for all nodes. Then \(p(i)\) decreases and memory effects become relevant: at first an active node generates hyperedges with both new and old contacts, and then preferentially with only nodes already met, selecting those contacted several times in the past. This memory-attractiveness mechanism favors dense interactions between groups of nodes with high activity and between groups of nodes that contact each other several times, thus generating a rich topological structure.

Availability of data and materials

The data that support the findings of this study are publicly available. The APS data set can be requested at https://journals.aps.org/datasets [44]; the SocioPatterns data sets are available at http://www.sociopatterns.org/ [46–52]; the Contacts among Utah’s School-age Population data set at https://royalsocietypublishing.org/doi/suppl/10.1098/rsif.2015.0279 [53]; the email communications data set at https://www.cs.cornell.edu/~arb/data/ [55–57]; the Copenhagen Network Study data set at https://doi.org/10.6084/m9.figshare.7267433 [54].

Notes

  1. We consider only interactions of size \(m \geq 2\) and neglect the presence of singletons, i.e. hyperedges of size \(m=1\), since here we focus on the characterization of how the elements of the system interact with each other. Moreover, the singletons are immediately pruned in the hyper-core decomposition.

  2. It is possible to define a whole family of hypercoreness centralities [26] by arbitrarily weighing the different hyperedge sizes m in Eq. (1). Here we consider the simplest “size-independent” hypercoreness in which all sizes contribute equally.

  3. The maximum similarity \(\Sigma =1\) is obtained when the two hyper-cores filling profiles are identical \(a_{(k,m)}=b_{(k,m)} \, \forall k \in [1,\overline{K}]\), \(\forall m \in [2,\overline{M}]\); the minimum similarity \(\Sigma =0\) is obtained when the hypergraphs feature the two maximally different configurations, \(a_{(1,2)}=1\), \(a_{(k,m)}=0\) otherwise, and \(b_{(k,m)}=1 \, \forall k \in [1,\overline{K}]\), \(\forall m \in [2, \overline{M}]\), i.e. in one case the \((1,2)\)-core contains the entire population while all the other hyper-cores are empty, and in the other case all the hyper-cores are maximally filled with the entire population.

  4. This measure can be applied to any couple of hypergraphs with different populations, numbers of hyperedges, distributions of hyperdegrees \(P(D_{m}^{\mathcal{H}})\) \(\forall m \in [2,M]\) and distributions of interactions size \(\Psi (m)\). In general, systems with similar \(P(D_{m}^{\mathcal{H}})\) and \(\Psi (m)\) feature a higher similarity compared to those with different distributions.

  5. Note that different similarity measures could be considered to build such a similarity matrix.

  6. In the aggregation procedure to create each snapshot, some hyperedges can fully overlap (i.e., the same group of authors can publish more than one article). Although we do not consider weighted hyperedges, in such a case we assign a multiple label composed of the set of journals in which these articles with the same co-authors were published.
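The pruning mentioned in Note 1 can be illustrated with a minimal sketch. It assumes, as a working definition not restated in these notes, that a node survives in the \((k,m)\)-hyper-core only if it belongs to at least \(k\) distinct hyperedges of size at least \(m\), where each hyperedge is restricted to the surviving nodes. The function name `hyper_core_nodes` and the choice to count hyperedges that become identical after pruning only once are illustrative assumptions, not the implementation used in the paper.

```python
def hyper_core_nodes(hyperedges, k, m):
    """Return the node set of the (k, m)-hyper-core (assumed definition).

    hyperedges: iterable of iterables of hashable node labels.
    k: minimum number of distinct hyperedges a node must belong to.
    m: minimum hyperedge size taken into account (m >= 2).
    """
    hyperedges = [frozenset(e) for e in hyperedges]
    nodes = set().union(*hyperedges)
    while True:
        # Restrict every hyperedge to the surviving nodes; using a set
        # merges hyperedges that became identical (assumption).
        reduced = {e & nodes for e in hyperedges}
        # Hyperedges smaller than m do not count; since m >= 2,
        # singletons are always discarded at this step.
        reduced = {e for e in reduced if len(e) >= m}
        # Hyperdegree of each surviving node in the reduced hypergraph.
        degree = {v: 0 for v in nodes}
        for e in reduced:
            for v in e:
                degree[v] += 1
        survivors = {v for v in nodes if degree[v] >= k}
        if survivors == nodes:
            return nodes
        nodes = survivors


# Hypothetical toy hypergraph: node 6 appears only in a singleton.
toy = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {4, 5}, {6}]
core_1_2 = hyper_core_nodes(toy, k=1, m=2)
```

In this toy example node 6 takes part only in a hyperedge of size 1, so it is removed in the first pruning pass for any \(m \geq 2\), which is the behaviour Note 1 refers to.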

Abbreviations

APS: American Physical Society

SM: Supplementary Material

AD: Activity-driven

SAD: Simplicial activity-driven model

HAD: Higher-order activity-driven model

HADA: Higher-order activity-driven model with attractiveness

HADAM: Higher-order activity-driven model with memory

References

  1. Newman M (2018) Networks. Oxford University Press, London

  2. Dorogovtsev SN, Mendes JFF (2003) Evolution of networks: from biological nets to the Internet and WWW. Oxford University Press, London

  3. Barrat A, Barthélemy M, Vespignani A (2008) Dynamical processes on complex networks. Cambridge University Press, Cambridge

  4. Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A (2015) Epidemic processes in complex networks. Rev Mod Phys 87:925

  5. Holme P, Saramäki J (2012) Temporal networks. Phys Rep 519:97–125

  6. Masuda N, Lambiotte R (2016) A guide to temporal networks. World Scientific, Singapore

  7. Braha D, Bar-Yam Y (2009) Time-dependent complex networks: dynamic centrality, dynamic motifs, and cycles of social interactions. In: Gross T, Sayama H (eds) Adaptive networks: theory, models and applications. Springer, Berlin, pp 39–50

  8. Karsai M, Jo H-H, Kaski K (2018) Bursty human dynamics. Springer, Berlin

  9. Petri G, Barrat A (2018) Simplicial activity driven model. Phys Rev Lett 121:228301

  10. Pedreschi N et al. (2020) Dynamic core-periphery structure of information sharing networks in entorhinal cortex and hippocampus. Netw Neurosci 4:946–975

  11. Ciaperoni M et al. (2020) Relevance of temporal cores for epidemic spread in temporal networks. Sci Rep 10:12529

  12. Galimberti E, Barrat A, Bonchi F, Cattuto C, Gullo F (2018) Mining (maximal) span-cores from temporal networks. In: CIKM ’18: proceedings of the 27th ACM international conference on information and knowledge management, pp 107–116

  13. Perra N, Gonçalves B, Pastor-Satorras R, Vespignani A (2012) Activity driven modeling of time varying networks. Sci Rep 2:469

  14. Mancastroppa M, Vezzani A, Muñoz MA, Burioni R (2019) Burstiness in activity-driven networks and the epidemic threshold. J Stat Mech Theory Exp 2019:053502

  15. Battiston F et al. (2020) Networks beyond pairwise interactions: structure and dynamics. Phys Rep 874:1–92

  16. Battiston F et al. (2021) The physics of higher-order interactions in complex systems. Nat Phys 17:1093–1098

  17. Danon L, Read JM, House TA, Vernon MC, Keeling MJ (2013) Social encounter networks: characterizing Great Britain. Proc R Soc B 280:20131037

  18. Milojević S (2014) Principles of scientific research team formation and evolution. Proc Natl Acad Sci USA 111:3984–3989

  19. Mayfield MM, Stouffer DB (2017) Higher-order interactions capture unexplained complexity in diverse communities. Nat Ecol Evol 1:0062

  20. Iacopini I, Petri G, Barrat A, Latora V (2019) Simplicial models of social contagion. Nat Commun 10:2485

  21. Majhi S, Perc M, Ghosh D (2022) Dynamics on higher-order networks: a review. J R Soc Interface 19:20220043

  22. Cencetti G, Contreras DA, Mancastroppa M, Barrat A (2023) Distinguishing simple and complex contagion processes on networks. Phys Rev Lett 130:247401

  23. Iacopini I, Petri G, Baronchelli A, Barrat A (2022) Group interactions modulate critical mass dynamics in social convention. Commun Phys 5:64

  24. Kovalenko K et al. (2022) Vector centrality in hypergraphs. Chaos Solitons Fractals 162:112397

  25. Contisciani M, Battiston F, De Bacco C (2022) Inference of hyperedges and overlapping communities in hypergraphs. Nat Commun 13:7229

  26. Mancastroppa M, Iacopini I, Petri G, Barrat A (2023) Hyper-cores promote localization and efficient seeding in higher-order processes. Nat Commun 14:6223

  27. Bianconi G, Dorogovtsev SN (2024) Nature of hypergraph k-core percolation problems. Phys Rev E 109:014307

  28. Kirkley A (2024) Inference of dynamic hypergraph representations in temporal interaction data. Phys Rev E 109:054306

  29. Sekara V, Stopczynski A, Lehmann S (2016) Fundamental structures of dynamic social networks. Proc Natl Acad Sci USA 113:9977–9982

  30. Chowdhary S, Kumar A, Cencetti G, Iacopini I, Battiston F (2021) Simplicial contagion in temporal higher-order networks. J Phys Complex 2:035019

  31. Neuhäuser L, Lambiotte R, Schaub MT (2021) Consensus dynamics on temporal hypergraphs. Phys Rev E 104:064305

  32. Ceria A, Wang H (2023) Temporal-topological properties of higher-order evolving networks. Sci Rep 13:5885

  33. Cencetti G, Battiston F, Lepri B, Karsai M (2021) Temporal properties of higher-order interactions in social networks. Sci Rep 11:7028

  34. Yao Q, Chen B, Evans TS, Christensen K (2021) Higher-order temporal network effects through triplet evolution. Sci Rep 11:15419

  35. Gallo L, Lacasa L, Latora V, Battiston F (2024) Higher-order correlations reveal complex memory in temporal hypergraphs. Nat Commun 15:4754

  36. Iacopini I, Karsai M, Barrat A (2023) The temporal dynamics of group interactions in higher-order social networks. arXiv:2306.09967

  37. Di Gaetano L, Battiston F, Starnini M (2024) Percolation and topological properties of temporal higher-order networks. Phys Rev Lett 132:037401

  38. Guo J-L, Zhu X-Y, Suo Q, Forrest J (2016) Non-uniform evolving hypergraphs and weighted evolving hypergraphs. Sci Rep 6:36648

  39. Mancastroppa M, Guizzo A, Castellano C, Vezzani A, Burioni R (2022) Sideward contact tracing and the control of epidemics in large gatherings. J R Soc Interface 19:20220048

  40. Le Bail D, Génois M, Barrat A (2023) Modeling framework unifying contact and social networks. Phys Rev E 107:024301

  41. Masuda N, Holme P (2019) Detecting sequences of system states in temporal networks. Sci Rep 9:795

  42. Sugishita K, Masuda N (2021) Recurrence in the evolution of air transport networks. Sci Rep 11:5514

  43. Braha D, Bar-Yam Y (2006) From centrality to temporary fame: dynamic centrality in complex networks. Complexity 12:59–63

  44. (2022) APS data sets for research. https://journals.aps.org/datasets. Accessed: 2023-09-11

  45. (2023) APS physical review journals. https://journals.aps.org/. Accessed: 2023-09-11

  46. Sociopatterns collaboration (2008). http://www.sociopatterns.org/. Accessed: 2023-07-01

  47. Génois M et al. (2015) Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers. Netw Sci 3:326–347

  48. Génois M, Barrat A (2018) Can co-location be used as a proxy for face-to-face contacts? EPJ Data Sci 7:11

  49. Isella L et al. (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271:166–180

  50. Vanhems P et al. (2013) Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE 8:e73970

  51. Stehlé J et al. (2011) High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6:e23176

  52. Mastrandrea R, Fournet J, Barrat A (2015) Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE 10:e0136497

  53. Toth DJA et al. (2015) The role of heterogeneity in contact timing and duration in network models of influenza spread in schools. J R Soc Interface 12:20150279

  54. Sapiezynski P, Stopczynski A, Lassen DD, Lehmann S (2019) Interaction data from the Copenhagen networks study. Sci Data 6:315

  55. Paranjape A, Benson AR, Leskovec J (2017) Motifs in temporal networks. In: WSDM ’17: proceedings of the tenth ACM international conference on web search and data mining, pp 601–610

  56. Benson AR, Abebe R, Schaub MT, Jadbabaie A, Kleinberg J (2018) Simplicial closure and higher-order link prediction. Proc Natl Acad Sci USA 115:E11221–E11230

  57. (2022) Austin R. Benson datasets. https://www.cs.cornell.edu/~arb/data/. Accessed: 2022-12-11

  58. Pais A (1988) Inward bound: of matter and forces in the physical world. Oxford University Press, London

  59. Abe F et al. (1995) Observation of top quark production in p̅p collisions with the Collider Detector at Fermilab. Phys Rev Lett 74:2626

  60. Abachi S et al. (1995) Observation of the top quark. Phys Rev Lett 74:2632

  61. Karsai M, Perra N, Vespignani A (2014) Time varying networks and the weakness of strong ties. Sci Rep 4:4001

  62. Ubaldi E et al. (2016) Asymptotic theory of time-varying social networks with heterogeneous activity and tie allocation. Sci Rep 6:35724

  63. Alessandretti L, Sun K, Baronchelli A, Perra N (2017) Random walks on activity-driven networks with attractiveness. Phys Rev E 95:052318

  64. Pozzana I, Sun K, Perra N (2017) Epidemic spreading on activity-driven networks with attractiveness. Phys Rev E 96:042310

  65. Tomasello MV, Vaccario G, Schweitzer F (2017) Data-driven modeling of collaboration networks: a cross-domain analysis. EPJ Data Sci 6:22

Acknowledgements

Not applicable.

Funding

MM and AB acknowledge support from the Agence Nationale de la Recherche (ANR) project DATAREDUX (ANR-19-CE46-0008).

Author information

Contributions

MM, II, GP, AB designed the study; MM performed the analysis; MM, II, GP, AB analyzed the results; MM and AB wrote the first draft; MM, II, GP, AB contributed to the current draft. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Marco Mancastroppa.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material, SM. (PDF 12.0 MB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Mancastroppa, M., Iacopini, I., Petri, G. et al. The structural evolution of temporal hypergraphs through the lens of hyper-cores. EPJ Data Sci. 13, 50 (2024). https://doi.org/10.1140/epjds/s13688-024-00490-1

Keywords