 Research
 Open access
 Published:
The simpliciality of higherorder networks
EPJ Data Science volume 13, Article number: 17 (2024)
Abstract
Higherorder networks are widely used to describe complex systems in which interactions can involve more than two entities at once. In this paper, we focus on inclusion within higherorder networks, referring to situations where specific entities participate in an interaction, and subsets of those entities also interact with each other. Traditional modeling approaches to higherorder networks tend to either not consider inclusion at all (e.g., hypergraph models) or explicitly assume perfect and complete inclusion (e.g., simplicial complex models). To allow for a more nuanced assessment of inclusion in higherorder networks, we introduce the concept of “simpliciality” and several corresponding measures. Contrary to current modeling practice, we show that empirically observed systems rarely lie at either end of the simpliciality spectrum. In addition, we show that generative models fitted to these datasets struggle to capture their inclusion structure. These findings suggest new modeling directions for the field of higherorder network science.
1 Introduction
A wide range of complex systems are shaped by interactions involving several entities at once: social networks are driven by group behavior [1], emails often have multiple recipients [2–4], molecular pathways in cells involve multiprotein interactions [5], and scientific articles involve groups of coauthors [6]. Higherorder networks are a natural extension to networks explicitly designed to model such multiway relationships [7].
Two mathematical representations are most commonly used to model higherorder networks: hypergraphs and simplicial complexes [8]. A hypergraph is a collection of entities (nodes) connected by interactions (hyperedges) between any number of these entities. A simplicial complex can be considered a hypergraph with an additional requirement known as downward closure, which states that when an interaction exists between m entities, every possible subinteraction also exists. This mathematical construction originates in algebraic topology and is motivated by theoretical applications; for example, forming operators such as boundary matrices to identify cycles in a dataset or the Hodge Laplacian to describe dynamical processes in higherorder networks [9, 10].
Recent work has grappled with the problem and consequences of choosing the proper representation—simplicial complex, hypergraph, or other—for a given complex system. Ref. [11], for instance, shows that synchronization can differ drastically in systems modeled with simplicial complexes and hypergraphs due to synchrony driven by the included edges of simplicial complexes, and three recent studies investigate the impact of inclusions on contagion [12–14]. Additionally, Ref. [8] discusses how each representation corresponds to different modeling assumptions and, thus, different analysis pipelines.
Missing from these studies are analyses of the suitability of higherorder representations for describing empirically observed interactions. When a set of interactions is given to a data scientist or modeler, the choice of representation is essentially empirical. Do the data satisfy downward closure? (In which case, a simplicial complex may best represent it.) Or do the data violate downward closure? (In which case, it should be modeled as a hypergraph.)
In this paper, we introduce the concept of simpliciality to describe the extent to which a set of interactions satisfies the downward closure requirement. We implement this concept with three measures of the overall simpliciality of a dataset and describe how to define local versions of these global metrics. Using these measures, we investigate the simpliciality of empirical datasets and show that commonly analyzed higherorder interaction datasets populate the full spectrum of simpliciality. We find that there may be large variations in the simpliciality depending on the chosen empirical dataset and the measure of simpliciality. Additionally, we show that the level of simpliciality displayed by existing models is typically not captured by existing generative models for higherorder networks. Hence, this paper identifies an essential gap in the current set of higherorder measures and models.
These new simpliciality measures complement other higherorder structural measures such as community structure [15–17], centrality [18–20], clustering [21, 22], assortativity [23, 24], and degree heterogeneity [25]. While these measures are helpful in understanding how higherorder data is organized, they do not address how hypergraphs relate to simplicial complexes. Closest to our work is the recent Ref. [13], in which the authors describe the encapsulation graph, a structural description of the inclusion patterns of any given dataset, as well as a dynamical process based on inclusion that spreads from included hyperedges to larger containing hyperedges. Also relevant is Ref. [26], which defines a global metric of inclusions. There is a large body of literature surrounding the concept of nestedness [27], which measures the inclusion structure of unipartite and bipartite networks, particularly in ecological contexts [28]. Measures of nestedness, however, do not use simplicial complexes as a reference point against which to compare. In contrast, our approach describes downward inclusions succinctly with simple global measures, offering a new perspective on how higherorder data is organized.
1.1 Mathematical definitions
We encode interactions as hypergraphs, defined as a pair \(\mathbf{H}= (V, E)\) where V is a set of \(N=V\) entities known as nodes, and where E is a set of subsets of V encoding relationship between nodes. We refer to a set \(e\in E\) as a “hyperedge” or just “edge,” and define the size of an edge as \(e\). In general, E could be a multiset of multisets, but in this study, we solely consider simple hypergraphs, where each edge is only present once (no multihyperedges), and each edge can only contain unique entities (no selfrelations).
Our analysis will focus on the prevalence of inclusions in interaction data, so we describe this relationship formally. We say that an edge f is included in e, or \(f \subset e\), if every node of f is also a node of e. Drawing from the nomenclature of algebraic topology, we also say that f is a subface of e [29].
Inclusions naturally lead to the concept of a maximal hyperedge, i.e., a hyperedge ẽ that is not included in any other hyperedge. We denote the set of all the maximal hyperedges of H as \(\widetilde{E}(\mathbf{H})\).
A simplex s is then a collection of hyperedges that contains a single maximal hyperedge ẽ and satisfies \(s \equiv \mathcal{P}(\widetilde{e})\), where \(\mathcal{P}(X)\) is the power set of set X. In other words, a simplex is a maximal edge with an associated collection of subfaces in which every possible subface of the maximal edge exists. A collection of simplices is a simplicial complex \(\mathbf{S}=(V, E)\), and we say that a hypergraph where every maximal edge is a simplex satisfies downward closure.
Finally, we will use the notion of an induced simplicial complex, which is the simplicial complex constructed from the maximal edges of a hypergraph by adding all hyperedges needed to satisfy downward closure.
2 Measuring simpliciality
This paper introduces the concept of simpliciality. Simpliciality, broadly defined, measures the inclusion structure of a hypergraph and how similar a higherorder dataset is to the structure of a simplicial complex; see Fig. 1A. There are many ways to measure this, which we outline in Sect. 2.1. Before we get there, however, we must first introduce relevant terminology.
2.1 Measures
This section introduces measures of simpliciality. We follow a few guiding principles to design these measures. One, a simplicial complex must be maximally simplicial with respect to any measure of simpliciality. Two, to facilitate easier comparison between datasets, measures of simpliciality should be normalized so that they map a hypergraph to a value between 0 and 1. Three, if a subface is added to a hypergraph, the simpliciality must increase. Four, as a dataset becomes qualitatively more like a simplicial complex, the simpliciality should increase. And five, we stipulate that the simpliciality of an empty hypergraph is undefined.
There are many ways to define a measure of simpliciality while maintaining these guiding principles. To highlight different structural elements contributing to the inclusion structure, we define three measures: the simplicial fraction, the edit simpliciality, and the mean face simpliciality. These measures are all illustrated in Fig. 1 BD.
Simplicial fraction
In a simplicial complex, every subface is itself a simplex, so when a hypergraph is a simplicial complex, it contains all subsets of each of its hyperedges. The simplicial fraction (SF) measures the degree to which this is true, defined as the fraction of hyperedges which are also simplices.
Formally, we let \(\mathbf{H}= (V, E)\) be a hypergraph and let \(S = \{ e\in E \  \ \mathcal{P}(e)\subseteq E\}\) be the set of hyperedges which are also simplices. Then, the simplicial fraction is defined as
and it takes values in the range \(\sigma _{\mathrm{SF}}\in [0,1]\); see Fig. 1B.
The simplicial fraction directly measures the number of simplices in the dataset and is, therefore, highly interpretable. However, one potential downside is that edges which almost achieve downward closure do not count at all toward the overall simpliciality. Furthermore, this definition weighs smaller simplices heavily, as small simplices contribute to the simpliciality of all hyperedges that include them.
Edit simpliciality
The edit simpliciality (ES) is defined as the minimal number (or fraction, in the normalized case) of additional edges needed to make a hypergraph a simplicial complex.
Our formal definition uses the notion of an induced simplicial complex defined in Sect. 1.1. Given a hypergraph \(\mathbf{H}= (V, E)\) for which we want to measure the ES, we find its maximal edges Ẽ and construct the simplicial complex \(\mathbf{S}= (V, C)\) induced on H, with \(C=\bigcup_{e \in \widetilde{E}} \mathcal{P}(e)\). The edit simpliciality is then
again satisfying \(\sigma _{\mathrm{ES}}\in [0,1]\); see Fig. 1C. (We note that one can use the induced simplicial complex to define variants of the ES, e.g., a simplicial edit distance \(d_{\mathrm{ES}}= C  E\) or a normalized distance \(d_{\mathrm{NES}}= (C  E)/C = 1  E/C = 1  \sigma _{ \mathrm{ES}}\).)
The ES answers a slightly different question than the SF does—it counts missing hyperedges that would make the dataset into a simplicial complex, rather than the edges that already satisfy downward closure. It thus offers a complementary, equally interpretable measure of simpliciality. However, the ES has the disadvantage of being sensitive to outliers, as a handful of large hyperedges with few inclusions will rapidly drive \(\sigma _{\mathrm{ES}}\) towards 0. Indeed, a hyperedge of size m without any inclusion contributes one edge to \(E\) but \(2^{m}\) edges to \(C\) in the denominator of Eq. (2).
Face edit simpliciality
Finally, building upon the idea of edit simpliciality, we define a more localized notion of simpliciality, using the number of subfaces that must be added to the hypergraph to make a particular face a simplex.
Given a hyperedge e, the number of edges one must add to the hypergraph to make e a simplex is
where \(c = \{ f \in E \mid f \subseteq e\}\). We can think of this quantity as an edit distance, or face edit distance. We use this quantity to define an average
where F is a set of edges—most commonly, \(F = \widetilde{E}\) or E. We exclusively use \(F = \widetilde{E}\) in this study. These quantities are on the scale of counts, and to define quantities analogous to previous simpliciality measures, we thus introduce a perface normalization, either on a distance scale (meaning that the quantity grows as the dataset becomes less simplicial):
or, similarly to previous definitions, on a simpliciality scale:
We call this last measure the face edit simpliciality (FES).
The FES normalizes the face edit distance as a fraction of its maximal simpliciality. This normalization removes the dominance of large edges in the calculation of \(\sigma _{\mathrm{ES}}\) and, in fact, exponentially downweights the contribution of these edges. In addition, because this metric is computed on faces, this is an averaged local metric.
2.2 Important considerations when measuring simpliciality
Before we turn to applications in Sect. 3, let us discuss three design choices that may impact the conclusion we reach about the simpliciality of a dataset.
First, we note that the formal definition of a simplicial complex can be unnecessarily strict when used to represent perfect inclusion structures. By definition, a simplex always contains singletons (edges comprising a single node) and the empty set. Several datasets will not include such interactions by construction. One example is proximity datasets, where edges encode proximity events in which two or more nodes become in close contact during the observation period. Because of their spatial nature, these datasets are often very dense and contain many inclusions [7]. Yet, according to the standard definition, these will never be simplicial complexes due to the absence of singletons. Another example is email datasets, which also do not contain singletons unless one includes emails that individuals send to themselves. Because we define our notion of inclusion in terms of simplicial complexes, our measures will label these datasets as having no inclusion structure whatsoever.
To circumvent this issue, we use a relaxed definition of downward closure that excludes singletons wherever it makes sense. The relaxation uses the notion of a sizerestricted power set \(\mathcal{P}_{K}(X)\), where K is a set of integers, defined as
For example, given an edge e of size n, \(\mathcal{P}_{\{2,\dots ,n1\}}(e)\) is the set of \(2^{e}  e  2\) subfaces of e excluding the empty set, all singletons (sets of size one), and the edge e itself. Relaxed measures of simpliciality follow by substituting \(\mathcal{P}(X)\) for \(\mathcal{P}_{K}(X)\) in all the measures of Sect. 2.1. Hence, for example, we obtain a relaxation of \(\sigma _{\mathrm{SF}}\) by replacing the definition of S, the set of the hyperedges of H that are also simplices, by \(S = \{e\subseteq E\mid \mathcal{P}_{K}(e)\subseteq E\} \)), where \(K= \{2,\ldots,e \}\).
The results shown in Sect. 3 are all calculated using size restrictions to exclude singletons and the empty set. However, we note that this technique can be used more generally to exclude any interaction sizes deemed unimportant, anomalous, or problematic [30]; or, conversely, to be more strict and to include singletons (say, when analyzing academic coauthorship networks, where singleauthor papers can meaningfully impact the inclusion structure of the dataset).
Second, we observe that special hyperedges we call “minimal faces” may significantly skew the simpliciality of a dataset. The minimal faces of a hypergraph H are the edges of the minimal size, i.e., \(e = \min (K)\), where K is the set of sizes in the sizerestricted powerset (In a traditional simplicial complex, the minimal faces are singletons). With the size restrictions in place, the minimal faces of a hypergraph are always simplices because, by definition, there are no smaller edges for these edges to include. We argue that when measuring the simpliciality of a dataset, it is most meaningful to focus on the faces for which inclusion is possible, and so we exclude these minimal faces when counting potential simplices.
Note that this design choice operates differently from the size restriction imposed by the modified power set introduced in Eq. (4); in that context, we argued for ignoring edges that can prevent other edges from being simplices, while here we suggest that counting minimal faces as potential simplices will strongly affect the value of simpliciality. Our strategy is as follows. For SF, this means that both S and E exclude the minimalsized edges. For ES, we exclude maximal faces that are also minimal faces when constructing the minimal simplicial complex. And for FES, we only average over maximal edges that are not minimal faces.
Third and finally, since the number of potential subfaces of a hyperedge grows exponentially with its size, computational issues prevent us from applying our measures to large hyperedges. For this reason, we select a maximum face size k (we use \(k=11\) throughout), again using the size restriction to define our metrics. This drops information about large hyperedges but speeds up computation drastically in practical applications.
2.3 Local simpliciality
Simpliciality, up to this point a global metric, can also be localized on a smaller subset of the higherorder network to yield information about its local structure. The various facecentric measures used in our construction of the FES provide this information at the level of faces. But for more flexibility, we also use subhypergraphs to define nodal simpliciality measures of our global measures. More specifically, given a hypergraph \(\mathbf{H}= (V,E)\) and a node \(v\in V\), we define the neighborhood of v as \(n(v) = \{u \in V \mid u, v \in e\in E\}\) and the associated subsets \(\widehat{V}=v\,\cup \, n(v)\) and \(\widehat{E} = \{e \in E \mid e\subseteq \widehat{V}\}\). Then the simpliciality of node v is the simpliciality defined on the subhypergraph \(\widehat{\mathbf{H}} =(\widehat{V}, \widehat{E})\) induced on the neighborhood of v. Note that when v is an isolated node or when Ê only contains minimal faces and we do not consider these potential simplices, the nodal simpliciality will be undefined.
3 Results
3.1 Empirical datasets
As the first demonstration of the simpliciality measures, we analyze empirical higherorder datasets from several general domains. All datasets are obtained from the xgidata repository [31] and are openly available. Following the considerations highlighted in Sect. 2.2, we preprocess these datasets to remove singletons, multiedges, and isolated nodes. In addition, for computational feasibility, we only consider hyperedges of size 11 (order, defined as the size minus one, of 10) and smaller. Basic structural properties of the preprocessed datasets are shown in Table 1.
Our sample of datasets contains various types of complex systems. We analyze three proximity datasets [1, 2, 31–33] (contactprimaryschool, contacthighschool, and hospitallyon), which are collected via proximity sensors with a range of roughly 1 meter. Nodes are individuals, and an edge is created from a proximity event, where two individuals are closer than 1 meter apart. To create a higherorder dataset, each maximal clique is converted into a hyperedge at each time step. Unique to proximity datasets are their geometrical constraints, and because of the proximity sensor range, 5hyperedges are the largest edges present in these datasets. We also include two datasets of email interactions [2–4, 34]: emailenron and emaileu. In both cases, the nodes are email addresses, and the hyperedges are emails, at the defunct company Enron in the former case and a large European research institution in the latter. Three datasets are loosely associated with biological processes [2, 35, 36]: diseasome, disgenenet, and ndcsubstances. In these datasets, nodes are compounds, diseases, or genes, while hyperedges are interactions amongst these to represent pharmaceuticals, symptoms, and diseases. Finally, we include two miscellaneous datasets as well: tagsaskubuntu [2] higherorder dataset in which a node is a tag on Stack Overflow, and an edge is a question to which the tags are associated; and the congressbills dataset [2, 37, 38] where nodes are congresspeople and edges represent the sponsoring and cosponsoring congresspeople for a particular bill.
Numerical values of the simpliciality measures are shown in Table 1 for all of these datasets. We find that values for simpliciality fill the spectrum from 0 to 1, depending on the data. The proximity datasets have large simpliciality for all three measures, while the biological datasets have low simpliciality for all three measures. The email datasets have a very small ES simpliciality, with moderate simpliciality for the other two measures. (And since we use size restrictions to exclude singletons, the lack or absence of emails sent to oneself does not affect this assessment.) Similarly, the tagsaskubuntu dataset has a range of simpliciality values depending on which measure we consider. This shows that the measures we have defined in Sect. 2.1 capture different features of the inclusion structure.
While the measures give differing perspectives on the simpliciality of each dataset, we verify that they broadly agree with a correlation analysis. The Pearson correlation coefficient is \(\rho =0.97\) between the SF and ES, \(\rho =0.95\) between the SF and FES, and \(\rho =0.90\) between the ES and FES (all significant at the \(p=0.001\) level). Hence, the values are closely and linearly related in our sample. They also order datasets similarly, from the least to most simplicial, since the Spearman rankorder correlation coefficient is \(\rho =0.89\) between SF and ES, \(\rho =0.997\) between SF and FES, and \(\rho =0.90\) between ES and FES (all significant at the same level).
Although our correlation analysis confirms that these measures roughly capture the same concepts, the datasets where they depart from one another highlight their key differences. In our experiments, these differences are due to features such as large edges with few included edges, many edges that are mostly closed downward, and different edge size distributions. Networks of organizational email messages are examples of the first case, where very large organization or departmentwide emails may be sent with no guarantee that emails are also sent between every possible subgroup of individuals. In this case, we would expect ES to be extremely low while the SF need not be low. Proximity networks are examples of the second case, i.e., dense downwardclosed datasets. We see this by noting that the SF is not 1, while both ES and FES are close to 1 due to the SF penalizing almostsimplicial edges. Lastly, the edge size distribution has strong implications on all measures; for the same average edge size and number of edges, increasing both the number of small and large edges will affect the SF and ES measures. For ES, the large edges will exponentially increase the number of subfaces needed to create a simplicial complex, driving the simpliciality to zero. In contrast, increasing the number of small edges can create more small simplices, increasing SF.
3.2 Generative models of higherorder networks
To complement our analysis of empirical data, we also examine the simpliciality of synthetic data generated with generative models for higherorder networks.
We focus on models of hypergraphs designed to describe and analyze arbitrary higherorder structures. There are several random hypergraph models, including, among many classes of models, preferential attachment models [39–42], models with community structure [15, 43–46], models with specified degree and size sequences [23, 43], ErdösRényi models [47, 48], models with latent nodal variables governing edge formation [49], and geometric models [42, 50, 51]. Higherorder random models that are commonly fit to empirical datasets include the configuration model [23], the bipartite ChungLu model [23, 43], and the bipartite degreecorrected stochastic block model [52]. See Ref. [7] for an extensive overview. Overwhelmingly, generative hypergraph models lack explicit control over the inclusion structure of hyperedges, so there are often relatively few simplices.
We focus our analysis on three models: the configuration model [23], the bipartite ChungLu model [43], and the bipartite degreecorrected stochastic block model (biSBM) [52].
We fit each model to the empirical datasets of Table 1, use the fitted models to generate a distribution of higherorder networks (the posterior predictive distribution in Bayesian parlance), and analyze the resulting distribution of simpliciality values.
In all cases, when sampling synthetic higherorder networks from the three generative models, we generate 10^{3} realizations of each model for each empirical dataset. We use the double edgeswap algorithm presented in Ref. [23] to sample from the configuration model and performed \(10 \times E\) edge swaps, roughly in accordance with [53]. For the bipartite ChungLu model [43], we extract the degree and edge size sequences and then use a bipartite variation of the algorithm introduced in Ref. [54] and available in XGI [55] to sample from this model. Lastly, when sampling from the biSBM, we used a Markov chain Monte Carlo method with a bipartite prior using the algorithm described in Ref. [52].
All results are reported in Fig. 2. Overwhelmingly, we see that the generative models cannot accurately capture the simpliciality of datasets when they have a nontrivial inclusion structure. While it does not reproduce the correct values, the hypergraph configuration model consistently captures the inclusion structure better than the biSBM and the bipartite ChungLu model, irrespective of the simpliciality measure used. This may be due to the exact specification of the degree and edge size sequences; the ChungLu model and biSBM only match these sequences in expectation.
3.3 Local measures of simpliciality
As a final demonstration, we apply our local measures of simpliciality to the dataset of emails sent by Enron employees (142 nodes and 1126 hyperedges, filtered to include interactions of sizes 2 and 3). Results are shown in Fig. 3.
Focusing on the histograms first, we find that the SF has the most variability and that the FES covers a similarly large range. In contrast, the ES tells us that nearly every neighborhood is strongly simplicial. This is expected behavior because the ES relies on a simplicial complex induced on the egohypergraph; the size of the largest hyperedges in this egohypergraph can be substantially smaller than that of the largest hyperedges in the whole hypergraph. As a result, the denominator of Eq. (2) is reduced, increasing the local ES systematically. In contrast, when we take a subset of nodes to form an egohypergraph, it is easy to omit a small subface shared by many hyperedges, thus leading to a very small SF (and, similarly, to a small FES).
Turning to the spatial distribution of simpliciality shown in the insets, we see that the SF and FES find a region of high simpliciality at the network’s core with regions of low simpliciality on its edges. In fact, these two measures are largely in agreement, with a Pearson correlation coefficient of \(\rho =0.84\) between the local SF and FES. (The correlation drops to \(\rho =0.69\) when comparing the SF and ES). We also observe several nodes for which simpliciality is undefined. These nodes are only connected via minimal faces in their egohypergraphs, and these faces are excluded when calculating both potential and actual simplices.
Finally, inspecting Fig. 3, we notice that, in this case, nodes of similar simpliciality tend to be connected to one another. To quantify this observation, we define the simplicial assortativity as the Pearson correlation coefficient of the simpliciality of pairs of nodes connected by at least one hyperedge. More formally, we use the unweighted adjacency matrix of the hypergraph, A, defined as
where B is the incidence matrix of the hypergraph, such that \(B_{ij}=1\) if node edge j is incident on node i. The simplicial assortativity, ρ, can then be defined as
where \(\sigma _{i}\) is the local simpliciality of node i according to one of our measures. The simplicial assortativity for SF, ES, and FES are denoted \(\rho _{\mathrm{SF}}\), \(\rho _{\mathrm{ES}}\), and \(\rho _{\mathrm{FES}}\) respectively. This coefficient is equivalent to the assortativity coefficient [56] of the local simpliciality on the unweighted pairwise projection of the hypergraph.
One should expect local simpliciality to be assortative as any given subface contributes to the simpliciality of all their nodes. Table 2 shows that the situation is a bit more nuanced. For tagsaskubuntu, FES is weakly assortative, whereas the other two measures are weakly disassortative. It is particularly striking that despite the hospitallyon dataset being highly simplicial (as seen in Fig. 2), it is also weakly disassortative.
4 Conclusion
In this paper, we have introduced measures to summarize the inclusion structure—the simpliciality—of hypergraphs. We have presented three measures of simpliciality but recognize that other definitions of simpliciality may also prove useful. We have discussed how the simpliciality of higherorder datasets depends on many factors, including, but not limited to, the manner in which the dataset was collected, its domain, and the measure of simpliciality. When fitting common generative models to several empirical higherorder networks, we found that the simpliciality of the original dataset is often much higher than the simpliciality of the posterior predictive distribution of fitted models by any measure of simpliciality. Measuring the simplicial assortativity indicates that the simpliciality displays different levels of localization.
We hope this study will serve as a starting point for network scientists aiming to characterize higherorder network datasets and look forward to future work developing these methods along a number of dimensions of interest.
First, we presented global and nodelevel definitions of simpliciality, but other scales of interaction may yield further insights into the inclusion structure of the data [27]. Future work could explore mesoscale measures of simpliciality that describe how, for example, simpliciality varies between communities. One could also obtain the largest simplicial component or the set of simplicial components in a hypergraph. In addition, we have restricted ourselves to unweighted simplicial complexes, but one might consider extending these notions to weighted simplicial complexes [57].
Second, our approach complements the existing literature on nestedness in bipartite networks [27], which shows that nestedness exists for a wide variety of unipartite and bipartite networks [58]. Existing work shows that nestedness is important for the function of networks in many domains [28, 59–61], and comparing these findings from the perspective of simpliciality could offer additional insights from both a structural and mechanistic perspective.
Finally, we have shown a disparity between the simpliciality of artificial datasets and observed ones. Our findings should thus inform new higherorder network models that specify the inclusion structure of the network and can be fit to empirical higherorder datasets.
Abbreviations
 SF:

Simplicial fraction
 ES:

Edit simpliciality
 FES:

Face edit simpliciality
 biSBM:

bipartite degreecorrected stochastic block model
References
Mastrandrea R, Fournet J, Barrat A (2015) Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE 10(9):0136497. https://doi.org/10.1371/journal.pone.0136497
Benson AR, Abebe R, Schaub MT, Jadbabaie A, Kleinberg J (2018) Simplicial closure and higherorder link prediction. Proc Natl Acad Sci 115(48):11221–11230. https://doi.org/10.1073/pnas.1800683115
Klimt B, Yang Y (2004) The enron corpus: a new dataset for email classification research. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Machine learning: ECML 2004. Lecture notes in computer science. Springer, Berlin, pp 217–226. https://doi.org/10.1007/9783540301158_22
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):2. https://doi.org/10.1145/1217299.1217301
Murgas KA, Saucan E, Sandhu R (2022) Hypergraph geometry reflects higherorder dynamics in protein interaction networks. Sci Rep 12(1):20879. https://doi.org/10.1038/s4159802224584w
Patania A, Petri G, Vaccarino F (2017) The shape of collaborations. EPJ Data Sci 6(1):18. https://doi.org/10.1140/epjds/s1368801701148
Battiston F, Cencetti G, Iacopini I, Latora V, Lucas M, Patania A, Young JG, Petri G (2020) Networks beyond pairwise interactions: structure and dynamics. Phys Rep 874:1–92. https://doi.org/10.1016/j.physrep.2020.05.004
Torres L, Blevins AS, Bassett D, EliassiRad T (2021) The why, how, and when of representations for complex systems. SIAM Rev 63(3):435–485. https://doi.org/10.1137/20M1355896
Eckmann B (1944) Harmonische funktionen und randwertaufgaben in einem komplex. Comment Math Helv 17(1):240–255. https://doi.org/10.1007/BF02566245
Bianconi G (2021) Higherorder networks. Elements in the structure and dynamics of complex networks. https://doi.org/10.1017/9781108770996
Zhang Y, Lucas M, Battiston F (2023) Higherorder interactions shape collective dynamics differently in hypergraphs and simplicial complexes. Nat Commun 14(1):1605. https://doi.org/10.1038/s41467023371909
Kim J, Lee DS, Goh KI (2023) Contagion dynamics on hypergraphs with nested hyperedges. Phys Rev E 108(3):034313. https://doi.org/10.1103/PhysRevE.108.034313
LaRock T, Lambiotte R (2023) Encapsulation structure and dynamics in hypergraphs. J Phys Complex 4(4):045007. https://doi.org/10.1088/2632072X/ad0b39
Burgio G, Gómez S, Arenas A (2024) Triadic approximation reveals the role of interaction overlap on the spread of complex contagions on higherorder networks. Phys Rev Lett 132(7):077401. https://doi.org/10.1103/PhysRevLett.132.077401
Chodrow PS, Veldt N, Benson AR (2021) Generative hypergraph clustering: from blockmodels to modularity. Sci Adv 7:eabh1303. https://doi.org/10.1126/sciadv.abh1303
Zhou D, Huang J, Schölkopf B (2006) Learning with hypergraphs: clustering, classification, and embedding. Advances in neural information processing systems, vol 19. MIT Press, Cambridge. https://proceedings.neurips.cc/paper/2006/hash/dff8e9c2ac33381546d96deea9922999Abstract.html
Kamiński B, Poulin V, Prałat P, Szufel P, Théberge F (2019) Clustering via hypergraph modularity. PLoS ONE 14(11):0224307. https://doi.org/10.1371/journal.pone.0224307
Benson AR (2019) Three hypergraph eigenvector centralities. SIAM J Math Data Sci 1(2):293–312. https://doi.org/10.1137/18M1203031
Feng S, Heath E, Jefferson B, Joslyn C, Kvinge H, Mitchell HD, Praggastis B, Eisfeld AJ, Sims AC, Thackray LB, Fan S, Walters KB, Halfmann PJ, WesthoffSmith D, Tan Q, Menachery VD, Sheahan TP, Cockrell AS, Kocher JF, Stratton KG, Heller NC, Bramer LM, Diamond MS, Baric RS, Waters KM, Kawaoka Y, McDermott JE, Purvine E (2021) Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinform 22(1):287. https://doi.org/10.1186/s12859021041972
Tudisco F, Higham DJ (2021) Node and edge nonlinear eigenvector centrality for hypergraphs. Commun Phys 4(1):1–10. https://doi.org/10.1038/s42005021007042
Gallagher SR, Goldberg DS (2013) Clustering coefficients in protein interaction hypernetworks. In: Proceedings of the international conference on bioinformatics, computational biology and biomedical informatics. BCB’13. Association for Computing Machinery, New York, pp 552–560. https://doi.org/10.1145/2506583.2506635
Klimm F, Deane CM, Reinert G (2021) Hypergraphs for predicting essential genes using multiprotein complex data. J Complex Netw 9:cnaa028. https://doi.org/10.1093/comnet/cnaa028
Chodrow PS (2020) Configuration models of random hypergraphs. J Complex Netw 8:cnaa018. https://doi.org/10.1093/comnet/cnaa018
Landry NW, Restrepo JG (2022) Hypergraph assortativity: a dynamical systems perspective. Chaos, Interdiscip J Nonlinear Sci 32(5):053113. https://doi.org/10.1063/5.0086905
Landry NW, Restrepo JG (2020) The effect of heterogeneity on hypergraph contagion models. Chaos, Interdiscip J Nonlinear Sci 30(10):103117. https://doi.org/10.1063/5.0020034
Joslyn CA, Aksoy SG, Callahan TJ, Hunter LE, Jefferson B, Praggastis B, Purvine E, Tripodi IJ (2021) Hypernetwork science: from multidimensional networks to computational topology. In: Braha D, de Aguiar MAM, Gershenson C, Morales AJ, Kaufman L, Naumova EN, Minai AA, BarYam Y (eds) Unifying themes in complex systems X. Springer proceedings in complexity. Springer, Cham, pp 377–392. https://doi.org/10.1007/9783030673185_25
Mariani MS, Ren ZM, Bascompte J, Tessone CJ (2019) Nestedness in complex networks: observation, emergence, and implications. Phys Rep 813:1–90. https://doi.org/10.1016/j.physrep.2019.04.001
Bastolla U, Fortuna MA, PascualGarcía A, Ferrera A, Luque B, Bascompte J (2009) The architecture of mutualistic networks minimizes competition and increases biodiversity. Nature 458(7241):1018–1020. https://doi.org/10.1038/nature07950
Hatcher A (2001) Algebraic topology, 1st edn. Cambridge University Press, Cambridge
Landry NW, Amburg I, Shi M, Aksoy SG (2024) Filtering higherorder datasets. J Phys Complex 5(1):015006. https://doi.org/10.1088/2632072X/ad253a
Landry N, Torres L, Lucas M, Iacopini I, Petri G, Patania A, Schwarze A (2023) XGIDATA. https://github.com/xgiorg/xgidata
Stehlé J, Voirin N, Barrat A, Cattuto C, Isella L, Pinton JF, Quaggiotto M, den Broeck WV, Régis C, Lina B, Vanhems P (2011) Highresolution measurements of facetoface contact patterns in a primary school. PLoS ONE 6(8):23176. https://doi.org/10.1371/journal.pone.0023176
Vanhems P, Barrat A, Cattuto C, Pinton JF, Khanafer N, Régis C, Kim BA, Comte B (2013) Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE 8(9):73970. https://doi.org/10.1371/journal.pone.0073970
Yin H, Benson AR, Leskovec J, Gleich DF (2017) Local higherorder graph clustering. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’17. Association for Computing Machinery, New York, pp 555–564. https://doi.org/10.1145/3097983.3098069
Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL (2007) The human disease network. Proc Natl Acad Sci 104(21):8685–8690. https://doi.org/10.1073/pnas.0701361104
Piñero J, RamírezAnguita JM, SaüchPitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res 48(D1):845–855. https://doi.org/10.1093/nar/gkz1021
Fowler JH (2006) Connecting the congress: a study of cosponsorship networks. Polit Anal 14(4):456–487. https://doi.org/10.1093/pan/mpl002
Fowler JH (2006) Legislative cosponsorship networks in the US House and Senate. Soc Netw 28(4):454–465. https://doi.org/10.1016/j.socnet.2005.11.003
Wang JW, Rong LL, Deng QH, Zhang JY (2010) Evolving hypernetwork model. Eur Phys J B 77(4):493–498. https://doi.org/10.1140/epjb/e2010002978
Avin C, Lotker Z, Nahum Y, Peleg D (2019) Random preferential attachment hypergraph. In: 2019 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 398–405. https://doi.org/10.1145/3341161.3342867
Do MT, Yoon SE, Hooi B (2020) Structural patterns and generative models of realworld hypergraphs. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. KDD ’20. Association for Computing Machinery, New York, pp 176–186. https://doi.org/10.1145/3394486.3403060
Barthelemy M (2022) Class of models for random hypergraphs. Phys Rev E 106(6):064310. https://doi.org/10.1103/PhysRevE.106.064310
Aksoy SG, Kolda TG, Pinar A (2017) Measuring and modeling bipartite graphs with community structure. J Complex Netw 5(4):581–603. https://doi.org/10.1093/comnet/cnx001
Zhang Q, Tan VYF (2023) Exact recovery in the general hypergraph stochastic block model. IEEE Trans Inf Theory 69(1):453–471. https://doi.org/10.1109/TIT.2022.3205959
Kim C, Bandeira AS, Goemans MX (2018) Stochastic block model for hypergraphs: statistical limits and a semidefinite programming approach. arXiv:1807.02884
Ruggeri N, Contisciani M, Battiston F, De Bacco C (2023) Community detection in large hypergraphs. Sci Adv 9(28):9159. https://doi.org/10.1126/sciadv.adg9159
Dewar M, Healy J, PérezGiménez X, Prałat P, Proos J, Reiniger B, Ternovsky K (2018) Subhypergraphs in nonuniform random hypergraphs. https://doi.org/10.48550/arXiv.1703.07686. arXiv:1703.07686
Iacopini I, Petri G, Barrat A, Latora V (2019) Simplicial models of social contagion. Nat Commun 10(1):2485. https://doi.org/10.1038/s41467019104316
Di Gaetano L, Battiston F, Starnini M (2024) Percolation and topological properties of temporal higherorder networks. Phys Rev Lett 132(3):037401. https://doi.org/10.1103/PhysRevLett.132.037401
Turnbull K, Lunagómez S, Nemeth C, Airoldi E (2023) Latent space modeling of hypergraph data. J Am Stat Assoc. https://doi.org/10.1080/01621459.2023.2270750
Lunagómez S, Mukherjee S, Wolpert RL, Airoldi EM (2017) Geometric representations of random hypergraphs. J Am Stat Assoc 112(517):363–383. https://doi.org/10.1080/01621459.2016.1141686
Yen TC, Larremore DB (2020) Community detection in bipartite networks with stochastic block models. Phys Rev E 102(3):032309. https://doi.org/10.1103/PhysRevE.102.032309
Dutta U, Fosdick BK, Clauset A (2023) Sampling random graphs with specified degree sequences. https://doi.org/10.48550/arXiv.2105.12120. arXiv:2105.12120
Miller JC, Hagberg A (2011) Efficient generation of networks with given expected degrees. In: Frieze A, Horn P, Prałat P (eds) Algorithms and models for the web graph. Lecture notes in computer science. Springer, Berlin, pp 115–126. https://doi.org/10.1007/9783642212864_10
Landry NW, Lucas M, Iacopini I, Petri G, Schwarze A, Patania A, Torres L (2023) XGI: a Python package for higherorder interaction networks. J Open Sour Softw 8(85):5162. https://doi.org/10.21105/joss.05162
Newman MEJ (2003) Mixing patterns in networks. Phys Rev E 67(2):026126. https://doi.org/10.1103/PhysRevE.67.026126
Baccini F, Geraci F, Bianconi G (2022) Weighted simplicial complexes and their representation power of higherorder network data and topology. Phys Rev E 106(3):034319. https://doi.org/10.1103/PhysRevE.106.034319
Johnson S, DomínguezGarcía V, Muñoz MA (2013) Factors determining nestedness in complex networks. PLoS ONE 8(9):74025. https://doi.org/10.1371/journal.pone.0074025
Saavedra S, Stouffer DB, Uzzi B, Bascompte J (2011) Strong contributors to network persistence are the most vulnerable to extinction. Nature 478(7368):233–235. https://doi.org/10.1038/nature10433
Kamilar JM, Atkinson QD (2014) Cultural assemblages show nested structure in humans and chimpanzees but not orangutans. Proc Natl Acad Sci 111(1):111–115. https://doi.org/10.1073/pnas.1313318110
Cantor M, Pires MM, Marquitti FMD, Raimundo RLG, SebastiánGonzález E, Coltri PP, Perez SI, Barneche DR, Brandt DYC, Nunes K, DauraJorge FG, Floeter SR, Guimarães Jr PR (2017) Nestedness across biological scales. PLoS ONE 12(2):0171691. https://doi.org/10.1371/journal.pone.0171691
Fredkin E (1960) Trie memory. Commun ACM 3(9):490–499. https://doi.org/10.1145/367390.367400
Landry N (2024) nwlandry/thesimplicialityofhigherordernetworks: v0.1. Zenodo. https://doi.org/10.5281/zenodo.10666508
Acknowledgements
N.W.L. would like to acknowledge the participants of the “Workshop on Modelling and Mining Complex Networks as Hypergraphs” at Toronto Metropolitan University and Tim LaRock for helpful conversations. N.W.L. would also like to thank TzuChi Yen for lending his expertise on the biSBM inference.
Funding
N.W.L. and J.G.Y. acknowledge support from the National Institutes of Health 1P20 GM12549801 Centers of Biomedical Research Excellence Award. N.E. acknowledges the Harris Faculty Fellowship from Grinnell College.
Author information
Authors and Affiliations
Contributions
NWL conceived the project; NWL, JGY, and NE designed the research; NWL and NE performed the research; NWL, JGY, and NE wrote the article; and NWL, JGY, and NE edited the draft. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Convergence of the configuration model MCMC
To sample from the configuration model, we implemented the Markov chain Monte Carlo algorithm proposed in Ref. [23]. At each step, (1) two edges are selected at random, \(e_{1}\) and \(e_{2}\); (2) two nodes are selected at random, \(i \in e_{1}\) and \(j \in e_{2}\); and (3) the memberships of these nodes are swapped to form two new edges, \(\widetilde{e}_{1} = (e_{1} \setminus i) \cup j\) and \(\widetilde{e}_{2} = (e_{2} \setminus j) \cup i\). This operation is accepted if the move does not create loopy hyperedges, i.e., \(e_{1} = \widetilde{e}_{1}\) and \(e_{2} = \widetilde{e}_{2}\). In Ref. [23], the criterion for convergence is left for future work, and Ref. [53] presents criteria for the convergence of the configuration model for pairwise networks. In Fig. 4, we show that the number of doubleedge swaps chosen for our configuration model algorithm (\(10\times E\)) is sufficient for convergence with respect to all measures of simpliciality.
From Fig. 4, we see that the configuration model sampler has roughly achieved the stationary value of simpliciality after roughly 10% of the edge swaps have been completed. Assessing convergence in a statistically robust manner is necessary to ensure uniform sampling from the hypergraph configuration model, but this heuristic is sufficient for our purposes.
Appendix B: Measuring simpliciality efficiently
For all measures, we leverage the trie data structure [62] to efficiently compute the measures described in this paper. The trie structure allows us to efficiently verify whether a subface exists (\(\mathcal{O}(e)\), for an edge e). To be compatible with the trie data structure, when adding an edge to the trie and when searching for an edge in the trie, we first sort the nodes in that edge. Below, we present the algorithms employed in generating all results.
Simplicial fraction
For each edge, if a single subface is absent, we can immediately determine that the edge is not a simplex. This can be very efficient for sparse hypergraphs.
Edit simpliciality
There are two ways to compute the edit simpliciality: the exhaustive method and a more memoryefficient version. The exhaustive method is simpler and more computationally efficient. However, the memory requirements are enormous because it stores every missing subface. The memoryefficient version keeps track of the number of missing subfaces, not the missing subfaces themselves. This leverages the fact that the number of hyperedges intersecting with any given hyperedge is typically much smaller than the number of hyperedges. We store the hypergraph of maximal edges for fast retrieval of the neighbors of a given maximal face, which is needed in Algorithm 3.
Face edit simpliciality
This computation is more straightforward than the ES computation because it is a local measure that requires relatively little memory and allows us to compute the number of missing subfaces and then store the FES as a running average.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Landry, N.W., Young, JG. & Eikmeier, N. The simpliciality of higherorder networks. EPJ Data Sci. 13, 17 (2024). https://doi.org/10.1140/epjds/s13688024004581
Received:
Accepted:
Published:
DOI: https://doi.org/10.1140/epjds/s13688024004581