Skip to main content

The simpliciality of higher-order networks


Higher-order networks are widely used to describe complex systems in which interactions can involve more than two entities at once. In this paper, we focus on inclusion within higher-order networks, referring to situations where specific entities participate in an interaction, and subsets of those entities also interact with each other. Traditional modeling approaches to higher-order networks tend to either not consider inclusion at all (e.g., hypergraph models) or explicitly assume perfect and complete inclusion (e.g., simplicial complex models). To allow for a more nuanced assessment of inclusion in higher-order networks, we introduce the concept of “simpliciality” and several corresponding measures. Contrary to current modeling practice, we show that empirically observed systems rarely lie at either end of the simpliciality spectrum. In addition, we show that generative models fitted to these datasets struggle to capture their inclusion structure. These findings suggest new modeling directions for the field of higher-order network science.

1 Introduction

A wide range of complex systems are shaped by interactions involving several entities at once: social networks are driven by group behavior [1], emails often have multiple recipients [24], molecular pathways in cells involve multi-protein interactions [5], and scientific articles involve groups of co-authors [6]. Higher-order networks are a natural extension to networks explicitly designed to model such multi-way relationships [7].

Two mathematical representations are most commonly used to model higher-order networks: hypergraphs and simplicial complexes [8]. A hypergraph is a collection of entities (nodes) connected by interactions (hyperedges) between any number of these entities. A simplicial complex can be considered a hypergraph with an additional requirement known as downward closure, which states that when an interaction exists between m entities, every possible sub-interaction also exists. This mathematical construction originates in algebraic topology and is motivated by theoretical applications; for example, forming operators such as boundary matrices to identify cycles in a dataset or the Hodge Laplacian to describe dynamical processes in higher-order networks [9, 10].

Recent work has grappled with the problem and consequences of choosing the proper representation—simplicial complex, hypergraph, or other—for a given complex system. Ref. [11], for instance, shows that synchronization can differ drastically in systems modeled with simplicial complexes and hypergraphs due to synchrony driven by the included edges of simplicial complexes, and three recent studies investigate the impact of inclusions on contagion [1214]. Additionally, Ref. [8] discusses how each representation corresponds to different modeling assumptions and, thus, different analysis pipelines.

Missing from these studies are analyses of the suitability of higher-order representations for describing empirically observed interactions. When a set of interactions is given to a data scientist or modeler, the choice of representation is essentially empirical. Do the data satisfy downward closure? (In which case, a simplicial complex may best represent it.) Or do the data violate downward closure? (In which case, it should be modeled as a hypergraph.)

In this paper, we introduce the concept of simpliciality to describe the extent to which a set of interactions satisfies the downward closure requirement. We implement this concept with three measures of the overall simpliciality of a dataset and describe how to define local versions of these global metrics. Using these measures, we investigate the simpliciality of empirical datasets and show that commonly analyzed higher-order interaction datasets populate the full spectrum of simpliciality. We find that there may be large variations in the simpliciality depending on the chosen empirical dataset and the measure of simpliciality. Additionally, we show that the level of simpliciality displayed by existing models is typically not captured by existing generative models for higher-order networks. Hence, this paper identifies an essential gap in the current set of higher-order measures and models.

These new simpliciality measures complement other higher-order structural measures such as community structure [1517], centrality [1820], clustering [21, 22], assortativity [23, 24], and degree heterogeneity [25]. While these measures are helpful in understanding how higher-order data is organized, they do not address how hypergraphs relate to simplicial complexes. Closest to our work is the recent Ref. [13], in which the authors describe the encapsulation graph, a structural description of the inclusion patterns of any given dataset, as well as a dynamical process based on inclusion that spreads from included hyperedges to larger containing hyperedges. Also relevant is Ref. [26], which defines a global metric of inclusions. There is a large body of literature surrounding the concept of nestedness [27], which measures the inclusion structure of unipartite and bipartite networks, particularly in ecological contexts [28]. Measures of nestedness, however, do not use simplicial complexes as a reference point against which to compare. In contrast, our approach describes downward inclusions succinctly with simple global measures, offering a new perspective on how higher-order data is organized.

1.1 Mathematical definitions

We encode interactions as hypergraphs, defined as a pair \(\mathbf{H}= (V, E)\) where V is a set of \(N=|V|\) entities known as nodes, and where E is a set of subsets of V encoding relationship between nodes. We refer to a set \(e\in E\) as a “hyperedge” or just “edge,” and define the size of an edge as \(|e|\). In general, E could be a multiset of multisets, but in this study, we solely consider simple hypergraphs, where each edge is only present once (no multi-hyperedges), and each edge can only contain unique entities (no self-relations).

Our analysis will focus on the prevalence of inclusions in interaction data, so we describe this relationship formally. We say that an edge f is included in e, or \(f \subset e\), if every node of f is also a node of e. Drawing from the nomenclature of algebraic topology, we also say that f is a subface of e [29].

Inclusions naturally lead to the concept of a maximal hyperedge, i.e., a hyperedge that is not included in any other hyperedge. We denote the set of all the maximal hyperedges of H as \(\widetilde{E}(\mathbf{H})\).

A simplex s is then a collection of hyperedges that contains a single maximal hyperedge and satisfies \(s \equiv \mathcal{P}(\widetilde{e})\), where \(\mathcal{P}(X)\) is the power set of set X. In other words, a simplex is a maximal edge with an associated collection of subfaces in which every possible subface of the maximal edge exists. A collection of simplices is a simplicial complex \(\mathbf{S}=(V, E)\), and we say that a hypergraph where every maximal edge is a simplex satisfies downward closure.

Finally, we will use the notion of an induced simplicial complex, which is the simplicial complex constructed from the maximal edges of a hypergraph by adding all hyperedges needed to satisfy downward closure.

2 Measuring simpliciality

This paper introduces the concept of simpliciality. Simpliciality, broadly defined, measures the inclusion structure of a hypergraph and how similar a higher-order dataset is to the structure of a simplicial complex; see Fig. 1A. There are many ways to measure this, which we outline in Sect. 2.1. Before we get there, however, we must first introduce relevant terminology.

Figure 1
figure 1

An illustration of simpliciality. Simpliciality captures the notion of inclusion in higher-order networks. (A) Interaction data are fully simplicial, denoted \(\sigma =1\), when all possible included interactions exist, e.g., an interaction between three nodes appears together with interactions between all three possible pairings of these nodes. Data are minimally simplicial, denoted \(\sigma =0\), when no included interactions are present. (B-D) Three natural measures of simpliciality which place higher-order datasets on the simplicial spectrum between \(\sigma =0\) and \(\sigma =1\), as described in Sect. 2.1

2.1 Measures

This section introduces measures of simpliciality. We follow a few guiding principles to design these measures. One, a simplicial complex must be maximally simplicial with respect to any measure of simpliciality. Two, to facilitate easier comparison between datasets, measures of simpliciality should be normalized so that they map a hypergraph to a value between 0 and 1. Three, if a subface is added to a hypergraph, the simpliciality must increase. Four, as a dataset becomes qualitatively more like a simplicial complex, the simpliciality should increase. And five, we stipulate that the simpliciality of an empty hypergraph is undefined.

There are many ways to define a measure of simpliciality while maintaining these guiding principles. To highlight different structural elements contributing to the inclusion structure, we define three measures: the simplicial fraction, the edit simpliciality, and the mean face simpliciality. These measures are all illustrated in Fig. 1 B-D.

Simplicial fraction

In a simplicial complex, every subface is itself a simplex, so when a hypergraph is a simplicial complex, it contains all subsets of each of its hyperedges. The simplicial fraction (SF) measures the degree to which this is true, defined as the fraction of hyperedges which are also simplices.

Formally, we let \(\mathbf{H}= (V, E)\) be a hypergraph and let \(S = \{ e\in E \ | \ \mathcal{P}(e)\subseteq E\}\) be the set of hyperedges which are also simplices. Then, the simplicial fraction is defined as

$$ \sigma _{\mathrm{SF}}= \frac{|S|}{|E|} $$

and it takes values in the range \(\sigma _{\mathrm{SF}}\in [0,1]\); see Fig. 1B.

The simplicial fraction directly measures the number of simplices in the dataset and is, therefore, highly interpretable. However, one potential downside is that edges which almost achieve downward closure do not count at all toward the overall simpliciality. Furthermore, this definition weighs smaller simplices heavily, as small simplices contribute to the simpliciality of all hyperedges that include them.

Edit simpliciality

The edit simpliciality (ES) is defined as the minimal number (or fraction, in the normalized case) of additional edges needed to make a hypergraph a simplicial complex.

Our formal definition uses the notion of an induced simplicial complex defined in Sect. 1.1. Given a hypergraph \(\mathbf{H}= (V, E)\) for which we want to measure the ES, we find its maximal edges and construct the simplicial complex \(\mathbf{S}= (V, C)\) induced on H, with \(C=\bigcup_{e \in \widetilde{E}} \mathcal{P}(e)\). The edit simpliciality is then

$$ \sigma _{\mathrm{ES}}= \frac{|E|}{|C|}, $$

again satisfying \(\sigma _{\mathrm{ES}}\in [0,1]\); see Fig. 1C. (We note that one can use the induced simplicial complex to define variants of the ES, e.g., a simplicial edit distance \(d_{\mathrm{ES}}= |C| - |E|\) or a normalized distance \(d_{\mathrm{NES}}= (|C| - |E|)/|C| = 1 - |E|/|C| = 1 - \sigma _{ \mathrm{ES}}\).)

The ES answers a slightly different question than the SF does—it counts missing hyperedges that would make the dataset into a simplicial complex, rather than the edges that already satisfy downward closure. It thus offers a complementary, equally interpretable measure of simpliciality. However, the ES has the disadvantage of being sensitive to outliers, as a handful of large hyperedges with few inclusions will rapidly drive \(\sigma _{\mathrm{ES}}\) towards 0. Indeed, a hyperedge of size m without any inclusion contributes one edge to \(|E|\) but \(2^{m}\) edges to \(|C|\) in the denominator of Eq. (2).

Face edit simpliciality

Finally, building upon the idea of edit simpliciality, we define a more localized notion of simpliciality, using the number of subfaces that must be added to the hypergraph to make a particular face a simplex.

Given a hyperedge e, the number of edges one must add to the hypergraph to make e a simplex is

$$ d_{\mathrm{FES}}(e) = |\mathcal{P}(e)| - |c|, $$

where \(c = \{ f \in E \mid f \subseteq e\}\). We can think of this quantity as an edit distance, or face edit distance. We use this quantity to define an average

$$ \bar{d}_{\mathrm{FES}} = \frac{1}{ \vert F \vert }\sum _{e\in F} d_{ \mathrm{FES}}(e), $$

where F is a set of edges—most commonly, \(F = \widetilde{E}\) or E. We exclusively use \(F = \widetilde{E}\) in this study. These quantities are on the scale of counts, and to define quantities analogous to previous simpliciality measures, we thus introduce a per-face normalization, either on a distance scale (meaning that the quantity grows as the dataset becomes less simplicial):

$$ \bar{d}_{\mathrm{NFES}} = \frac{1}{ \vert F \vert }\sum _{e\in F} \frac{d_{\mathrm{FES}}(e)}{|\mathcal{P}(e)|}, $$

or, similarly to previous definitions, on a simpliciality scale:

$$ \sigma _{\mathrm{FES}}= \frac{1}{ \vert F \vert }\sum _{e\in F} \biggl(1 - \frac{d_{\mathrm{FES}}(e)}{\mathcal{P}(e)} \biggr). $$

We call this last measure the face edit simpliciality (FES).

The FES normalizes the face edit distance as a fraction of its maximal simpliciality. This normalization removes the dominance of large edges in the calculation of \(\sigma _{\mathrm{ES}}\) and, in fact, exponentially down-weights the contribution of these edges. In addition, because this metric is computed on faces, this is an averaged local metric.

2.2 Important considerations when measuring simpliciality

Before we turn to applications in Sect. 3, let us discuss three design choices that may impact the conclusion we reach about the simpliciality of a dataset.

First, we note that the formal definition of a simplicial complex can be unnecessarily strict when used to represent perfect inclusion structures. By definition, a simplex always contains singletons (edges comprising a single node) and the empty set. Several datasets will not include such interactions by construction. One example is proximity datasets, where edges encode proximity events in which two or more nodes become in close contact during the observation period. Because of their spatial nature, these datasets are often very dense and contain many inclusions [7]. Yet, according to the standard definition, these will never be simplicial complexes due to the absence of singletons. Another example is email datasets, which also do not contain singletons unless one includes emails that individuals send to themselves. Because we define our notion of inclusion in terms of simplicial complexes, our measures will label these datasets as having no inclusion structure whatsoever.

To circumvent this issue, we use a relaxed definition of downward closure that excludes singletons wherever it makes sense. The relaxation uses the notion of a size-restricted power set \(\mathcal{P}_{K}(X)\), where K is a set of integers, defined as

$$ \mathcal{P}_{K}(X) = \bigl\{ x \in \mathcal{P}(X) \ | \ |x| \in K \bigr\} . $$

For example, given an edge e of size n, \(\mathcal{P}_{\{2,\dots ,n-1\}}(e)\) is the set of \(2^{|e|} - |e| - 2\) subfaces of e excluding the empty set, all singletons (sets of size one), and the edge e itself. Relaxed measures of simpliciality follow by substituting \(\mathcal{P}(X)\) for \(\mathcal{P}_{K}(X)\) in all the measures of Sect. 2.1. Hence, for example, we obtain a relaxation of \(\sigma _{\mathrm{SF}}\) by replacing the definition of S, the set of the hyperedges of H that are also simplices, by \(S = \{e\subseteq E\mid \mathcal{P}_{K}(e)\subseteq E\} \)), where \(K= \{2,\ldots,|e| \}\).

The results shown in Sect. 3 are all calculated using size restrictions to exclude singletons and the empty set. However, we note that this technique can be used more generally to exclude any interaction sizes deemed unimportant, anomalous, or problematic [30]; or, conversely, to be more strict and to include singletons (say, when analyzing academic co-authorship networks, where single-author papers can meaningfully impact the inclusion structure of the dataset).

Second, we observe that special hyperedges we call “minimal faces” may significantly skew the simpliciality of a dataset. The minimal faces of a hypergraph H are the edges of the minimal size, i.e., \(|e| = \min (K)\), where K is the set of sizes in the size-restricted powerset (In a traditional simplicial complex, the minimal faces are singletons). With the size restrictions in place, the minimal faces of a hypergraph are always simplices because, by definition, there are no smaller edges for these edges to include. We argue that when measuring the simpliciality of a dataset, it is most meaningful to focus on the faces for which inclusion is possible, and so we exclude these minimal faces when counting potential simplices.

Note that this design choice operates differently from the size restriction imposed by the modified power set introduced in Eq. (4); in that context, we argued for ignoring edges that can prevent other edges from being simplices, while here we suggest that counting minimal faces as potential simplices will strongly affect the value of simpliciality. Our strategy is as follows. For SF, this means that both S and E exclude the minimal-sized edges. For ES, we exclude maximal faces that are also minimal faces when constructing the minimal simplicial complex. And for FES, we only average over maximal edges that are not minimal faces.

Third and finally, since the number of potential subfaces of a hyperedge grows exponentially with its size, computational issues prevent us from applying our measures to large hyperedges. For this reason, we select a maximum face size k (we use \(k=11\) throughout), again using the size restriction to define our metrics. This drops information about large hyperedges but speeds up computation drastically in practical applications.

2.3 Local simpliciality

Simpliciality, up to this point a global metric, can also be localized on a smaller subset of the higher-order network to yield information about its local structure. The various face-centric measures used in our construction of the FES provide this information at the level of faces. But for more flexibility, we also use subhypergraphs to define nodal simpliciality measures of our global measures. More specifically, given a hypergraph \(\mathbf{H}= (V,E)\) and a node \(v\in V\), we define the neighborhood of v as \(n(v) = \{u \in V \mid u, v \in e\in E\}\) and the associated subsets \(\widehat{V}=v\,\cup \, n(v)\) and \(\widehat{E} = \{e \in E \mid e\subseteq \widehat{V}\}\). Then the simpliciality of node v is the simpliciality defined on the subhypergraph \(\widehat{\mathbf{H}} =(\widehat{V}, \widehat{E})\) induced on the neighborhood of v. Note that when v is an isolated node or when Ê only contains minimal faces and we do not consider these potential simplices, the nodal simpliciality will be undefined.

3 Results

3.1 Empirical datasets

As the first demonstration of the simpliciality measures, we analyze empirical higher-order datasets from several general domains. All datasets are obtained from the xgi-data repository [31] and are openly available. Following the considerations highlighted in Sect. 2.2, we preprocess these datasets to remove singletons, multiedges, and isolated nodes. In addition, for computational feasibility, we only consider hyperedges of size 11 (order, defined as the size minus one, of 10) and smaller. Basic structural properties of the pre-processed datasets are shown in Table 1.

Table 1 Properties of empirical datasets and their simpliciality. \(|V|\), \(|E|\), \(\langle k\rangle \), \(\langle s\rangle \), \(\sigma _{\mathrm{SF}}\), \(\sigma _{\mathrm{ES}}\), and \(\sigma _{\mathrm{FES}}\) denote the number of nodes, the number of hyperedges, the mean degree, the mean edge size, the simplicial fraction (SF), edit simpliciality (ES), and the face edit simpliciality (FES), respectively

Our sample of datasets contains various types of complex systems. We analyze three proximity datasets [1, 2, 3133] (contact-primary-school, contact-high-school, and hospital-lyon), which are collected via proximity sensors with a range of roughly 1 meter. Nodes are individuals, and an edge is created from a proximity event, where two individuals are closer than 1 meter apart. To create a higher-order dataset, each maximal clique is converted into a hyperedge at each time step. Unique to proximity datasets are their geometrical constraints, and because of the proximity sensor range, 5-hyperedges are the largest edges present in these datasets. We also include two datasets of email interactions [24, 34]: email-enron and email-eu. In both cases, the nodes are email addresses, and the hyperedges are emails, at the defunct company Enron in the former case and a large European research institution in the latter. Three datasets are loosely associated with biological processes [2, 35, 36]: diseasome, disgenenet, and ndc-substances. In these datasets, nodes are compounds, diseases, or genes, while hyperedges are interactions amongst these to represent pharmaceuticals, symptoms, and diseases. Finally, we include two miscellaneous datasets as well: tags-ask-ubuntu [2] higher-order dataset in which a node is a tag on Stack Overflow, and an edge is a question to which the tags are associated; and the congress-bills dataset [2, 37, 38] where nodes are congresspeople and edges represent the sponsoring and co-sponsoring congresspeople for a particular bill.

Numerical values of the simpliciality measures are shown in Table 1 for all of these datasets. We find that values for simpliciality fill the spectrum from 0 to 1, depending on the data. The proximity datasets have large simpliciality for all three measures, while the biological datasets have low simpliciality for all three measures. The email datasets have a very small ES simpliciality, with moderate simpliciality for the other two measures. (And since we use size restrictions to exclude singletons, the lack or absence of emails sent to oneself does not affect this assessment.) Similarly, the tags-ask-ubuntu dataset has a range of simpliciality values depending on which measure we consider. This shows that the measures we have defined in Sect. 2.1 capture different features of the inclusion structure.

While the measures give differing perspectives on the simpliciality of each dataset, we verify that they broadly agree with a correlation analysis. The Pearson correlation coefficient is \(\rho =0.97\) between the SF and ES, \(\rho =0.95\) between the SF and FES, and \(\rho =0.90\) between the ES and FES (all significant at the \(p=0.001\) level). Hence, the values are closely and linearly related in our sample. They also order datasets similarly, from the least to most simplicial, since the Spearman rank-order correlation coefficient is \(\rho =0.89\) between SF and ES, \(\rho =0.997\) between SF and FES, and \(\rho =0.90\) between ES and FES (all significant at the same level).

Although our correlation analysis confirms that these measures roughly capture the same concepts, the datasets where they depart from one another highlight their key differences. In our experiments, these differences are due to features such as large edges with few included edges, many edges that are mostly closed downward, and different edge size distributions. Networks of organizational email messages are examples of the first case, where very large organization- or department-wide emails may be sent with no guarantee that emails are also sent between every possible subgroup of individuals. In this case, we would expect ES to be extremely low while the SF need not be low. Proximity networks are examples of the second case, i.e., dense downward-closed datasets. We see this by noting that the SF is not 1, while both ES and FES are close to 1 due to the SF penalizing almost-simplicial edges. Lastly, the edge size distribution has strong implications on all measures; for the same average edge size and number of edges, increasing both the number of small and large edges will affect the SF and ES measures. For ES, the large edges will exponentially increase the number of subfaces needed to create a simplicial complex, driving the simpliciality to zero. In contrast, increasing the number of small edges can create more small simplices, increasing SF.

3.2 Generative models of higher-order networks

To complement our analysis of empirical data, we also examine the simpliciality of synthetic data generated with generative models for higher-order networks.

We focus on models of hypergraphs designed to describe and analyze arbitrary higher-order structures. There are several random hypergraph models, including, among many classes of models, preferential attachment models [3942], models with community structure [15, 4346], models with specified degree and size sequences [23, 43], Erdös-Rényi models [47, 48], models with latent nodal variables governing edge formation [49], and geometric models [42, 50, 51]. Higher-order random models that are commonly fit to empirical datasets include the configuration model [23], the bipartite Chung-Lu model [23, 43], and the bipartite degree-corrected stochastic block model [52]. See Ref. [7] for an extensive overview. Overwhelmingly, generative hypergraph models lack explicit control over the inclusion structure of hyperedges, so there are often relatively few simplices.

We focus our analysis on three models: the configuration model [23], the bipartite Chung-Lu model [43], and the bipartite degree-corrected stochastic block model (biSBM) [52].

We fit each model to the empirical datasets of Table 1, use the fitted models to generate a distribution of higher-order networks (the posterior predictive distribution in Bayesian parlance), and analyze the resulting distribution of simpliciality values.

In all cases, when sampling synthetic higher-order networks from the three generative models, we generate 103 realizations of each model for each empirical dataset. We use the double edge-swap algorithm presented in Ref. [23] to sample from the configuration model and performed \(10 \times |E|\) edge swaps, roughly in accordance with [53]. For the bipartite Chung-Lu model [43], we extract the degree and edge size sequences and then use a bipartite variation of the algorithm introduced in Ref. [54] and available in XGI [55] to sample from this model. Lastly, when sampling from the biSBM, we used a Markov chain Monte Carlo method with a bipartite prior using the algorithm described in Ref. [52].

All results are reported in Fig. 2. Overwhelmingly, we see that the generative models cannot accurately capture the simpliciality of datasets when they have a non-trivial inclusion structure. While it does not reproduce the correct values, the hypergraph configuration model consistently captures the inclusion structure better than the biSBM and the bipartite Chung-Lu model, irrespective of the simpliciality measure used. This may be due to the exact specification of the degree and edge size sequences; the Chung-Lu model and biSBM only match these sequences in expectation.

Figure 2
figure 2

The simpliciality of empirical datasets and their fitted generative models. The simpliciality of empirical datasets compared with samples from three higher-order generative models: the hypergraph configuration model [23], the bipartite Chung-Lu model [43], and the bipartite degree-corrected stochastic block model [52]. The violin plots indicate the simpliciality of samples drawn from the fitted generative models, and the solid vertical lines indicate the mean of the distributions. Empirical results are shown with a solid circle

3.3 Local measures of simpliciality

As a final demonstration, we apply our local measures of simpliciality to the dataset of emails sent by Enron employees (142 nodes and 1126 hyperedges, filtered to include interactions of sizes 2 and 3). Results are shown in Fig. 3.

Figure 3
figure 3

The local simpliciality of an empirical dataset. The local (A) simplicial fraction, (B) edit simpliciality, and (C) face edit simpliciality of the email-enron dataset filtered to include interactions of sizes 2 and 3. The colors of the histogram bars match the node colors on their corresponding network visualization. Nodes for which the local simpliciality is undefined are colored in grey

Focusing on the histograms first, we find that the SF has the most variability and that the FES covers a similarly large range. In contrast, the ES tells us that nearly every neighborhood is strongly simplicial. This is expected behavior because the ES relies on a simplicial complex induced on the ego-hypergraph; the size of the largest hyperedges in this ego-hypergraph can be substantially smaller than that of the largest hyperedges in the whole hypergraph. As a result, the denominator of Eq. (2) is reduced, increasing the local ES systematically. In contrast, when we take a subset of nodes to form an ego-hypergraph, it is easy to omit a small subface shared by many hyperedges, thus leading to a very small SF (and, similarly, to a small FES).

Turning to the spatial distribution of simpliciality shown in the insets, we see that the SF and FES find a region of high simpliciality at the network’s core with regions of low simpliciality on its edges. In fact, these two measures are largely in agreement, with a Pearson correlation coefficient of \(\rho =0.84\) between the local SF and FES. (The correlation drops to \(\rho =0.69\) when comparing the SF and ES). We also observe several nodes for which simpliciality is undefined. These nodes are only connected via minimal faces in their ego-hypergraphs, and these faces are excluded when calculating both potential and actual simplices.

Finally, inspecting Fig. 3, we notice that, in this case, nodes of similar simpliciality tend to be connected to one another. To quantify this observation, we define the simplicial assortativity as the Pearson correlation coefficient of the simpliciality of pairs of nodes connected by at least one hyperedge. More formally, we use the unweighted adjacency matrix of the hypergraph, A, defined as

$$ A_{ij} = \textstyle\begin{cases} 1, & [\mathbf{BB}^{\mathrm{T}} ]_{ij} > 0\text{ and } i\neq j, \\ 0,& \text{ otherwise,} \end{cases} $$

where B is the incidence matrix of the hypergraph, such that \(B_{ij}=1\) if node edge j is incident on node i. The simplicial assortativity, ρ, can then be defined as

$$ \rho = \sum_{i,j} \frac{A_{ij}(\sigma _{i} -E[\sigma ])(\sigma _{i} -E[\sigma ])}{\text{Var}[\sigma ]}, $$

where \(\sigma _{i}\) is the local simpliciality of node i according to one of our measures. The simplicial assortativity for SF, ES, and FES are denoted \(\rho _{\mathrm{SF}}\), \(\rho _{\mathrm{ES}}\), and \(\rho _{\mathrm{FES}}\) respectively. This coefficient is equivalent to the assortativity coefficient [56] of the local simpliciality on the unweighted pairwise projection of the hypergraph.

One should expect local simpliciality to be assortative as any given subface contributes to the simpliciality of all their nodes. Table 2 shows that the situation is a bit more nuanced. For tags-ask-ubuntu, FES is weakly assortative, whereas the other two measures are weakly disassortative. It is particularly striking that despite the hospital-lyon dataset being highly simplicial (as seen in Fig. 2), it is also weakly disassortative.

Table 2 The simplicial assortativity of each dataset filtered to only include interactions of sizes two and three for computational tractability

4 Conclusion

In this paper, we have introduced measures to summarize the inclusion structure—the simpliciality—of hypergraphs. We have presented three measures of simpliciality but recognize that other definitions of simpliciality may also prove useful. We have discussed how the simpliciality of higher-order datasets depends on many factors, including, but not limited to, the manner in which the dataset was collected, its domain, and the measure of simpliciality. When fitting common generative models to several empirical higher-order networks, we found that the simpliciality of the original dataset is often much higher than the simpliciality of the posterior predictive distribution of fitted models by any measure of simpliciality. Measuring the simplicial assortativity indicates that the simpliciality displays different levels of localization.

We hope this study will serve as a starting point for network scientists aiming to characterize higher-order network datasets and look forward to future work developing these methods along a number of dimensions of interest.

First, we presented global- and node-level definitions of simpliciality, but other scales of interaction may yield further insights into the inclusion structure of the data [27]. Future work could explore mesoscale measures of simpliciality that describe how, for example, simpliciality varies between communities. One could also obtain the largest simplicial component or the set of simplicial components in a hypergraph. In addition, we have restricted ourselves to unweighted simplicial complexes, but one might consider extending these notions to weighted simplicial complexes [57].

Second, our approach complements the existing literature on nestedness in bipartite networks [27], which shows that nestedness exists for a wide variety of unipartite and bipartite networks [58]. Existing work shows that nestedness is important for the function of networks in many domains [28, 5961], and comparing these findings from the perspective of simpliciality could offer additional insights from both a structural and mechanistic perspective.

Finally, we have shown a disparity between the simpliciality of artificial datasets and observed ones. Our findings should thus inform new higher-order network models that specify the inclusion structure of the network and can be fit to empirical higher-order datasets.

Data availability

All datasets are available in the xgi-data repository [31]. The code used to generate all results and figures utilizes the XGI library [55] and is openly available on Github [63].



Simplicial fraction


Edit simpliciality


Face edit simpliciality


bipartite degree-corrected stochastic block model


  1. Mastrandrea R, Fournet J, Barrat A (2015) Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE 10(9):0136497.

    Article  CAS  Google Scholar 

  2. Benson AR, Abebe R, Schaub MT, Jadbabaie A, Kleinberg J (2018) Simplicial closure and higher-order link prediction. Proc Natl Acad Sci 115(48):11221–11230.

    Article  ADS  CAS  Google Scholar 

  3. Klimt B, Yang Y (2004) The enron corpus: a new dataset for email classification research. In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Machine learning: ECML 2004. Lecture notes in computer science. Springer, Berlin, pp 217–226.

    Chapter  Google Scholar 

  4. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):2.

    Article  Google Scholar 

  5. Murgas KA, Saucan E, Sandhu R (2022) Hypergraph geometry reflects higher-order dynamics in protein interaction networks. Sci Rep 12(1):20879.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  6. Patania A, Petri G, Vaccarino F (2017) The shape of collaborations. EPJ Data Sci 6(1):18.

    Article  Google Scholar 

  7. Battiston F, Cencetti G, Iacopini I, Latora V, Lucas M, Patania A, Young J-G, Petri G (2020) Networks beyond pairwise interactions: structure and dynamics. Phys Rep 874:1–92.

    Article  ADS  MathSciNet  Google Scholar 

  8. Torres L, Blevins AS, Bassett D, Eliassi-Rad T (2021) The why, how, and when of representations for complex systems. SIAM Rev 63(3):435–485.

    Article  MathSciNet  Google Scholar 

  9. Eckmann B (1944) Harmonische funktionen und randwertaufgaben in einem komplex. Comment Math Helv 17(1):240–255.

    Article  MathSciNet  Google Scholar 

  10. Bianconi G (2021) Higher-order networks. Elements in the structure and dynamics of complex networks.

    Book  Google Scholar 

  11. Zhang Y, Lucas M, Battiston F (2023) Higher-order interactions shape collective dynamics differently in hypergraphs and simplicial complexes. Nat Commun 14(1):1605.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  12. Kim J, Lee D-S, Goh K-I (2023) Contagion dynamics on hypergraphs with nested hyperedges. Phys Rev E 108(3):034313.

    Article  ADS  MathSciNet  CAS  PubMed  Google Scholar 

  13. LaRock T, Lambiotte R (2023) Encapsulation structure and dynamics in hypergraphs. J Phys Complex 4(4):045007.

    Article  Google Scholar 

  14. Burgio G, Gómez S, Arenas A (2024) Triadic approximation reveals the role of interaction overlap on the spread of complex contagions on higher-order networks. Phys Rev Lett 132(7):077401.

    Article  CAS  PubMed  Google Scholar 

  15. Chodrow PS, Veldt N, Benson AR (2021) Generative hypergraph clustering: from blockmodels to modularity. Sci Adv 7:eabh1303.

    Article  ADS  PubMed  Google Scholar 

  16. Zhou D, Huang J, Schölkopf B (2006) Learning with hypergraphs: clustering, classification, and embedding. Advances in neural information processing systems, vol 19. MIT Press, Cambridge.

    Google Scholar 

  17. Kamiński B, Poulin V, Prałat P, Szufel P, Théberge F (2019) Clustering via hypergraph modularity. PLoS ONE 14(11):0224307.

    Article  CAS  Google Scholar 

  18. Benson AR (2019) Three hypergraph eigenvector centralities. SIAM J Math Data Sci 1(2):293–312.

    Article  MathSciNet  Google Scholar 

  19. Feng S, Heath E, Jefferson B, Joslyn C, Kvinge H, Mitchell HD, Praggastis B, Eisfeld AJ, Sims AC, Thackray LB, Fan S, Walters KB, Halfmann PJ, Westhoff-Smith D, Tan Q, Menachery VD, Sheahan TP, Cockrell AS, Kocher JF, Stratton KG, Heller NC, Bramer LM, Diamond MS, Baric RS, Waters KM, Kawaoka Y, McDermott JE, Purvine E (2021) Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinform 22(1):287.

    Article  Google Scholar 

  20. Tudisco F, Higham DJ (2021) Node and edge nonlinear eigenvector centrality for hypergraphs. Commun Phys 4(1):1–10.

    Article  Google Scholar 

  21. Gallagher SR, Goldberg DS (2013) Clustering coefficients in protein interaction hypernetworks. In: Proceedings of the international conference on bioinformatics, computational biology and biomedical informatics. BCB’13. Association for Computing Machinery, New York, pp 552–560.

    Chapter  Google Scholar 

  22. Klimm F, Deane CM, Reinert G (2021) Hypergraphs for predicting essential genes using multiprotein complex data. J Complex Netw 9:cnaa028.

    Article  MathSciNet  Google Scholar 

  23. Chodrow PS (2020) Configuration models of random hypergraphs. J Complex Netw 8:cnaa018.

    Article  MathSciNet  Google Scholar 

  24. Landry NW, Restrepo JG (2022) Hypergraph assortativity: a dynamical systems perspective. Chaos, Interdiscip J Nonlinear Sci 32(5):053113.

    Article  MathSciNet  Google Scholar 

  25. Landry NW, Restrepo JG (2020) The effect of heterogeneity on hypergraph contagion models. Chaos, Interdiscip J Nonlinear Sci 30(10):103117.

    Article  MathSciNet  CAS  Google Scholar 

  26. Joslyn CA, Aksoy SG, Callahan TJ, Hunter LE, Jefferson B, Praggastis B, Purvine E, Tripodi IJ (2021) Hypernetwork science: from multidimensional networks to computational topology. In: Braha D, de Aguiar MAM, Gershenson C, Morales AJ, Kaufman L, Naumova EN, Minai AA, Bar-Yam Y (eds) Unifying themes in complex systems X. Springer proceedings in complexity. Springer, Cham, pp 377–392.

    Chapter  Google Scholar 

  27. Mariani MS, Ren Z-M, Bascompte J, Tessone CJ (2019) Nestedness in complex networks: observation, emergence, and implications. Phys Rep 813:1–90.

    Article  ADS  MathSciNet  Google Scholar 

  28. Bastolla U, Fortuna MA, Pascual-García A, Ferrera A, Luque B, Bascompte J (2009) The architecture of mutualistic networks minimizes competition and increases biodiversity. Nature 458(7241):1018–1020.

    Article  ADS  CAS  PubMed  Google Scholar 

  29. Hatcher A (2001) Algebraic topology, 1st edn. Cambridge University Press, Cambridge

    Google Scholar 

  30. Landry NW, Amburg I, Shi M, Aksoy SG (2024) Filtering higher-order datasets. J Phys Complex 5(1):015006.

    Article  PubMed Central  Google Scholar 

  31. Landry N, Torres L, Lucas M, Iacopini I, Petri G, Patania A, Schwarze A (2023) XGI-DATA.

  32. Stehlé J, Voirin N, Barrat A, Cattuto C, Isella L, Pinton J-F, Quaggiotto M, den Broeck WV, Régis C, Lina B, Vanhems P (2011) High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6(8):23176.

    Article  ADS  CAS  Google Scholar 

  33. Vanhems P, Barrat A, Cattuto C, Pinton J-F, Khanafer N, Régis C, Kim B-A, Comte B (2013) Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE 8(9):73970.

    Article  ADS  CAS  Google Scholar 

  34. Yin H, Benson AR, Leskovec J, Gleich DF (2017) Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’17. Association for Computing Machinery, New York, pp 555–564.

    Chapter  Google Scholar 

  35. Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007) The human disease network. Proc Natl Acad Sci 104(21):8685–8690.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  36. Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res 48(D1):845–855.

    Article  CAS  Google Scholar 

  37. Fowler JH (2006) Connecting the congress: a study of cosponsorship networks. Polit Anal 14(4):456–487.

    Article  Google Scholar 

  38. Fowler JH (2006) Legislative cosponsorship networks in the US House and Senate. Soc Netw 28(4):454–465.

    Article  Google Scholar 

  39. Wang J-W, Rong L-L, Deng Q-H, Zhang J-Y (2010) Evolving hypernetwork model. Eur Phys J B 77(4):493–498.

    Article  ADS  CAS  Google Scholar 

  40. Avin C, Lotker Z, Nahum Y, Peleg D (2019) Random preferential attachment hypergraph. In: 2019 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 398–405.

    Chapter  Google Scholar 

  41. Do MT, Yoon S-E, Hooi B (2020) Structural patterns and generative models of real-world hypergraphs. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. KDD ’20. Association for Computing Machinery, New York, pp 176–186.

    Chapter  Google Scholar 

  42. Barthelemy M (2022) Class of models for random hypergraphs. Phys Rev E 106(6):064310.

    Article  ADS  MathSciNet  CAS  PubMed  Google Scholar 

  43. Aksoy SG, Kolda TG, Pinar A (2017) Measuring and modeling bipartite graphs with community structure. J Complex Netw 5(4):581–603.

    Article  MathSciNet  Google Scholar 

  44. Zhang Q, Tan VYF (2023) Exact recovery in the general hypergraph stochastic block model. IEEE Trans Inf Theory 69(1):453–471.

    Article  MathSciNet  Google Scholar 

  45. Kim C, Bandeira AS, Goemans MX (2018) Stochastic block model for hypergraphs: statistical limits and a semidefinite programming approach. arXiv:1807.02884

  46. Ruggeri N, Contisciani M, Battiston F, De Bacco C (2023) Community detection in large hypergraphs. Sci Adv 9(28):9159.

    Article  Google Scholar 

  47. Dewar M, Healy J, Pérez-Giménez X, Prałat P, Proos J, Reiniger B, Ternovsky K (2018) Subhypergraphs in non-uniform random hypergraphs. arXiv:1703.07686

  48. Iacopini I, Petri G, Barrat A, Latora V (2019) Simplicial models of social contagion. Nat Commun 10(1):2485.

    Article  ADS  PubMed  PubMed Central  Google Scholar 

  49. Di Gaetano L, Battiston F, Starnini M (2024) Percolation and topological properties of temporal higher-order networks. Phys Rev Lett 132(3):037401.

    Article  ADS  MathSciNet  PubMed  Google Scholar 

  50. Turnbull K, Lunagómez S, Nemeth C, Airoldi E (2023) Latent space modeling of hypergraph data. J Am Stat Assoc.

    Article  Google Scholar 

  51. Lunagómez S, Mukherjee S, Wolpert RL, Airoldi EM (2017) Geometric representations of random hypergraphs. J Am Stat Assoc 112(517):363–383.

    Article  MathSciNet  CAS  Google Scholar 

  52. Yen T-C, Larremore DB (2020) Community detection in bipartite networks with stochastic block models. Phys Rev E 102(3):032309.

    Article  ADS  MathSciNet  CAS  PubMed  Google Scholar 

  53. Dutta U, Fosdick BK, Clauset A (2023) Sampling random graphs with specified degree sequences. arXiv:2105.12120

  54. Miller JC, Hagberg A (2011) Efficient generation of networks with given expected degrees. In: Frieze A, Horn P, Prałat P (eds) Algorithms and models for the web graph. Lecture notes in computer science. Springer, Berlin, pp 115–126.

    Chapter  Google Scholar 

  55. Landry NW, Lucas M, Iacopini I, Petri G, Schwarze A, Patania A, Torres L (2023) XGI: a Python package for higher-order interaction networks. J Open Sour Softw 8(85):5162.

    Article  Google Scholar 

  56. Newman MEJ (2003) Mixing patterns in networks. Phys Rev E 67(2):026126.

    Article  ADS  MathSciNet  CAS  Google Scholar 

  57. Baccini F, Geraci F, Bianconi G (2022) Weighted simplicial complexes and their representation power of higher-order network data and topology. Phys Rev E 106(3):034319.

    Article  ADS  MathSciNet  CAS  PubMed  Google Scholar 

  58. Johnson S, Domínguez-García V, Muñoz MA (2013) Factors determining nestedness in complex networks. PLoS ONE 8(9):74025.

    Article  ADS  CAS  Google Scholar 

  59. Saavedra S, Stouffer DB, Uzzi B, Bascompte J (2011) Strong contributors to network persistence are the most vulnerable to extinction. Nature 478(7368):233–235.

    Article  ADS  CAS  PubMed  Google Scholar 

  60. Kamilar JM, Atkinson QD (2014) Cultural assemblages show nested structure in humans and chimpanzees but not orangutans. Proc Natl Acad Sci 111(1):111–115.

    Article  ADS  CAS  PubMed  Google Scholar 

  61. Cantor M, Pires MM, Marquitti FMD, Raimundo RLG, Sebastián-González E, Coltri PP, Perez SI, Barneche DR, Brandt DYC, Nunes K, Daura-Jorge FG, Floeter SR, Guimarães Jr PR (2017) Nestedness across biological scales. PLoS ONE 12(2):0171691.

    Article  CAS  Google Scholar 

  62. Fredkin E (1960) Trie memory. Commun ACM 3(9):490–499.

    Article  Google Scholar 

  63. Landry N (2024) nwlandry/the-simpliciality-of-higher-order-networks: v0.1. Zenodo.

Download references


N.W.L. would like to acknowledge the participants of the “Workshop on Modelling and Mining Complex Networks as Hypergraphs” at Toronto Metropolitan University and Tim LaRock for helpful conversations. N.W.L. would also like to thank Tzu-Chi Yen for lending his expertise on the biSBM inference.


N.W.L. and J.-G.Y. acknowledge support from the National Institutes of Health 1P20 GM125498-01 Centers of Biomedical Research Excellence Award. N.E. acknowledges the Harris Faculty Fellowship from Grinnell College.

Author information

Authors and Affiliations



NWL conceived the project; NWL, JGY, and NE designed the research; NWL and NE performed the research; NWL, JGY, and NE wrote the article; and NWL, JGY, and NE edited the draft. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Nicholas W. Landry.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix A: Convergence of the configuration model MCMC

To sample from the configuration model, we implemented the Markov chain Monte Carlo algorithm proposed in Ref. [23]. At each step, (1) two edges are selected at random, \(e_{1}\) and \(e_{2}\); (2) two nodes are selected at random, \(i \in e_{1}\) and \(j \in e_{2}\); and (3) the memberships of these nodes are swapped to form two new edges, \(\widetilde{e}_{1} = (e_{1} \setminus i) \cup j\) and \(\widetilde{e}_{2} = (e_{2} \setminus j) \cup i\). This operation is accepted if the move does not create loopy hyperedges, i.e., \(|e_{1}| = |\widetilde{e}_{1}|\) and \(|e_{2}| = |\widetilde{e}_{2}|\). In Ref. [23], the criterion for convergence is left for future work, and Ref. [53] presents criteria for the convergence of the configuration model for pairwise networks. In Fig. 4, we show that the number of double-edge swaps chosen for our configuration model algorithm (\(10\times |E|\)) is sufficient for convergence with respect to all measures of simpliciality.

Figure 4
figure 4

Convergence of the simpliciality for representative datasets from each data domain. The rightmost markers in all plots correspond with the number of edge swaps used when generating the results in Fig. 2

Algorithm 1
figure a

Simplicial fraction

Algorithm 2
figure b

Exhaustive edit simpliciality

From Fig. 4, we see that the configuration model sampler has roughly achieved the stationary value of simpliciality after roughly 10% of the edge swaps have been completed. Assessing convergence in a statistically robust manner is necessary to ensure uniform sampling from the hypergraph configuration model, but this heuristic is sufficient for our purposes.

Appendix B: Measuring simpliciality efficiently

For all measures, we leverage the trie data structure [62] to efficiently compute the measures described in this paper. The trie structure allows us to efficiently verify whether a subface exists (\(\mathcal{O}(|e|)\), for an edge e). To be compatible with the trie data structure, when adding an edge to the trie and when searching for an edge in the trie, we first sort the nodes in that edge. Below, we present the algorithms employed in generating all results.

Simplicial fraction

For each edge, if a single subface is absent, we can immediately determine that the edge is not a simplex. This can be very efficient for sparse hypergraphs.

Edit simpliciality

There are two ways to compute the edit simpliciality: the exhaustive method and a more memory-efficient version. The exhaustive method is simpler and more computationally efficient. However, the memory requirements are enormous because it stores every missing subface. The memory-efficient version keeps track of the number of missing subfaces, not the missing subfaces themselves. This leverages the fact that the number of hyperedges intersecting with any given hyperedge is typically much smaller than the number of hyperedges. We store the hypergraph of maximal edges for fast retrieval of the neighbors of a given maximal face, which is needed in Algorithm 3.

Algorithm 3
figure c

Memory-efficient edit simpliciality

Algorithm 4
figure d

Face edit simpliciality

Face edit simpliciality

This computation is more straightforward than the ES computation because it is a local measure that requires relatively little memory and allows us to compute the number of missing subfaces and then store the FES as a running average.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Landry, N.W., Young, JG. & Eikmeier, N. The simpliciality of higher-order networks. EPJ Data Sci. 13, 17 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: