Extracting Complements and Substitutes from Sales Data: A Network Perspective

The complementarity and substitutability between products are essential concepts in retail and marketing. Qualitatively, two products are said to be substitutable if a customer can replace one product by the other, while they are complementary if they tend to be bought together. In this article, we take a network perspective to help automatically identify complements and substitutes from sales transaction data. Starting from a bipartite product-purchase network representation, with both transaction nodes and product nodes, we develop appropriate null models to infer significant relations, either complements or substitutes, between products, and design measures based on random walks to quantify their importance. The resulting unipartite networks between products are then analysed with community detection methods, in order to find groups of similar products for the different types of relationships. The results are validated by combining observations from a real-world basket dataset with the existing product hierarchy, as well as a large-scale flavour compound and recipe dataset.


Introduction
Understanding the hidden relations existing between products is fundamental in both economics and marketing research as well as in retail [1]. This question lies at the core of market structure analysis and finds numerous applications. Retailers must regularly make decisions taking product relationships into account [2], for instance to design their product catalogue and to determine the number of products to offer in each category [3]. Brick-and-mortar retailers seek to identify the best way to arrange the product layout in aisles and stock their shelves [4], and online retailers also strive to optimise the grouping of products in their online shops [5]. Furthermore, they must decide which products to bundle or promote together. These assortment-related decisions have significant influence on customers' choices, sales of products, and finally, profits [2,3,6].
Complements and substitutes are two central concepts to characterise relationships between products, with well-established definitions in economics [7]. Complementary products are sold separately but used together, each creating a demand for the other, such as hot dogs and hot dog buns. Substitute products serve the same purpose and can be used in place of one another, such as Brand A tomatoes and Brand B tomatoes. In the economics literature, the degree of complementarity (substitutability) is formally defined through the negative (positive) cross-price elasticity of demand, where the rise in the demand of one product is recorded after the price of the other product is reduced (increased) by a unit. The mechanisms of complements and substitutes are also referred to as the halo effect and demand transfer, respectively, in the retail context [8].
Despite its practical importance, the algorithmic problem of identifying the relationships between products in retail has received relatively little attention. For a long time, researchers and practitioners have selected the set of possible complementary or substitute products by means of, for instance, field expertise and simple statistics, and the analysis has usually been restricted to a fairly small number of products [9,10]. The recent development and application of natural language processing and machine learning (ML) algorithms, especially those based on word embeddings, bring new opportunities, making it possible to analyse thousands of products [11][12][13]. These methods use the transactions (some additionally require customer information) as the original feature space, and apply ML algorithms to essentially reduce the dimension of these feature vectors. The resulting reduced-dimensional embeddings can then be used to identify the relationship between products, and also in customer choice models.
However, there are several limitations in these applications. Firstly, the interpretation of the selected features in the related ML algorithms is difficult. This makes it challenging to develop metrics in this space, in particular to verify the property of triangle inequality, and further use metric invariants to define measures between products. In practice, these methods often rely on the definition of similarity measures (not necessarily metrics) for complementarity and exchangeability 1 [12,13]. Secondly, these methods lack specific criteria to determine whether two products are complements, substitutes, or just independent, despite trying to quantify the effects by their similarities. Thirdly, they do not explore the connection between complements and substitutes, which is of great significance to improve the understanding of the relationships and their further applications. Furthermore, these methods are based on co-purchase patterns, but the other valuable information in the sales data remains unused.
In our study, we propose an alternative to the classic approach based on cross-price elasticity, and take instead a network perspective in order to define complementarity and substitutability between products. As a starting point, we model the sales data as a bipartite product-purchase network, with both transactions (or baskets) and products as nodes. We then perform our analysis directly on the network, without having to rely on low-dimensional embeddings which may lead to uncontrolled loss of information. We use the connectivity patterns between products to characterise complements and substitutes. To do so, we define null models on the bipartite network to determine significant relationships between products, and propose measures induced by random walks on networks to quantify the intensity of these relationships. This approach can be seen as a generalisation of the classic bipartite network projection where we focus on different notions of connectivity induced by the bipartite structure. We also take an initial step to explicitly incorporate noise effects in our measures. As we show later, the resulting projections onto unipartite networks, based both on complementary and substitute connections, allow us to find groups of similar products with standard tools like community detection.

1 Exchangeability is an additional concept used to verify whether two products are substitutes: researchers postulate that they are if they have low complementarity and high exchangeability.
The aim of our work is to provide insights into product relationships from a network perspective with simple assumptions, and to further extract both complements and substitutes efficiently from easily accessible sales data. It is our belief that a network approach opens up a promising new angle on this problem due to its flexibility, e.g. in determining significant relationships, and the vast network science toolbox. The insights derived from our methods have applications in assortment-related decision making, not only for retailers but also general firms with long product lines. Furthermore, our set of methods can also be applied to other contexts, such as trading networks, ecological systems and social networks, where both the identification of cooperative and competitive relations are of interest.

Data
In this section, we present the different datasets used in our analysis: the sales data in Sect. 2.1, from which we extract the product relationships, and the product hierarchy data in Sect. 2.2 and the flavour compound and recipe data in Sect. 2.3, which we use to validate the results.
2.1. Sales data. We used anonymised grocery sales data from Tesco, the UK's largest supermarket chain. The data consists of timestamped transactions of stores, and it has been anonymised for general research purposes, i.e. each customer's personally identifiable information has been removed. For each store, the transaction data comprises a transaction ID, which gives a unique code to each shopping trip, the date when the transaction was made, the product IDs, and their purchased quantities; see the top of Fig. 1.
The data used for this study is from a generic convenience store in an urban area, and spans a three-month period avoiding major holidays such as Christmas and Easter. The time window is chosen to be long enough to be representative of the underlying customer population's purchase patterns, but also sufficiently short to avoid seasonal effects as well as changes of behaviour over time. Furthermore, to facilitate the interpretation of the results, we restrict our analysis to fresh fruit, vegetables and salads, where we believe complementary and substitute products commonly exist. We also exclude products that are purchased less than once a month, as well as those appearing in almost every transaction. This results in a final dataset of 43837 transactions and 253 products.

2.2. Product hierarchy data. In retail, it is common to organise products in a hierarchy, where similar products are grouped into increasingly generic categories. Products that are close together in the hierarchy are typically sold next to each other in a store. At the lowest hierarchical level, each unique code corresponds to a different product, including the same products of different sizes or flavours. Overall, we have 4 levels, from L1 to L4 (excluding the product level). The higher a level is, the more generic the corresponding category. For example, "apple" is a category in the L1 hierarchy, and "fruit" is a category in the L3 hierarchy. Hence, a natural way to validate our product relationships and to explore their features is to compare them to the corresponding product hierarchy.
2.3. Flavour compound and recipe data. Ahn et al. [14] provide a systematic list of 1107 flavour compounds and their natural occurrences in 1525 ingredients overall, from Fenaroli's handbook of flavour ingredients [15]. They also provide 56498 recipes belonging to geographically distinct cuisines (North American, Western European, Southern European, Latin American and East Asian), which were obtained from epicurious.com, allrecipes.com and menupan.com; see the bottom of Fig. 1. Hence, to validate our results against the features in both flavour compounds and recipes, we match our products to their ingredients.
To construct the correspondence between our products and the flavour compounds, we match each product to as many ingredients as possible. For example, "Loose Peppers" is matched to all possibly equivalent peppers, including "bell pepper" and "green bell pepper"; see the middle left of Fig. 1. This results in 140 ingredients of interest, with 865 corresponding flavour compounds, and each ingredient is linked to 57 flavour compounds on average. Note that there are 11 products which do not have exactly matched ingredients, hence we match them to generic ones. For example, we match the product "Single Pomegranate" to the ingredient "fruit". There are also 44 complex products whose ingredients cannot be directly inferred from their names, thus we match them to their main ingredients on the website. For example, we match the product "Cheddar Coleslaw" to the ingredients "cheddar cheese", "cabbage", "carrot" and "onion".
For the recipe data, we match each product to as few and as simple ingredients as possible. For example, "Loose Peppers" is now only matched to "bell pepper"; see the middle right of Fig. 1. We then restrict to products corresponding to only one ingredient, and also remove products that are matched to unrepresentative generic ingredients (e.g. "vegetable"). We take a (generic) ingredient to be unrepresentative if it shares less than half of its flavour compounds with the ingredients in the same category. As an example, if the product "Loose Aubergine" were only matched to the generic ingredient "vegetable", which shared less than half of its flavour compounds with all other vegetable ingredients, such as "asparagus", "lettuce" and "onion", we would exclude this product. This further reduces the number of products and ingredients of interest to 175 and 69 respectively, with 47222 corresponding recipes, and each recipe is expected to contain 3 such ingredients.
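The "unrepresentative generic ingredient" rule above can be sketched as follows; the compound sets and category members are hypothetical examples, not taken from the Fenaroli-derived data.

```python
# Sketch of the unrepresentative-generic-ingredient rule: a generic ingredient
# is dropped if it shares less than half of its flavour compounds with the
# specific ingredients of its category. Compound sets are hypothetical.
compounds = {
    "vegetable": {"c1", "c2", "c3", "c4"},
    "asparagus": {"c1", "c9"},
    "lettuce": {"c8"},
    "onion": {"c7"},
}

def is_representative(generic, category_members):
    """Keep the generic ingredient only if at least half of its compounds
    also occur among the category's specific ingredients."""
    pooled = set().union(*(compounds[m] for m in category_members))
    shared = compounds[generic] & pooled
    return len(shared) >= len(compounds[generic]) / 2

keep = is_representative("vegetable", ["asparagus", "lettuce", "onion"])
```

Here "vegetable" shares only one of its four compounds with the category, so it would be excluded under this rule.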

Methods

3.1. Product-purchase network. We model the structure in the sales transaction data as a bipartite network, where we have two subsets of nodes, one corresponding to transactions and the other to products. A transaction node and a product node are connected if the product is purchased in that particular transaction; see Figs. 1 (top) and 2. We call it the product-purchase network, and aim to extract product relationships from how product nodes are connected to each other in the network.

Figure 2. Example of a product-purchase network, where blue squares are transaction nodes, red circles are product nodes, and these two sets of nodes are connected if the product is purchased in the corresponding transaction. The underlying sales data contains both complements (e.g. hot dog3 and hot dog bun1) and substitutes (e.g. taco seasoning1 and taco seasoning2).

This problem is generally related to the projection of bipartite networks to unipartite ones [16]. Different strategies exist depending on the nature of the relationship that one wants to infer [16][17][18][19][20][21]. While the majority of works look for assortative relations, in the sense that two nodes are connected in the unipartite network if they tend to share many neighbours in the bipartite one, more general types of projections can be defined, which are associated with the role played by the nodes in the bipartite network and are particularly relevant to extracting complements and substitutes. In the following section, we specify our assumptions about the product relationships, which can be interpreted as specific connectivity patterns in the product-purchase network; see the typical examples in Fig. 2.
We use the biadjacency matrix A^(b) = (A_li) ∈ {0, 1}^(n_t × n_p) to represent the product-purchase network, where n_t is the number of transaction nodes, n_p is the number of product nodes, and A_li = 1 if product i is purchased in transaction l, and 0 otherwise.
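As an illustration, the biadjacency matrix A^(b) can be built from raw transaction records as follows; the toy records echo the complements/substitutes example of Fig. 2 and are not actual Tesco data.

```python
import numpy as np

# Minimal sketch of building the biadjacency matrix A^(b) from raw
# (transaction ID, product) records. The records are hypothetical.
transactions = [
    ("t1", "hot dog"), ("t1", "hot dog bun"),
    ("t2", "hot dog"), ("t2", "hot dog bun"),
    ("t3", "taco seasoning A"),
    ("t4", "taco seasoning B"),
]

# Index transactions and products.
t_ids = sorted({t for t, _ in transactions})
p_ids = sorted({p for _, p in transactions})
t_index = {t: k for k, t in enumerate(t_ids)}
p_index = {p: k for k, p in enumerate(p_ids)}

# A[l, i] = 1 iff product i is purchased in transaction l.
A = np.zeros((len(t_ids), len(p_ids)), dtype=int)
for t, p in transactions:
    A[t_index[t], p_index[p]] = 1

n_t, n_p = A.shape  # numbers of transaction and product nodes
```

For real basket data a sparse matrix (e.g. `scipy.sparse.csr_matrix`) would be used instead of a dense array, but the construction is the same.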

3.2. Key assumptions. To characterise the product relationships, we consider the purchase patterns of products. Specifically, in contexts where prices change frequently, complements can be identified through sufficient co-purchases [22], while substitutes have almost no co-purchases. The feature that substitutes have similar interactions with other products is commonly used in practice [12,13]; combined with the almost-no-co-purchase characteristic, it can be used to determine the substitute relationship. Note that the formal definition through cross-price elasticity is expected to emerge from such purchase patterns, where, for example, two products being always purchased together implies that a decrease in one's price will result in an increase in the other's demand 3. Based on these arguments, we propose the following assumptions 1-4 to characterise complements and substitutes in the product-purchase network.
1. Complements are products that are in the same transactions significantly more frequently than expected.
2. The degree of complementarity between complements is positively correlated with how frequently they are in the same transactions.
3. Substitutes are products that share the same complements but are in the same transactions significantly less frequently.
4. The degree of substitutability between substitutes is positively correlated with how similar their complements are.
In addition, we define noise to be the purchase patterns that are caused by other, often unknown, factors and cannot be explained by complementarity and substitutability. Thus, to capture the product relationships and their degrees, it is essential to control the noise effect. In networks, local structure usually refers to the information around a node, while global structure characterises the whole network. For intermediate scales, one often refers to the notion of mesoscale structure, which is associated with groups of nodes that share similar connectivity patterns. Here, we consider in particular the community structure, where groups of nodes are densely connected internally but sparsely connected externally. Within our context, we exploit the fact that the mesoscale structure is much more robust to noise than the local information [23]. Hence, we further propose the following assumptions 5 and 6 to restrict the noise effect.
5. Noise will not change the community structure of complements and substitutes, i.e. groups of products that are mostly complements and substitutes, respectively.
6. Noise can be explained by some random models, so that its effect on the local structure of the network can be removed accordingly.
Hence, in Sect. 3.3, we will determine whether each pair of products is complementary or substitutable by applying significance tests to the number of common neighbours between each pair of nodes in the product-purchase network (i.e. the number of transactions they share; assumptions 1 and 3). This step corresponds to projecting the bipartite network on the product side to form two unweighted unipartite networks, indicating the existence of the two relationships. Further, in Sect. 3.4, we will quantify the degrees of complementarity and substitutability by local measures based on the product nodes' neighbourhood structure in the bipartite and projected networks, respectively (assumptions 2 and 4). This step further adds weights to the corresponding unipartite networks.
3.3. Null models. We propose the following null models on the product-purchase network, to determine whether the number of common neighbours, cn_ij, between each pair of product nodes i and j, is significantly more or significantly less than expected, with significance levels α_m or α_l, respectively. Accordingly, two unweighted unipartite networks, consisting only of product nodes, can be obtained, one for each type of relationship.

3 The demand of a product is generally a decreasing function of its own price. This statement is true for the products analysed here.

3.3.1. Bipartite Erdős–Rényi (ER) models. The ER model assumes a fixed probability for each edge to appear, independently of the others [24], while bipartite ER models only allow edges between the two subsets of nodes.
In our variant, we assign a different connecting probability p_i to each product i. Then the probability that a transaction node is connected with both product nodes i and j is p_i p_j, and the number of their common neighbours, cn_ij, is a random variable X_ij such that X_ij ~ Binomial(n_t, p_i p_j). We further assume n_t is sufficiently large, and approximate the distribution by N(n_t p_i p_j, n_t p_i p_j (1 − p_i p_j)), from the Central Limit Theorem [25]. Hence, cn_ij is significantly more if

cn_ij > n_t p̂_i p̂_j + Φ^(−1)(1 − α_m) sqrt(n_t p̂_i p̂_j (1 − p̂_i p̂_j)),

and is significantly less if

cn_ij < n_t p̂_i p̂_j − Φ^(−1)(1 − α_l) sqrt(n_t p̂_i p̂_j (1 − p̂_i p̂_j)),

where Φ^(−1)(·) is the inverse cumulative distribution function of N(0, 1), and the maximum likelihood estimate for each p_i is p̂_i = d_i^(p)/n_t, where d_i^(p) is the degree of product node i.
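A minimal sketch of this test, assuming the normal approximation above; n_t, the product degrees and the significance levels are illustrative values.

```python
import numpy as np
from scipy.stats import norm

# Sketch of the bipartite-ER significance test, with p_i estimated as
# d_i / n_t. All numbers below are illustrative.
n_t = 10_000                     # number of transactions
d = np.array([500, 400])         # degrees of products i and j
cn_ij = 80                       # observed number of common neighbours
alpha_m = alpha_l = 0.01

p = d / n_t                      # MLE of connection probabilities
mu = n_t * p[0] * p[1]           # mean of Binomial(n_t, p_i p_j)
sigma = np.sqrt(n_t * p[0] * p[1] * (1 - p[0] * p[1]))

# Normal approximation: flag significantly many / few common neighbours.
sig_more = cn_ij > mu + norm.ppf(1 - alpha_m) * sigma
sig_less = cn_ij < mu - norm.ppf(1 - alpha_l) * sigma
```

With these numbers the expected count is 20, so 80 co-purchases is flagged as significantly more.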

3.3.2. Bipartite Configuration Models (BiCMs). The configuration model creates a network with a given degree sequence {d_i}, by assigning d_i half-edges (or stubs) to each node i and joining two chosen stubs uniformly at random until no stubs are left [26,27]. The BiCM takes the bipartite structure into account: two degree sequences are given, dividing the nodes into two subsets, and edges are only allowed between the two subsets of nodes. Note that multi-edges are allowed here, but since we assume finite variance in both degree distributions, they are negligible in large networks (see Appendix A.1 for details).
The probability of product nodes i, j sharing a transaction node l is

p_ilj = d_l^(t) (d_l^(t) − 1) d_i^(p) d_j^(p) / m^2,

where the superscripts t, p stand for transaction nodes and product nodes respectively, d_h is the degree of node h, and m = Σ_(l=1)^(n_t) d_l^(t) is the number of edges (see Appendix A.1 for details). The variant of bipartite ER models in Sect. 3.3.1 can be seen as an approximation of this model, where we assume that the degree of each transaction node is constant.
The number of common neighbours between product nodes i and j, cn_ij, is then the sum of Bernoulli(p_ilj) variables over l, where p_ilj varies with the transaction node l, and we assume that different transaction nodes connect to both of them independently. Hence, cn_ij is a Poisson binomial random variable, X_ij, with mean value μ_ij = Σ_(l=1)^(n_t) p_ilj. The Poisson binomial distribution can be well approximated, with an exact error bound, by a Poisson distribution with the same mean, if the composing Bernoulli probabilities, p_ilj, are sufficiently small [28]. Since real networks are sparse, the p_ilj are generally small (see our particular case in Appendix B.1). Hence, we use Y_ij ~ Poisson(μ_ij) for the significance tests here, and determine cn_ij to be significantly more if

1 − F_ij(cn_ij − 1) < α_m,

and to be significantly less if

F_ij(cn_ij) < α_l,

where F_ij(y) = e^(−μ_ij) Σ_(k=0)^(y) μ_ij^k / k! is the cumulative distribution function of Y_ij.

The two null models explain the purchase patterns purely through noise; with more information about the noise factors, one could propose more customised null models to explain more of these patterns. Currently, our null models are only based on differences in product popularity, and the BiCM additionally uses the heterogeneity in basket sizes. Both are sufficiently general to incorporate additional noise factors, but may not be sufficient in their current form, as hidden factors, e.g. correlated preferences, could cause more common neighbours between product nodes in the product-purchase network. Hence, by assumption 5 in Sect. 3.2, we accompany these null models with extra rules of significance-level selection: (i) α_m is chosen to be the smallest value that maintains the same community structure as that obtained from a baseline significance level, to exclude such spurious signals; (ii) α_l is chosen to be the largest such value, in order not to accidentally filter out genuine patterns.
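The Poisson-approximation test can be sketched as follows; the per-transaction probabilities p_ilj are illustrative placeholders rather than values computed from the BiCM.

```python
from scipy.stats import poisson

# Sketch of the BiCM-based test: the common-neighbour count is approximated
# by Poisson(mu_ij), with mu_ij the sum of the Bernoulli means p_ilj.
# The p_ilj values below are illustrative placeholders.
p_ilj = [0.002, 0.001, 0.003, 0.004]   # per-transaction co-connection probs
mu_ij = sum(p_ilj)
cn_ij = 3
alpha_m = alpha_l = 0.05

# One-sided tails of the Poisson CDF F_ij.
sig_more = 1 - poisson.cdf(cn_ij - 1, mu_ij) < alpha_m   # P(Y >= cn_ij)
sig_less = poisson.cdf(cn_ij, mu_ij) < alpha_l           # P(Y <= cn_ij)
```

Observing 3 common neighbours when only 0.01 are expected is, as intended, flagged as significantly more.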
Finally, we can obtain the unweighted network of the complementary relationship, A^(c), and that of the substitute relationship, A^(s). By assumption 1,

A^(c)_ij = I{cn_ij significantly more},

and by assumption 3,

A^(s)_ij = I{(A^(c) A^(c))_ij > 0} ∘ I{cn_ij significantly less},

where I{·} is the element-wise indicator matrix, the first factor indicates that products i and j share at least one complement, and ∘ represents the element-wise (Hadamard) matrix product.
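One plausible reading of assumptions 1 and 3 in code, where SIG_MORE and SIG_LESS stand for the boolean outcomes of the null-model tests (hand-crafted toy matrices here) and sharing a complement is detected through the square of A^(c):

```python
import numpy as np

# Sketch of forming the unweighted unipartite networks from the test outcomes.
# SIG_MORE / SIG_LESS are illustrative boolean matrices standing in for the
# null-model test results; products 0 and 2 are both complements of product 1.
SIG_MORE = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
], dtype=bool)
SIG_LESS = np.array([
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
], dtype=bool)

A_c = SIG_MORE.astype(int)                        # complements (assumption 1)
shares_complement = (A_c @ A_c) > 0               # at least one common complement
A_s = (shares_complement & SIG_LESS).astype(int)  # substitutes (assumption 3)
```

Products 0 and 2 share complement 1 and co-occur significantly less, so they end up linked in the substitute network.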

3.4. Measures. The degrees of complementarity and substitutability matter: a significant relationship is not necessarily a strong relationship, and stronger relationships should be given higher weights so that they are more dominant in the networks. By assumption 2 in Sect. 3.2, the degree of complementarity is not directly correlated with how significant the co-purchase pattern is, but with its relative frequency; by assumption 4 in Sect. 3.2, neither is the degree of substitutability, so the results in Sect. 3.3 are not applicable here. Hence, in this section, we further propose measures to quantify both degrees, in order to convert the unweighted unipartite networks, A^(c) and A^(s), into weighted ones, W^(c) and W^(s).

3.4.1. Measures for complementarity. We propose several measures for the degree of complementarity by interpreting assumption 2: the more similar the neighbours of two products in the product-purchase network are, the more complementary the products are.
We start from an enhanced version of assumption 6 in Sect. 3.2: the noise factors change frequently and erratically, so that their bias on the relative number of co-purchases between pairs of products can be neglected. We then propose the following measures, derived from the weighted cosine similarity between random walkers starting from pairs of nodes after one step. Specifically, for each product node i, suppose that an impulse y_i(0) = e_i ∈ {0, 1}^(n_p), with value 1 only in its i-th element, is injected on the product side at time t = 0. We record the response of the system after a one-step random walk,

y_i(1) = A^(b) (D^(p))^(−1) y_i(0),

where A^(b) is the biadjacency matrix from the transaction nodes to the product nodes, and D^(p) = Diag(d_i^(p)) is the diagonal matrix with the degrees of product nodes on its diagonal [29]. We set the relative importance of each transaction l as the inverse of its degree d_l^(t), hence the weighted cosine similarity between the responses y_i(1) and y_j(1) is

y_i(1)^T W y_j(1) / (||y_i(1)||_W ||y_j(1)||_W),

where W = Diag(1/d_l^(t)) is the weight matrix for the cosine similarity, and ||y||_W^2 = y^T W y = ||W^(1/2) y||_2^2, with W (symmetric) positive-definite. This introduces the first measure we propose, the original measure,

sim_o(i, j) = (Σ_l A_li A_lj / d_l^(t)) / sqrt((Σ_l A_li / d_l^(t)) (Σ_l A_lj / d_l^(t))),

where {d_l^(t)}_(l=1)^(n_t) and n_t are the same as before. Hence, each common neighbour A_li A_lj of a pair of product nodes i, j in the product-purchase network is discounted by the degree of the corresponding transaction node l, and this quantity is further scaled so that each product is at the maximum level of complementarity to itself, i.e. value 1, in a symmetric manner. A higher value means relatively more common neighbours of lower degrees. Naturally, we also propose the original directed measure,

sim_od(i, j) = (Σ_l A_li A_lj / d_l^(t)) / (Σ_l A_li / d_l^(t)),

where each (i, j) entry measures the degree of complementarity of product i to product j. Compared with those in the literature, our measures are globally comparable: node pairs with no common node can also be compared.
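The original measure reduces to a degree-discounted cosine similarity between the columns of A^(b), which can be sketched as follows on a toy biadjacency matrix:

```python
import numpy as np

# Sketch of the original complementarity measure: a weighted cosine similarity
# where each shared transaction is discounted by its degree (basket size).
A = np.array([  # toy biadjacency matrix, transactions x products
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
])
d_t = A.sum(axis=1)               # transaction (basket) degrees
W = np.diag(1.0 / d_t)            # weight matrix: inverse basket sizes

S = A.T @ W @ A                   # degree-discounted co-purchase counts
norms = np.sqrt(np.diag(S))
sim_o = S / np.outer(norms, norms)  # sim_o(i, i) = 1 by construction
```

Products 0 and 1 appear in exactly the same baskets, so their score is 1; product 2 only shares the large basket with them, so its score is lower.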
The above enhanced version of assumption 6 is reasonable for our choice of fresh food, since prices changed frequently and erratically, as required, during the chosen time period. For a general product, in contrast, it would be necessary to implement assumption 6 in order to properly remove the noise effect from our measures. However, most of the literature has followed the direction of filtering out insignificant edges, rather than removing noise from network measures. In this article, we take an initial step in the latter direction by subtracting the mean value under some noise models.
First, we should determine which quantity to subtract the mean from. If we consider the original measure as the geometric mean of the two directed measures,

sim_o(i, j) = sqrt((Σ_(l∈Γ(i)∩Γ(j)) 1/d_l^(t)) / (Σ_(l∈Γ(i)) 1/d_l^(t)) × (Σ_(l∈Γ(i)∩Γ(j)) 1/d_l^(t)) / (Σ_(l∈Γ(j)) 1/d_l^(t))),

where Γ(i) = {l : A_li = 1} is the set of node i's neighbours in the product-purchase network, then we can propose the corresponding randomised measure,

sim_r(i, j) = sim_o(i, j) − E[sim_o^(r)(i, j)],

where A^(r) = (A^(r)_li) is the biadjacency matrix of a random product-purchase network, with each A^(r)_li being a random variable, d_l^(r) the corresponding transaction degrees, and sim_o^(r) the original measure evaluated on A^(r). The randomised directed measure naturally follows as sim_rd(i, j) = sim_od(i, j) − E[sim_od^(r)(i, j)]. Next, we should determine the noise model. For example, assuming fixed basket sizes (transaction node degrees) and product purchase frequencies (product node degrees) naturally leads us to the BiCM (cf. Sect. 3.3) as our noise model. In this particular case,

E[A^(r)_li A^(r)_lj] = d_l^(t) (d_l^(t) − 1) d_i^(p) d_j^(p) / m^2,

where d_i^(p) is the degree of product node i, and m is the number of edges in the product-purchase network. With Equations (3), (4) and (5), we accordingly introduce the randomised configuration measure and the randomised configuration directed measure.
The measures can be computed for any product pair, but by assumption 2, only product pairs with value 1 in A^(c) should be assigned positive degrees. Hence, the weighted adjacency matrix of the complement unipartite network is obtained by

W^(c) = A^(c) ∘ sim_†,

where the subscript † can be o, r, od or rd, ∘ is the element-wise (Hadamard) product, and sim_† = (sim_†(i, j)) ∈ [0, 1]^(n_p × n_p). We call the values in W^(c) the complementarity scores, and determine a pair of products to be complements if they have a positive complementarity score.

3.4.2. Measures for substitutability. We propose measures for the degree of substitutability by assumption 4: the more similar the complements of two products are, the more substitutable the products are. Here, we characterise each product by the vector of its complementarity scores with the other products, and use the (unweighted) cosine similarity between these vectors to indicate the degree of substitutability between pairs of products. Specifically, for a pair of nodes i, j,

sim_s(i, j) = (Σ_(k=1)^(n_p) W^(c)_ik W^(c)_jk) / sqrt((Σ_k (W^(c)_ik)^2) (Σ_k (W^(c)_jk)^2)),

where W^(c) = (W^(c)_ij) is the weighted adjacency matrix of the complement unipartite network, and n_p is the number of products. The substitutability measures are named after the complementarity measure used in W^(c): for example, with the original measure we have the original substitutability measure, and with the randomised configuration measure we have the randomised configuration substitutability measure. Naturally, we also propose the directed version, where for a pair of nodes i, j,

sim_sd(i, j) = (Σ_(k=1)^(n_p) min(W^(c)_ik, W^(c)_jk)) / (Σ_(k=1)^(n_p) W^(c)_jk),

where the minimum function is used to guarantee that the measure reaches its maximum value when the complementarity degrees of product i to others are no less than the corresponding degrees of product j.
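A sketch of the undirected substitutability measure as a cosine similarity between rows of the complementarity-score matrix; the toy W_c below is illustrative.

```python
import numpy as np

# Sketch of the substitutability measure: cosine similarity between rows of
# the complementarity-score matrix W^(c). The toy W_c is illustrative.
W_c = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.7, 0.0, 0.0, 0.6],
])

norms = np.linalg.norm(W_c, axis=1)
sim_s = (W_c @ W_c.T) / np.outer(norms, norms)
```

Products 0 and 1 have almost identical complement profiles, so their substitutability score is close to 1, while product 2, with disjoint complements, scores 0 against both.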
These measures can also be computed for any product pair, but by assumption 4, only product pairs with value 1 in A^(s) should be assigned positive degrees. Hence, the weighted adjacency matrix of the substitute unipartite network is obtained by

W^(s) = A^(s) ∘ sim_†,

where the subscript † stands for s or sd, ∘ is the element-wise (Hadamard) product, and sim_† = (sim_†(i, j)) ∈ [0, 1]^(n_p × n_p). We name the values in W^(s) the substitutability scores, and define a pair of products to be substitutes if they have a positive substitutability score.
Note that the measures of substitutability are based on those of complementarity, and we do not apply extra noise-removing strategies here; thus it is critical that the complementarity degrees are thresholded appropriately, so that the substitutability degrees are not biased by products of low complementarity degree. Hence, by assumption 5 in Sect. 3.2, we accompany these measures with the following rules of threshold selection when analysing real data: (i) the threshold of the complementarity measures, θ_c, is chosen to be the largest value that maintains the same community structure as that obtained from a baseline threshold value; (ii) the threshold of the substitutability measures, θ_s, is chosen to be the smallest such value, for general noise-removing purposes.
3.5. Role extraction. Both the null models in Sect. 3.3 and the measures in Sect. 3.4 are based, directly or indirectly, on local patterns in the product-purchase network, and so are the complement unipartite network, W^(c), and the substitute unipartite network, W^(s). It is then interesting to go beyond local patterns and explore features between the node level and the whole network, i.e. the mesoscale structure of these networks: groups of complements and groups of substitutes.
One important type of mesoscale feature is the community structure, as in Sect. 3.2, where communities are groups of nodes that are densely connected internally but sparsely connected externally [30][31][32]. Various algorithms exist, drawing on interdisciplinary expertise [33][34][35][36][37], and generally aim to optimise a quality function over different partitions of the network. Here, we choose the information-theoretic (hierarchical) map equation [36], which aims to describe the trajectory of random walkers on the network most efficiently, thereby capturing the community structure of the underlying network, and is known not to suffer from a common problem of community detection algorithms, the resolution limit [38]. From the detected structure, we will also examine the underlying assumption that groups are clique-like.
The problem of extracting these two kinds of product groups from the bipartite product-purchase network corresponds to a more general problem, role extraction. Roles are generalised versions of communities, where nodes in the same role share similar connectivity patterns across the network [39][40][41]. Hence, roles comprise both classic assortative communities, as described before, and disassortative communities, where nodes are loosely connected internally but densely connected externally. We define the role adjacency matrix B = (B_rs) by

B_rr = (Σ_(i,j∈C_r, i≠j) W_ij) / (n^(r) (n^(r) − 1)),    B_rs = (Σ_(i∈C_r) Σ_(j∈C_s) W_ij) / (n^(r) n^(s)) for r ≠ s,

where C_r, r = 1, 2, . . . , are the roles, n^(r) = |C_r| for each role r, and W = (W_ij) is the (weighted) adjacency matrix of the underlying network. The matrix B is induced by the maximum-likelihood estimate of the expected weights between nodes inside the corresponding role(s) in the standard stochastic block model [42]. Then, C_r is an assortative community if B_rs ≪ B_rr, ∀s ≠ r, and C_r is a disassortative community if B_rs ≫ B_rr, ∃s, i.e. community r is much more densely connected with at least one other community s than within itself. Thus, our set of methods establishes an indirect solution to role extraction in bipartite networks. We call the detected groups of complements the complement roles, and the detected groups of substitutes the substitute roles.
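The role-adjacency matrix B can be estimated from a partition as the average edge weight within and between roles; the toy network and partition below are illustrative.

```python
import numpy as np

# Sketch of the role-adjacency matrix B: average (weighted) edge weight
# within and between roles, given a partition. Toy network and partition.
W = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
])
roles = [[0, 1], [2, 3]]   # C_1 = {0, 1}, C_2 = {2, 3}

B = np.zeros((2, 2))
for r, C_r in enumerate(roles):
    for s, C_s in enumerate(roles):
        block = W[np.ix_(C_r, C_s)]
        if r == s:
            n = len(C_r)
            B[r, s] = block.sum() / (n * (n - 1))   # exclude self-loops
        else:
            B[r, s] = block.sum() / (len(C_r) * len(C_s))
```

Here both roles are assortative: the within-role averages (0.9 and 0.8) dominate the between-role average (0.05).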
3.6. External validation: product hierarchy, flavour compound and recipe data. We start from the product hierarchy information to characterise both complement roles and substitute roles, and then check if the characteristics are consistent with the common understanding of complements and substitutes. Specifically, we exclusively use the L3 product hierarchy, consisting of fruit (F), organic produce (OP), prepared produce (PP), salad (S), and vegetable (V).
Next, we use the correspondence between flavour compounds and products to compute the Jaccard index, i.e. the relative number of shared flavour compounds, rf_ij = |C(i) ∩ C(j)|/|C(i) ∪ C(j)|, between each pair of products i and j, where C(i) is the set of all flavour compounds in product i. We then consider the cases in which rf_ij = 0 and rf_ij = 1, and check if the complementary pairs have a higher probability to share no flavour compounds and if the substitute pairs have a higher probability to share all their flavour compounds. Furthermore, we examine the relationship between rf_ij and W_ij, in terms of the Pearson correlation, as well as the Spearman correlation.
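The Jaccard index on flavour-compound sets can be sketched in a few lines; the compound sets below are made up purely for illustration:

```python
def jaccard(A, B):
    """Relative number of shared elements between two sets."""
    if not A and not B:
        return 0.0
    return len(A & B) / len(A | B)

# Hypothetical flavour-compound sets for three products.
compounds = {
    "blueberries": {"linalool", "hexanal", "eugenol"},
    "organic blueberries": {"linalool", "hexanal", "eugenol"},
    "cucumber": {"nonadienal"},
}
rf = jaccard(compounds["blueberries"], compounds["organic blueberries"])
print(rf)  # identical compound sets give rf_ij = 1.0
```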
Subsequently, we use the recipe data to evaluate the relative number of shared recipes, rr_ij = |R(i) ∩ R(j)|/|R(i) ∪ R(j)|, between each pair of products i and j, where R(i) is the set of all recipes including product i, and we set rr_ij = 0 if products i and j are matched to the same ingredient. We then assess if the complementary pairs and substitute pairs have significantly higher and lower probabilities, respectively, to co-appear in relatively more recipes. This is achieved by Mann-Whitney-Wilcoxon (MWW) tests, where X, Y are two independent random variables [43,44]. For example, let X be the relative number of shared recipes from all product pairs {rr_ij}, and Y be that from only the complementary pairs {rr_ij : W^(c)_ij > 0}. Then, we will use the alternative hypothesis H_1: P(X > Y) < P(Y > X). Similarly, we also explore the relationship between rr_ij and W_ij. Finally, we apply our overall framework to the recipe data, where we treat recipes as transactions and ingredients as products. This stems from the hypothesis that customers purchase products to cook dishes following recipes, and thus the recipe data should be a restriction of the sales data. We compare the values of the complementarity scores by recipes, the recipe complementarity scores W^(cr) = (W^(cr)_ij), with those by sales, W^(c), and similarly, the recipe substitutability scores W^(sr) = (W^(sr)_ij), with W^(s). Note that we set W^(cr)_ij = 0 and W^(sr)_ij = 1 if products i and j are matched to the same ingredient. We finish the validation stage by comparing the role assignments (of products) from both datasets, where l complement roles and l_1 substitute roles (from the recipe data) are obtained from applying community detection on W^(cr) and W^(sr), respectively. We construct extra l_0 substitute roles by grouping together products that are matched to the same ingredients, for reference; see Fig. 3 for details.
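The quantity compared by the MWW test, P(X > Y) versus P(Y > X), can be sketched directly as an empirical exceedance probability; the sample values below are illustrative, and in practice a library routine such as scipy.stats.mannwhitneyu would supply the test itself:

```python
def exceedance(xs, ys):
    """Empirical P(X > Y) over all sample pairs, the quantity compared
    by the Mann-Whitney-Wilcoxon test (ties count towards neither side)."""
    wins = sum(1 for x in xs for y in ys if x > y)
    return wins / (len(xs) * len(ys))

# Hypothetical shared-recipe fractions: all pairs vs complementary pairs.
all_pairs = [0.0, 0.0, 0.1, 0.2, 0.3]
comp_pairs = [0.2, 0.3, 0.4]
p_xy = exceedance(all_pairs, comp_pairs)   # estimate of P(X > Y)
p_yx = exceedance(comp_pairs, all_pairs)   # estimate of P(Y > X)
print(p_xy < p_yx)  # supports H_1: complements share relatively more recipes
```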

4.1. Illustrative example. Before investigating noisy real data, we first validate our overall framework in a controlled "ideal world" where the relationship between products is known. Specifically, we simulate a consumer population characterised by a set of rules in this world, and ask whether our null models capture the right relationship between each pair of products, whether our measures give the right degree between them, and finally, whether our complement and substitute roles provide insights into the groups of complements, and the groups of substitutes, respectively.

Figure 3. Illustration of the process to compute the product roles from the recipe data: (a) an ingredient-recipe network; (b) the weighted complement network of ingredients; (c) the weighted substitute network of ingredients; (d) the matching between ingredients and products; (f) W^(sr). Cyan squares are recipe nodes, orange circles are ingredient nodes, red circles are product nodes, the line thickness corresponds to how high the corresponding scores are, and the l_0 substitute roles, l complement role(s) and l_1 substitute roles are shown as groups of product nodes in the purple dashed circles, green dashed polygon(s) and blue dashed circles, respectively.
The simulated world is summarised as follows, similar to the one in [12].
• There are 13 different products: coffee, wipes, ramen, candy, hot dog1, hot dog2, hot dog3, hot dog bun1, hot dog bun2, taco shell1, taco shell2, taco seasoning1, taco seasoning2.
• coffee, wipes, ramen, candy are independent products, but are popular with the customers, so are bought frequently. This corresponds to one possible source of noise, correlated preference, where the items are preferred by some customers but purchase decisions are made independently from one another, based on their features, e.g. price.
• The other products form substitute groups and complementary pairs. Products with the same name ignoring the trailing number form groups of substitutes; pairs in {hot dog1, hot dog2, hot dog3} × {hot dog bun1, hot dog bun2} and {taco shell1, taco shell2} × {taco seasoning1, taco seasoning2} are complementary pairs. In this world, customers never buy just one item in a complementary pair, and they always buy at most one of all such pairs.
[Figure 4. Decision tree for whether a customer C intending to buy a complementary pair in cp1 or cp2 will do so, branching on the prices of the pairs: both low (P = 0.5), one lower (P = 0.4), both high (P = 0.1), with outcomes "C will buy a pair in cp1/cp2 uniformly" or "C will not buy a cp".]

• Customers are sensitive to price. When the price of a popular product is low, they buy it with probability 0.8; otherwise, they buy it with probability 0.2. Each customer purchases each preferred product independently. Sensitivity to the price of complementary pairs is different, since the probability to purchase a pair will decrease even if only one item in the pair has a high price. Hence, each pair is treated as a whole here. When all complementary pairs are of low price, customers buy one of them uniformly at random; the case when all pairs are of high price is similar, except that customers have a 0.5 chance not to buy any of them; when one of the pairs has a lower price than the others, they buy this one with probability 0.85, and have 0.15 probability to buy one of the others uniformly at random; see Fig. 4 for details.
With these specifications, we simulate 1000 transactions from this customer population. For a single transaction, each independent product has an 80% chance of being marked up to a high price; there is a 50% chance that all complementary pairs are of low price, a 10% chance that all are of high price, and accordingly a 40% chance that some are marked up, where the lowest priced one is chosen uniformly at random.
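The rules above can be condensed into a compact simulator; the seed and the exact branch structure below are illustrative choices consistent with the stated probabilities, not the authors' actual code:

```python
import random

def simulate(n_transactions=1000, seed=7):
    """Simulate baskets from the 'ideal world': four independent popular
    products plus two families of complementary pairs backed by substitutes."""
    independents = ["coffee", "wipes", "ramen", "candy"]
    pairs = ([("hot dog%d" % i, "hot dog bun%d" % j)
              for i in (1, 2, 3) for j in (1, 2)] +
             [("taco shell%d" % i, "taco seasoning%d" % j)
              for i in (1, 2) for j in (1, 2)])
    rng = random.Random(seed)
    transactions = []
    for _ in range(n_transactions):
        basket = []
        for p in independents:
            high = rng.random() < 0.8          # 80% chance of a high price
            if rng.random() < (0.2 if high else 0.8):
                basket.append(p)
        u = rng.random()
        if u < 0.5:                            # all pairs low: buy one evenly
            basket.extend(rng.choice(pairs))
        elif u < 0.6:                          # all high: 0.5 chance to skip
            if rng.random() < 0.5:
                basket.extend(rng.choice(pairs))
        else:                                  # one pair cheaper than the rest
            cheap = rng.choice(pairs)
            if rng.random() < 0.85:
                basket.extend(cheap)
            else:
                basket.extend(rng.choice([p for p in pairs if p != cheap]))
        transactions.append(basket)
    return transactions

ts = simulate()
print(len(ts))
```

By construction, every basket contains either both items of exactly one complementary pair or none, matching the "at most one pair, never half a pair" rule.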
We provide the complementarity scores, W^(c), induced by the original measure, sim_o, and by the randomised configuration measure, sim_r, together with the number of co-purchases, (cn_ij); the randomised measure is designed to remove the noise here 4.

[Figure. Measures on the products: the co-purchases (cn_ij), whose diagonal shows the purchase frequencies (left), and the complementarity scores W^(c) induced by the original measure sim_o(i, j) (middle) and by the randomised configuration measure sim_r(i, j) (right), where the x-axis and y-axis list the products in the same order as in the simulated world assumptions.]

Note that independent products are bought more frequently, and their numbers of co-purchases with other products are fairly similar to those within complementary pairs. However, our extracted complementary pairs {(i, j) : W^(c)_ij > 0} successfully retrieve the ground-truth complementary pairs. Accordingly, our extracted substitute pairs {(i, j) : W^(s)_ij > 0} successfully retrieve the ground-truth substitute pairs. Furthermore, the complementarity scores of the hot-dog-and-hot-dog-bun complementary pairs are between 0.3 and 0.5, and those of the taco-shell-and-taco-seasoning complementary pairs are around 0.5. These values are close to the inverse of the number of products in the corresponding substitute groups, which is consistent with the assumption of complete substitution.
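The raw co-purchase counts (cn_ij) can be tallied directly from baskets; a minimal sketch, with illustrative example baskets:

```python
from collections import Counter
from itertools import combinations

def co_purchases(transactions):
    """Count co-purchases cn_ij (off-diagonal) and purchase frequencies
    cn_ii (diagonal) from a list of baskets; keys are sorted item pairs."""
    cn = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        for i in items:
            cn[(i, i)] += 1               # diagonal: purchase frequency
        for i, j in combinations(items, 2):
            cn[(i, j)] += 1               # off-diagonal: co-purchases
    return cn

cn = co_purchases([["hot dog1", "hot dog bun1", "coffee"],
                   ["coffee", "candy"],
                   ["hot dog1", "hot dog bun1"]])
print(cn[("hot dog bun1", "hot dog1")])
```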
Finally, our substitute roles exactly agree with the ground-truth substitute groups. This example demonstrates the ability of our overall framework to determine both product relationships and their corresponding degrees, which paves the way for us to continue the analysis on real-world data. From a mesoscale perspective, our complement roles and substitute roles overlap strongly with the groups of complements and those of substitutes, respectively. Furthermore, the fact that we already have complement roles involving substitutes indicates that the interaction between the two relationships is not negligible. For instance, it is entirely possible that we may find substitute roles including complements in real data.

4.2. Sales data.
Hereafter, we use the variant of the ER model as the underlying null model, since its assumptions are generally applicable to real-world purchases, and we only show the results from the original measure, because both have very similar behaviour; see Appendix C for the parameter calibration and the results from the randomised measure. We first examine the ranking power of our scores, W^(c) and W^(s), by checking the top complementary pairs and substitute pairs for each product. This is done by choosing several query products j at random, and outputting the products with the three highest complementarity scores W^(c)_ij and those with the three highest substitutability scores W^(s)_ij; see Table 1 for one run. The substitute pairs with scores > 0.1 largely agree with common sense 5. For example, in Table 1, the top substitute of organic blueberries is blueberries, and the top substitutes of salad tomatoes are other types of tomatoes. Additionally, the ranking indicates that common-sense substitutes have high complementarity scores with the same products; for example, salad tomatoes, baby plum tomatoes and tomatoes on the vine are the top three complements of loose cucumbers. These findings justify our assumption 3 in Sect. 3.2. There are also some substitutes, with lower score values, that are nontrivial from a general understanding, which we will discuss in Sect. 5.
We proceed to the mesoscale structure, i.e. the complement roles and substitute roles. From an averaged perspective, the complement roles and the substitute roles constitute assortative communities in the unipartite networks W^(c) and W^(s), respectively, while substitute roles form disassortative communities in W^(c); see Fig. 7. The latter observation also justifies assumption 3. Furthermore, the overlap between the two kinds of roles is not negligible, with a normalised mutual information (NMI [45]) of 0.49. Hence, as mentioned in Sect. 4.1, substitutes may appear in the same complement role via their strong complements, and complements may be assigned to the same substitute role via their strong substitutes.
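The NMI used here compares two partitions of the same product set; a minimal sketch follows, normalising by the arithmetic mean of the entropies, which is one of several common variants and assumed here for illustration:

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalised mutual information between two partitions of the same
    items, normalised by the arithmetic mean of the two entropies."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    # Mutual information from the joint and marginal label counts.
    mi = sum(nij / n * log(n * nij / (ca[a] * cb[b]))
             for (a, b), nij in cab.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha + hb == 0:
        return 1.0   # both partitions trivial: identical by convention
    return 2 * mi / (ha + hb)

# Identical partitions up to relabelling give NMI = 1.
print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 6))
```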
Finally, we explore the internal structure of complement roles and substitute roles. Generally, strong complements 6 do not tend to form complete graphs in the complement unipartite network W^(c): there are many products that are complements of the same products but are not complements of each other. For example, blueberries (Blueb) and organic blueberries (Or Blueb) in the complement role of berries (3) are substitutes, but both are complements of raspberries (Raspb), strawberries (Strawb), etc.; see Fig. 8. There are also cases in which strong complements do form a complete graph, and further exploration indicates that these products are highly likely to be consumed together. For example, mushroom stir fry (Mushroom SF), vegetable and beansprout stir fry (V Beansprout SF), and egg noodles form a triangle in the complement role of stir-fry (9); see the blue polygon in Fig. 8.
Strong substitutes are expected to form complete graphs in the substitute unipartite network W^(s), and our results are largely consistent with this expectation. For example, loose Braeburn apples (LB Apples), loose Pink Lady apples (LPL Apples), and bagged organic Gala apples (BOrG Apples) constitute a triangle in the substitute role of apples (23); see the blue polygon in Fig. 9. Note this expectation is only valid if the substitutes are consumed for the same purpose; if this assumption is violated, seemingly substitute products may end up being complements. For example, loose brown onions (LBr Onions) and loose red onions (LR Onions) in the substitute role of onions (4) are both substitutes of products such as bagged red onions (BR Onions) and bagged organic brown onions (BOrBr Onions), but are complements of each other; see Fig. 9. The difference between their quantities and their common substitutes may be the key factor here. Likewise, even with the common substitute bagged organic Gala apples (BOrG Apples), loose ripe pears (LR Pears) is a complement of loose Pink Lady apples (LPL Apples), loose Braeburn apples (LB Apples) and loose Gala apples (LG Apples). The above observations confirm the complexity of the interaction between complements and substitutes.

[…] vegetables. There are also complement roles involving only one category, where the related products are either in the fruit or in the prepared produce category. Further, this is in agreement with the notion that products in prepared produce, for instance prepared vegetables and vegetable dips, go well together; similarly for products in fruit.
The proportion of L3 categories in each substitute role also accords with products being substitutes; see Fig. 10. Some of them only or mostly involve prepared produce, and some others largely consist of fruit, such as the substitute role of apples (23). This agrees with the tendency of grouping products into categories based on shared characteristics. Other substitute roles contain more than one category, with one of them being prepared produce. For example, the substitute role of grapes (5) includes both fruit and prepared produce, the substitute role of carrots (13) comprises both prepared produce and vegetables, the substitute role of peppers (19) involves prepared produce, salad 7 and vegetables, and the substitute role of avocado salad (25) is composed of fruit, prepared produce and salad. Further investigation shows that products in prepared produce include fresh-cut fruits, prepared salads and prepared vegetables, i.e. prepared versions of products in fruit, salad and vegetable categories.

4.3.2. Flavour compounds and recipes. We observe that the substitute pairs have a significantly higher probability to share all their flavour compounds with each other, i.e. rf_ij = 1, than all product pairs, while complementary pairs have a significantly higher probability to share no flavour compounds with each other, i.e. rf_ij = 0; see Fig. 11. These characteristics are consistent with the functional definition of complements and substitutes: complements are consumed together, thus tend to have different flavours in order to accompany each other, while substitutes can replace each other, thus tend to share their flavours. Moreover, we examine the correlations between the relative number of shared flavour compounds (rf_ij) and the score values; see Table 2. The Pearson correlation indicates that the product pairs of higher substitutability scores have a significant tendency to share larger portions of their flavour compounds, while the pattern when changing the complementarity scores is more heterogeneous, with a mild negative correlation between the ranking of the complementarity scores and that of the relative number of shared flavour compounds.
We then discern that the complementary pairs, {rr_ij : W^(c)_ij > 0}, have a higher probability to co-appear in relatively more recipes than all product pairs, while the substitute pairs have a lower one. This is again consistent with the functional definition: complements are consumed together, thus are more likely to appear in the same recipe together, while substitutes can be used in place of each other, thus tend to be cooked together with some other ingredients but not with each other.
Moreover, we examine the correlations between the relative number of shared recipes (rr_ij) and the score values; see Table 2. The Spearman correlation suggests that product pairs with higher rankings in the complementarity scores tend to co-appear in relatively more recipes, which agrees with the Pearson correlation. The trend when increasing the substitutability ranking of product pairs is a mild propensity towards co-appearing in relatively fewer recipes.
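Both correlation measures used throughout the validation can be sketched in a few lines; the Spearman correlation is simply the Pearson correlation computed on tie-averaged ranks:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks,
    with average ranks assigned to ties."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=vs.__getitem__)
        r = [0.0] * len(vs)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vs[order[j + 1]] == vs[order[i]]:
                j += 1                      # extend the run of tied values
            avg = (i + j) / 2 + 1           # average rank for the tied run
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    return pearson(ranks(xs), ranks(ys))

# A monotonic but non-linear relation: Spearman is 1, Pearson is below 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 8.0, 27.0, 64.0]
print(round(spearman(xs, ys), 6))
```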
Additionally, we explore the correlations between our complementarity scores (W^(c)) and the recipe complementarity scores (W^(cr)), as well as between our substitutability scores (W^(s)) and the recipe substitutability scores (W^(sr)); see Table 2. The Spearman correlations of both score pairs indicate significant positive relationships within each pair, which is consistent with the information suggested by their Pearson correlations.

Table 3. NMI and AMI between the partitions by the roles from the sales data (rows) and those from the recipe data (columns), where "com" and "sub" correspond to the complement roles and the substitute roles, respectively.

NMI/AMI   l_0 sub     l_1 sub     l com
com       0.54/0.16   /           0.36/0.28
sub       0.71/0.21   0.54/0.18   /

Finally, we compare the complement and substitute role assignments from the different data sources, in particular the sales and recipe data, where we use the NMI and the adjusted mutual information (AMI [45]) to measure the consistency between role assignments; see Table 3. Our substitute roles (from the sales data) are more similar to those of complete substitution, i.e. the l_0 substitute roles obtained from the recipe data. Although our complement roles are more in agreement with the l_0 substitute roles than with the l complement roles by NMI, this may be caused by the number of l_0 substitute roles being larger than that of l complement roles, since the AMI shows a significant opposite direction. To conclude, the relatively large NMI and AMI values demonstrate the consistency between the product relationships extracted from these two different sources, and also provide evidence that customers buy products corresponding to ingredients in particular recipes.

Discussion
Extracting complements and substitutes is part of the broad family of unsupervised learning problems, since the relationship between any pair of products is unknown [46] (see Appendix E for the detailed formulation). This makes the validation process ill-defined, as there is no ground truth. Hence in our study, not only do we compare the results with heuristic arguments based on common understanding of the product relationships, but we also resort to external data sources -the product hierarchy data, the flavour compound and recipe data. Since these datasets focus on different aspects of the products, this is a well-grounded validation process. The seemingly heterogeneous observations from such datasets are well-explained by the product relationships, and thus provide further validation of our results.
Our assumption that complements are products purchased together significantly more frequently could appear simplistic, because it does not explicitly exclude other factors that may result in co-purchases, e.g. correlated preference. However, from a network perspective, these effects are expected to be removed implicitly by the statistical tests associated with our null models. Moreover, we also propose a family of randomised measures to explicitly remove various sorts of noise effects. Compared with the state-of-the-art, another advantage offered by a network perspective is the definition of exact criteria to determine whether products are complements or substitutes. In this article, we have shown that both relationships can be effectively extracted from the simple notions of whether two products are purchased together significantly more frequently, or less frequently but share common strong complements (assumptions 1 and 3 in Sect. 3.2).
Once the unipartite networks of products have been built, we may proceed from pairwise relationships to the mesoscale structure, via the notion of complement roles and substitute roles. The observations justify our assumption 3 that substitutes share common strong complements. They also indicate that complement products do not generally constitute complete graphs, while substitute products typically do, though such complete graphs can be destroyed, for example, by substitutes consumed for different purposes. These results demonstrate that the complement relationship can go beyond a pairwise relationship, and also the complex interaction between complements and substitutes.
Finally, let us emphasise that we only use basket data to extract the product relationships, without additional information such as customer profiles and price changes, information that is typically required by existing methods and may raise privacy issues [47]. Our method to extract complements and substitutes is then solely based on sales data, as stated in the assumptions in Sect. 3.2. Hence, the quality of our results depends on the mutual information between the sales data (through our assumptions) and the criteria, where some discrepancy may exist. For example, there may be products that are not generally recognised as complements, but are purchased together significantly more frequently, so are treated as "complements" from the sales angle. However, most applications of product relationships are from a sales perspective, such as stocking shelves and marketing in sales promotions, and our validation further confirms the soundness of our extracted complements and substitutes.
For these reasons, we believe that the network-based approach is a promising research avenue within the field of retail. Among the research directions that this article has opened, an important one would be to consider the bipartite network from a temporal perspective, in order to explore further the connection between structure and cross-elasticity (see Appendix D). It would also be interesting to design a method that directly uncovers the degrees of complementarity and substitutability from the bipartite network, without any intermediary steps as is done here, and to explore the directed scores further, since our focus here is on the symmetric ones. Another direction is to characterise the products by their centrality in the projected networks, for instance by the average complementarity and substitutability scores of their relations. Moreover, our current analysis focused on fresh food, where prices changed frequently throughout the period. Yet, we did not explicitly include price as a factor, but either ignored its bias or removed it by some random models [22]. In order to analyse a more general range of products in the future, it would be necessary to incorporate price information in our framework in a meaningful way.
Availability of data and materials. The flavour compound and recipe data is available from [14,49]. The other datasets generated and analysed during the current study are not publicly available due to the terms of use in Tesco PLC.
Competing interests. The authors declare that they have no competing interests.
Author's contributions. All authors read and approved the final manuscript.
We will use the approximation in the following analysis since the limiting regime commonly holds in large networks. Note that the probability of an edge between two nodes only depends on their degrees if they are in different parts (and is 0 otherwise).
Since BiCMs do not exclude multi-edges, it is important to know how probable it is to obtain a multi-edge. Suppose nodes l and i are already connected; the probability of getting another edge between them is then the probability of an edge between a node of degree d^(0)_l − 1 and another of degree d^(1)_i − 1. Hence, the probability of obtaining at least two edges between l and i is d^(0)_l d^(1)_i (d^(0)_l − 1)(d^(1)_i − 1)/m². Suppose the processes to form multi-edges between every pair of nodes are independent Bernoulli trials with possibly different probabilities; then the expected number of pairs with multi-edges is Σ_{l,i} d^(0)_l (d^(0)_l − 1) d^(1)_i (d^(1)_i − 1)/m² = (⟨d²⟩_0 − ⟨d⟩_0)(⟨d²⟩_1 − ⟨d⟩_1)/(⟨d⟩_0 ⟨d⟩_1), where ⟨d^k⟩_z is the k-th moment of the degree sequence of part z and n_z is the number of nodes in part z, with m = n_0⟨d⟩_0 = n_1⟨d⟩_1. Hence the number of pairs having multi-edges stays constant as long as the moments are constant and finite, and will be negligible if the network is sufficiently large.
A.2. Common Neighbours. The common-neighbour pattern is important in characterising nodes in bipartite networks; hence we now consider the number of common neighbours between nodes i and j in part 1, cn^(1)_ij, and that between nodes h and l in part 0, cn^(0)_hl, follows naturally. For a node l in part 0, we know the probabilities of edges (l, j) and (l, i), but if l is already connected to j, the probability to also connect to i will be (d^(0)_l − 1) d^(1)_i /m. The probability of product nodes i, j sharing a transaction node l is then p_ilj = (d^(0)_l d^(1)_j /m) · ((d^(0)_l − 1) d^(1)_i /m). Since i, j can have any node in part 0 as their common neighbour, if we consider the whole process as n_0 independent Bernoulli trials with possibly different probabilities, cn^(1)_ij is then a Poisson binomial random variable, with the expected value E[cn^(1)_ij] = Σ_l p_ilj = (d^(1)_i d^(1)_j /m²) Σ_l d^(0)_l (d^(0)_l − 1). Similarly, for h, l in part 0, cn^(0)_hl is a Poisson binomial random variable with the mean value (d^(0)_h d^(0)_l /m²) Σ_i d^(1)_i (d^(1)_i − 1). Hence the expected number of common neighbours depends on both the degrees of the two nodes and the moments of the other part's degree sequence.
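The expected number of common neighbours under the BiCM can be evaluated directly from a degree sequence; the toy degrees below are illustrative:

```python
def expected_common_neighbours(d_i, d_j, degrees_part0, m):
    """Mean number of common neighbours of two part-1 nodes i, j under the
    BiCM: the sum over part-0 nodes l of
    p_ilj = (d_l * d_j / m) * ((d_l - 1) * d_i / m)."""
    return sum(dl * d_j / m * ((dl - 1) * d_i / m) for dl in degrees_part0)

# Toy degree sequence for the transaction (part-0) side.
degrees_part0 = [2, 2, 3, 1]
m = sum(degrees_part0)            # total number of edges
print(expected_common_neighbours(2, 2, degrees_part0, m))
```

The result equals the closed form d_i d_j Σ_l d_l(d_l − 1)/m², showing the dependence on the two degrees and on the second moment of the other part's degree sequence.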
Finally, we evaluate the error bound values. For the pairs satisfying the condition, we use the tighter bound D_2 ω_ij, where ω_ij = Σ_l p²_ilj / Σ_h p_ihj. We find that the maximum value of the ω_ij is around 0.021, with more than 97.2% of pairs having ω_ij ≤ 0.003 and more than 89.3% of pairs having ω_ij ≤ 0.001; thus the Poisson approximation is guaranteed to perform well. For those that do not satisfy the condition, we analyse the looser bounds 2λ_ij ω_ij and D_1 p*_ij, where λ_ij = Σ_l p_ilj. We find that the λ_ij ω_ij are all larger than 2 (and p*_ij > 0.25, as we know); thus the Poisson approximation could be misleading for this small number of pairs. Hence, we provide the comparison between the Poisson approximation and the Chernoff bounds, an alternative approximation method, in Appendix B.2 below, in order to show that the Poisson approximation has comparable performance on the above product pairs of worse bounds. Together with the guaranteed good performance for most pairs, these are the reasons to use the Poisson approximation in Sect. 3.

B.2. Chernoff Bounds. The probability that a Poisson binomial distributed variable X gets large can be bounded by the Chernoff bound for the upper tail, where for x > μ = E[X], P(X ≥ x) ≤ e^{x − μ}(μ/x)^x.

Proof. By the Markov inequality, for t ≥ 0, P(X ≥ x) = P(e^{tX} ≥ e^{tx}) ≤ E[e^{tX}]/e^{tx}, where E[e^{tX}] is the moment generating function of a Poisson binomial variable, i.e. X = Σ_{i=1}^n Y_i where the Y_i are independent of each other and Y_i ∼ Bernoulli(p_i); thus E[e^{tX}] = Π_{i=1}^n (1 + p_i(e^t − 1)). Since 1 + y ≤ e^y, ∀y, we have E[e^{tX}] ≤ exp((e^t − 1) Σ_i p_i) = e^{(e^t − 1)μ}. Hence, P(X ≥ x) ≤ e^{(e^t − 1)μ − tx} = e^{x − μ}(μ/x)^x, where we choose t = log(x/Σ_i p_i) = log(x/μ), which minimises the above upper bound w.r.t. t.
Similarly, the probability that a Poisson binomial distributed variable X gets small can be bounded by the Chernoff bound for the lower tail, where for 0 < x < μ, P(X ≤ x) ≤ e^{x − μ}(μ/x)^x.

Proof. By the Markov inequality, for t ≥ 0, P(X ≤ x) = P(e^{−tX} ≥ e^{−tx}) ≤ e^{tx} E[e^{−tX}]. Then the proof follows the same steps as the previous one, with t = log(μ/x).

B.2.1. Accuracy. The focus of most literature is not on exploring the theoretical guarantee for the accuracy of the Chernoff bounds, but rather on finding better Chernoff-like exponential bounds. However, we provide the proof of the exact form we use here. The inequality comes from two sources: the Markov inequality P(Z ≥ z) ≤ E[Z]/z, and 1 + y ≤ e^y, ∀y, where here y = p_i(x/E[X] − 1) for each composing Bernoulli probability p_i. Accordingly, for the Chernoff bounds to be tight or exact, we need the following two conditions: (i) Z can only have positive probabilities at 0 or z, i.e. X only has positive probability at x, and the more concentrated its distribution is at that value, the tighter the Markov inequality; (ii) y = 0, i.e. x = E[X], that is, the interesting value for comparison, x, is equal to the mean, E[X], since p_i > 0, ∀i, and the closer x is to E[X] and/or each p_i is to 0, the tighter the inequality.
In our analysis, the Poisson binomial random variable is the number of common neighbours between each pair of nodes i and j (in the same part) in BiCMs in Sect. 3.3.2, X_ij = Σ_l X_ilj with X_ilj ∼ Bernoulli(p_ilj), and the value of interest is the actual number of common neighbours, cn_ij. The purpose of using the Chernoff bounds is to test whether cn_ij is significant. Hence, for condition (i), it would be hard for each X_ij to have support only containing cn_ij; for condition (ii), the only possibility comes from the p_ilj being sufficiently small, since cn_ij should be far from the expected value to be significant. Hence, the Chernoff bounds are generally loose. The reasons why we consider the Chernoff bounds here are (i) to have valid upper bounds that can be evaluated efficiently, and (ii) that the relatively large estimated values can be balanced by slightly larger significance values in our analysis. However, the lack of a theoretical guarantee for the accuracy of the Chernoff bounds does make them less attractive from the theoretical angle.
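The Chernoff upper-tail bound can be checked numerically against the exact Poisson binomial tail, computed by dynamic programming; the probabilities below play the role of the p_ilj and are illustrative:

```python
from math import exp, log

def pb_pmf(ps):
    """Exact pmf of a Poisson binomial variable, by dynamic programming:
    fold in one Bernoulli(p) trial at a time."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)      # trial fails: count unchanged
            new[k + 1] += q * p        # trial succeeds: count + 1
        pmf = new
    return pmf

def chernoff_upper(mu, x):
    """Chernoff bound for the upper tail: P(X >= x) <= e^(x - mu) (mu/x)^x,
    valid for x > mu."""
    return exp(x - mu + x * log(mu / x))

ps = [0.1, 0.2, 0.05, 0.3]
mu = sum(ps)
pmf = pb_pmf(ps)
exact_tail = sum(pmf[3:])              # exact P(X >= 3)
bound = chernoff_upper(mu, 3)
print(exact_tail <= bound)
```

On such small p_i, the bound is valid but loose, consistent with the discussion of conditions (i) and (ii) above.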
B.3. Comparison: Chernoff Bounds versus Poisson Approximation. Here we compare the results from the sales transaction data obtained by applying our framework with the Poisson approximation and with the Chernoff bounds, in order to explore whether the Chernoff bounds provide similar results to the Poisson approximation, and also whether the Poisson approximation is comparable in the pairs of worse error bounds. Here, we choose the same reference significance level 0.05 in both cases, focus on the original measure, as in Sect. 4.2, and keep the same reference threshold quantiles 0, 0.7 for the degrees of complementarity and substitutability, respectively. Following our framework, a higher significance level is chosen for the Chernoff bounds, see Table 4.  Comparing the score values from the two methods, they mostly agree with each other, since the scatter plots approach the identity line; see Fig. 13 where each point is a product pair. The complementarity scores from the Poisson approximation have more nonzero values (952 out of all 31846 points 10 with x = 0, y > 0 in the left plot), where 36 of them are caused by the discrepancy in the approximated values (rather than the difference in thresholds), and 35 of them have p * ij := max l p ilj ≤ 0.25, which indicates that the Poisson approximation is reliable (see Appendix B.1). For the substitutability scores, there are 391 points with positive score values from the Poisson approximation but zero from the Chernoff bounds (i.e. with x = 0, y > 0 in the right plot), where 67 of them are due to the discrepancy between the approximated values (rather than carried from the difference in the complementarity scores, i.e. share no complements), and 66 of them have p * ij ≤ 0.25; all 194 points with positive score values from the Chernoff bounds but zero from the Poisson approximation result from the different choice of significance levels, α l . 
Hence, (i) the behaviour of the Poisson approximation largely agrees with the Chernoff bounds, but (ii) the discrepancy between them does exist, where, with a theoretical guarantee from Le Cam's theorem [28], we can show that the Poisson approximation is expected to mostly perform well.
The difference in scores does not cause a huge deviation in the role structure, with normalised mutual information (NMI) 0.43 and 0.83 and adjusted mutual information (AMI) 0.30 […]. The measures are very close to each other, which is why they have exactly the same community structure in the parameter calibration phase, as mentioned before.

Our method is robust to temporal shifts on the condition that the underlying customers maintain their purchase habits during different time periods, i.e. they keep treating certain products as complements or substitutes. This is not necessarily true for every time period, and, according to our industrial collaborators, customers do change behaviour over time. Hence, it is a promising direction to further incorporate the temporal features of our scores, e.g. how the scores change over time, as mentioned in Sect. 5.
For our current work, it is also interesting to explore how the scores change within the three-month period we chose. Hence, (i) we split our three-month sales transaction dataset into two one-and-a-half-month datasets; (ii) we then compare the results from the two datasets, in terms of score values and roles. Note that we now have a shorter time period, thus noise will play a relatively larger role in the analysis. Hence, we further filter our products to those sold at least once a week, which brings our analysis down to 169 products, and 19498 and 19670 transactions in splits 1 and 2, respectively. As in Sect. 4.2, we use the variant of the ER model as the underlying null model, and compare the results from the original measure, with the same choice of reference significance level (0.05 in both cases) and reference threshold quantiles (0, 0.7 for the degrees of complementarity and substitutability, respectively).

Table 6. Parameters chosen for our methods on splits 1 and 2.
In general, for both the complementarity and the substitutability scores, whenever the values are larger than 0 in both splits of the data, they are close to each other. There are also points with x = 0, y > 0, or y = 0, x > 0, i.e. the relationship is significant in one split but not the other. Since the same significance levels are chosen for both splits (see Table 6), the insignificance in the other split stems from the change in customer behaviour. Note that the substitutability scores are computed through the complementarity scores, hence part of the points lying on either x = 0 or y = 0 also come from pairs sharing no complements, which is again caused by the changing behaviour of customers.
Further, to quantify how far the score values in one split are from those in the other, we define the relative distance D between the scores {X^(1)_ij} in split 1 and {X^(2)_ij} in split 2 to be

D = (1/n_p^2) Σ_{i,j} |X^(1)_ij − X^(2)_ij| / mean(X^(1)_ij, X^(2)_ij),

where n_p is the number of product nodes. Here, we use the arithmetic mean, mean(x, y) = (x + y)/2. The resulting relative distances of the complementarity and the substitutability scores are 0.13 and 0.17, respectively; thus both are relatively small.
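A minimal sketch of this relative distance, treating the pairwise scores as sparse dictionaries. Two details are assumptions on our part, as the extracted text does not fully pin them down: the normalisation by n_p², and skipping pairs whose scores are zero in both splits (to avoid dividing by a zero mean).

```python
def relative_distance(x1, x2, n_products):
    """Relative distance D between two dicts of pairwise scores
    {(i, j): score}; pairs that are zero in both splits are skipped
    (assumed convention, avoiding 0/0)."""
    total = 0.0
    for key in set(x1) | set(x2):
        a, b = x1.get(key, 0.0), x2.get(key, 0.0)
        m = (a + b) / 2  # arithmetic mean
        if m > 0:
            total += abs(a - b) / m
    return total / n_products**2

# Toy example with n_p = 2 products and made-up scores:
d = relative_distance({(0, 1): 1.0, (0, 2): 0.5},
                      {(0, 1): 1.0, (0, 2): 1.0}, 2)
print(d)  # ≈ 0.167
```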
Finally, we compare the roles extracted from both splits. The complement roles from the two splits have NMI 0.74 and AMI 0.62, which indicates that the complement role structures from the two datasets largely agree with each other. The substitute roles from the two splits have NMI 0.58 but AMI 0.16, where the gap between the two values comes from the relatively large number of roles. Hence, there is slightly more change in the substitute role structure, which is expected, since it also inherits the changes from the complement relationship.
To conclude, over our chosen period, the temporal shifts cause a certain amount of change in the score values, but do not cause dramatic changes in the mesoscale role structure. Hence, as mentioned at the very beginning, we consider analysing the temporal change of our scores an interesting topic to explore in the future.

and similarly,

P(X_j = 1 | X_i = 0, X_j' = 1) = P(a_lj = 1 | a_li = 0, a_lj' = 1) = [P(a_lj = 1, a_lj' = 1) − P(a_lj = 1, a_li = 1, a_lj' = 1)] / P(a_li = 0, a_lj' = 1).
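This identity — the probability that product j is bought given that i is not, conditioned on some product j' being bought, expanded into joint probabilities — can be verified numerically on a toy basket matrix. The binary rows below (columns i, j, j') are made up for illustration.

```python
# Small illustrative basket matrix: rows = transactions l,
# columns = indicator variables for products i, j, j'.
baskets = [
    (1, 1, 1),
    (0, 1, 1),
    (0, 1, 0),
    (1, 0, 1),
    (0, 0, 1),
    (0, 1, 1),
]
n = len(baskets)

def p(pred):
    # Empirical probability of an event over transactions
    return sum(1 for row in baskets if pred(row)) / n

# Left-hand side: P(a_lj = 1 | a_li = 0, a_lj' = 1)
denom = p(lambda r: r[0] == 0 and r[2] == 1)
lhs = p(lambda r: r[1] == 1 and r[0] == 0 and r[2] == 1) / denom

# Right-hand side:
# [P(a_lj = 1, a_lj' = 1) - P(a_lj = 1, a_li = 1, a_lj' = 1)] / P(a_li = 0, a_lj' = 1)
rhs = (p(lambda r: r[1] == 1 and r[2] == 1)
       - p(lambda r: r[1] == 1 and r[0] == 1 and r[2] == 1)) / denom

print(lhs, rhs)  # both 2/3 on this toy data
```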
Then, to find the corresponding sets, we restrict the set J_i to products j such that products i and j are bought together significantly less frequently, with significance level α_l, but j is bought significantly more frequently with some products in I_i^(c). Next, we propose the measures to estimate the conditional probabilities. Based on random walks on networks, the estimated value of (8) is taken to be the original directed measure (10), defined for all j ∈ I_i^(c). We also modify the conditional probability to provide a symmetric version,

P_m(j, i) = P(a_lj = 1, a_li = 1) / (P(a_li = 1) P(a_lj = 1)).
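The symmetric measure P_m is the lift between two products: the ratio of their joint purchase probability to what independence would predict. A plug-in empirical sketch (the basket contents are made up; the paper's actual estimates additionally involve random walks and randomised null models):

```python
# P_m(j, i) = P(a_li = 1, a_lj = 1) / (P(a_li = 1) * P(a_lj = 1)),
# estimated from a list of transaction baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread"},
]
n = len(baskets)

def p_m(i, j):
    p_i = sum(i in b for b in baskets) / n
    p_j = sum(j in b for b in baskets) / n
    p_ij = sum(i in b and j in b for b in baskets) / n
    return p_ij / (p_i * p_j)

print(p_m("bread", "butter"))  # > 1: bought together more often than chance
```

Values above 1 indicate co-purchase beyond independence (candidate complements); values below 1 indicate avoidance (candidate substitutes), which is exactly what the significance tests against the null models formalise.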
The randomised versions are proposed in order to remove the specific noise effect from our estimates.
The step restricting the set of possible items is particularly important for estimating conditional probability (9), since we will remove the condition X_i = 0 and instead approximate P(a_lj = 1 | a_lj' = 1), for all j' ∈ I_i^(c), in an integrated way. The measure we propose directly for this purpose is the directed substitutability measure,

P̂_s(j, i) = Σ_{j' ∈ I_j^(c) ∩ I_i^(c)} min(P̂(a_lj = 1 | a_lj' = 1), P̂(a_li = 1 | a_lj' = 1)) P̂(a_li = 1 | a_lj' = 1) / Σ_{p ∈ I_i^(c)} P̂(a_li = 1 | a_lp = 1)².
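A minimal sketch of a directed substitutability measure of this shape: the numerator sums min(P̂(j|j'), P̂(i|j')) · P̂(i|j') over the shared complements j' of i and j, and the denominator sums P̂(i|p)² over all complements p of i, so that a product's substitutability with itself is 1. The complement sets and conditional-probability values below are made up for illustration.

```python
# complements[x] stands for I_x^(c), the significant complements of x;
# cond[(x, p)] stands for the estimated P(a_lx = 1 | a_lp = 1).
# All values below are hypothetical.
complements = {"i": {"c1", "c2"}, "j": {"c1"}}
cond = {("i", "c1"): 0.4, ("i", "c2"): 0.3, ("j", "c1"): 0.5}

def substitutability(j, i):
    # Shared complements of j and i drive the numerator
    shared = complements[j] & complements[i]
    num = sum(min(cond[(j, c)], cond[(i, c)]) * cond[(i, c)] for c in shared)
    # Normalise by i's own complement profile, so substitutability(i, i) == 1
    den = sum(cond[(i, p)] ** 2 for p in complements[i])
    return num / den if den else 0.0

print(substitutability("i", "i"))  # 1.0 by construction
print(substitutability("j", "i"))
```

Products sharing no complements get a score of 0, matching the earlier observation that such pairs produce points on the axes when comparing splits.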
Note that P̂(·|·) in Equation (12) is an estimate of the probability in (8), and we can consider, for example, estimates (10) and (11). We also modify the normalisation method to propose a