The retail market as a complex system
© Pennacchioli et al.; licensee Springer 2014
Received: 23 June 2014
Accepted: 2 December 2014
Published: 11 December 2014
The aim of this paper is to introduce the complex system perspective into retail market analysis. Currently, understanding the retail market means searching for local patterns at the micro level, involving the segmentation, separation and profiling of diverse groups of consumers. In other contexts, however, markets are modelled as complex systems. Such a strategy is able to uncover emerging regularities and patterns that make markets more predictable, e.g. making it possible to predict how much a country’s GDP will grow. Rather than isolating actors in homogeneous groups, this strategy requires considering the system as a whole, as the emerging pattern can be detected only as a result of the interaction between its self-organizing parts. This assumption also holds in the retail market: each customer can be seen as an independent unit maximizing its own utility function. As a consequence, the global behaviour of the retail market naturally emerges, enabling a novel description of its properties, complementary to the local pattern approach. Such a task demands a data-driven empirical framework. In this paper, we analyse a unique transaction database, recording the micro-purchases of a million customers observed for several years in the stores of a national supermarket chain. We show the emergence of the fundamental pattern of this complex system, connecting the products’ volumes of sales with the customers’ volumes of purchases. This pattern has a number of applications, and we provide three of them. By enabling us to evaluate the sophistication of the needs that a customer has and a product satisfies, this pattern is applied to uncovering the hierarchy of needs of the customers, to providing a hint about the next product a customer could be interested in buying, and to predicting in which shop she is likely to go to buy it.
Keywords: marketing; complex systems; nestedness
The retail market has been one of the most successful application scenarios for data mining research. Supermarkets generate a large amount of data each day, by recording which customers are buying which products, where and when. Traditional statistical tools have been abandoned as unsuitable for dealing with such data richness, in favour of association rule mining, data clustering, OLAP techniques for business intelligence and other approaches. The common strategy shared by these tools is to segment, separate and profile diverse groups of consumers. Their typical result is to find unexpected pairwise relationships between products, or to group together some customers given their purchase behaviour or personal data. We call this class of results ‘local patterns’, as they typically involve specific groups of customers/products, and they have proved their usefulness in many real world scenarios.
There are alternative approaches to the analysis of other types of markets. For instance, the global export market at the country level has been modelled as a complex system. Rather than focusing on local patterns, the authors looked for a global pattern emerging from the self-organization of competing actors. Under such a perspective, many fluctuating and unpredictable local behaviours can be interpreted as adjustments happening at a higher level. The world export market, then, ceases to be unpredictable and a global pattern emerges. Exploiting this new knowledge of the market as a complex system, the authors are able to define the new concept of ‘Economic Complexity’ and show that this measure is a very accurate predictor of a country’s future growth, outperforming any other traditional socio-economic indicator. This approach is very successful and it has been replicated elsewhere.
In this paper, we introduce the idea of analysing the retail market as a complex system. Our approach is based on the observation that the retail market is composed of independent units, the customers, which act according to their internal logic, the maximization of their utility function. By putting together these interacting units, the system of the retail market starts showing properties of its own, as a result of the self-organization of the customers. This approach has the potential to overcome some severe limitations of classical retail market data mining. For instance, the output of association rule mining is usually composed of thousands of rules, each describing a single particle of the customer behaviour, and selecting the most representative ones is usually a problem. Moreover, many products are usually not present in the result set, as they are not frequently purchased, causing this description to be incomplete. On the downside, we forfeit the high granularity and precision achievable with data mining techniques.
By looking at the retail market as a complex system, we are able to define the Purchase Function, which is a description of the mechanics of this complex system at the global level. The Purchase Function enables us to enhance our knowledge about the system as a whole, describing both customers and products, and we show its usefulness in three different analyses. First, we provide one empirical observation of Maslow’s hierarchy of needs. Using the Purchase Function we discover that highly ranked customers, with more sophisticated needs, tend to buy niche products, i.e. low-ranked products; on the other hand, low-ranked, low purchase volume customers tend to buy only high-ranked products, i.e. very popular products that everyone buys.
Second, we propose a simple marketing application useful for targeted advertising. Given that the Purchase Function classifies the likelihood of a customer-product connection, a target marketing campaign may spot with a higher accuracy the smallest customer set that is likely to start buying a given product.
Finally, our third application is focused on the predictability of customer movements on the territory. We aim at predicting in which shop a customer will go to buy a given product. We show how the typical low-level information about the product (its price or its usual purchase amounts) has some explanatory power. However, our customer/product sophistication measure, derived from the Purchase Function, has a much greater explanatory power.
Our applications are founded on a data-driven empirical proof. We analyse a unique transaction database, collected by a retail supermarket chain in Italy, which recorded the micro-purchases of a million customers. Each customer is recognizable, as the system records her purchases using the identification code of her membership card. We are then able to track the purchases of each customer over a four-year period, from 2007 to 2011.
In our analysis, we build the adjacency matrix of the bipartite network connecting a customer to the products she buys. This matrix has a triangular shape, consistent with the observations of the global export market. We show that this shape is not expected by chance, by implementing a simple null model of customer behaviour and observing that the fundamental properties of the observed matrix structure are not present in the null model matrix. Therefore, the observed system is indeed the product of a complex interaction not reducible to simple assumptions. We then proceed to define the Purchase Function, which divides the adjacency matrix into expressed and not expressed connections. This is the global pattern of the system and we can exploit it in our application scenarios.
2 The data
An important dimension of the data warehouse is Marketing, which represents the classification of products: it is organized as a tree and represents a hierarchy built on the product typologies, designed by the marketing experts of the company. The top level of this hierarchy is called ‘Area’, and it splits the products into two fundamental categories: ‘Food’ and ‘No Food’. The bottom level of the hierarchy, the one that contains the leaves of the tree, is called ‘Segment’ and it contains different values. Hence, for each item contained in the dataset, there is an entry assigning it to the right path of the hierarchy tree. The ‘Store Kind’ column refers to the shop classification: in increasing order of size we have ‘Gestin’, ‘Super’ and ‘Iper’. ‘Gestin’ are usually small shops, occupying the ground floor of a building, usually in the city center and in smaller towns and villages. ‘Super’ are larger, usually occupying their own building and built in larger cities just outside the city center. ‘Iper’ are usually an Italian equivalent of US malls.
Table: Distribution of the number of products per category (only the category label ‘Seasonal & DIY’ is recoverable).
We generated different views of the dataset for different purposes. Our main dataset is Livorno2007-2009, which includes all the purchases of the customers located in the city of Livorno during the period from 2007 to 2009. We use only this view for the applications of the framework’s output. We also generated the dataset Lazio2007-2009 (same period, different geographical location, the union of the cities of Rome, Viterbo, Latina, Rieti and Frosinone) and Livorno2010-2011 (different period, same geographical location). These two views are generated to show that the fundamental properties of the adjacency matrix needed for our framework are not bound to a particular place or time. The following steps of data preparation are applied equally to the different datasets extracted.
The second issue, as introduced above, regards the cardinality of products. There is a conceptual problem in using the level of detail of ‘item’: the granularity is too fine, making the analysis impractical. The distinction between different packages of the same product, e.g. different sizes of bottles containing the same liquid, is not of interest in our study. A natural way to solve this problem is to use the marketing hierarchy of the products, substituting the item with its marketing Segment value. In this way, we reduce the cardinality of the product dimension by 98%, aggregating logically equivalent products.
The last step in data selection is to exclude from the analysis all segments that are either too frequent (e.g. the shopping bag) or meaningless for the purchasing analysis (e.g. discount vouchers, errors, segments never sold, etc.). After this last filter, and the consequent discarding of the customers that bought exclusively products classified under the removed segments, we obtain the adjacency matrix, the input to our framework. Livorno2007-2009 matrix has customers and segments, with purchases; Livorno2010-2011 has customers and segments, with purchases; and Lazio2007-2009 has customers and segments, with purchases.
The methodology is implemented in a three-step process: (i) pre-process, where the data about customer purchases is transformed into a format suitable for our analysis; (ii) analysis, where we calculate the Purchase Function and the Product/Customer Sophistication; (iii) validation, where through a null model we evaluate the significance of the descriptors. We now describe these three steps, starting from the pre-process.
The first step of our methodology is to pre-process the connections between customers and products. This operation is carried out on the adjacency matrix of the bipartite customer-product network. We are interested in showing that the best-selling products are bought by all customers, while products with a low market share are bought exclusively by customers who buy everything. To highlight this pattern, we sort the matrix with the following criterion: fixing the top-left corner of the matrix M as the origin, we sort the customers on the basis of the sum of the items purchased in descending order (the top buying customer at the first row and so on), and the products with the same criterion from left to right (the best-selling product at the first column and so on). In this way, at the top-left cell we find the quantity of the best-selling product purchased by the top buying customer.
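The sorting criterion just described can be sketched in a few lines. This is a minimal illustration on a toy quantity matrix, not the implementation used in the paper; the function name `sort_purchase_matrix` and the toy data are assumptions made for the example.

```python
def sort_purchase_matrix(quantities):
    """Sort rows (customers) and columns (products) by descending totals.

    `quantities[r][c]` is the amount of product c bought by customer r.
    Returns the sorted matrix plus the row and column permutations used.
    """
    n_rows = len(quantities)
    n_cols = len(quantities[0]) if quantities else 0
    row_totals = [sum(row) for row in quantities]
    col_totals = [sum(quantities[r][c] for r in range(n_rows))
                  for c in range(n_cols)]
    # sorted() is stable, so ties keep their original relative order.
    row_order = sorted(range(n_rows), key=lambda r: -row_totals[r])
    col_order = sorted(range(n_cols), key=lambda c: -col_totals[c])
    sorted_matrix = [[quantities[r][c] for c in col_order] for r in row_order]
    return sorted_matrix, row_order, col_order


# Toy data (hypothetical): three customers, three products.
toy = [
    [1, 0, 0],  # customer 0 buys 1 item in total
    [4, 2, 9],  # customer 1 buys 15 items in total
    [0, 0, 3],  # customer 2 buys 3 items in total
]
sorted_toy, customer_order, product_order = sort_purchase_matrix(toy)
print(sorted_toy)  # the top-left cell holds top buyer x best seller
```

After sorting, the row index of a customer and the column index of a product are exactly the ranks that the rest of the methodology relies on.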
The final step of data preparation is to binarize the matrix, by identifying which purchases are significant and which are not. We cannot simply binarize the matrix considering the purchase presence/absence of a customer for a product. A matrix with a 1 if the customer purchased the product and 0 otherwise will result in a certain amount of noise: it takes only a single purchase to connect a customer to a product, even if generally the customer buys large amounts of everything else and the product is generally purchased in larger amount by every other customer.
The significance of a purchase is evaluated from four quantities: the number of units of the product bought by the customer, the total number of products bought by that customer, the total number of times the product has been sold, and the total number of products sold.
This is the final output of the pre-process phase; from now on it is referred to as the purchase matrix, whose entries are indexed by row j and column i.
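One way to combine the four quantities mentioned above (units of a product bought by a customer, the customer's total purchases, the product's total sales, and the overall total) is a lift-style ratio: the product's share in the customer's basket divided by its share in the whole market. The sketch below assumes that ratio and a cut-off of 1; both are assumptions made for illustration, since the exact formula and threshold are not stated in the surviving text.

```python
def binarize_by_lift(quantities, threshold=1.0):
    """Binarize a customer x product quantity matrix with a lift-style ratio.

    ASSUMPTION: a cell becomes 1 when
        (q[r][c] / customer_total[r]) / (product_total[c] / grand_total)
    exceeds `threshold`; the threshold of 1.0 is also an assumption.
    """
    n_rows = len(quantities)
    n_cols = len(quantities[0]) if quantities else 0
    customer_totals = [sum(row) for row in quantities]
    product_totals = [sum(quantities[r][c] for r in range(n_rows))
                      for c in range(n_cols)]
    grand_total = sum(customer_totals)
    binary = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if customer_totals[r] == 0 or product_totals[c] == 0:
                continue  # nothing bought or nothing sold: leave the cell at 0
            share_in_basket = quantities[r][c] / customer_totals[r]
            share_in_market = product_totals[c] / grand_total
            if share_in_basket / share_in_market > threshold:
                binary[r][c] = 1
    return binary


# Toy data (hypothetical): customer 0 is a big buyer who bought product 2 once.
toy_quantities = [
    [50, 40, 1],
    [5, 0, 9],
    [4, 1, 10],
]
toy_binary = binarize_by_lift(toy_quantities)
print(toy_binary)
```

In the toy matrix the single unit of the last product bought by the big buyer is not marked as significant, which is the kind of noise the plain presence/absence encoding would let through.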
We can observe in Figure 6 the phenomenon we expect given our assumptions: only a small amount of popular products are bought by everyone, but smaller sets of customers purchase the rest of the products (going from the right to the left columns). The same set of big buyers are always part of these smaller and smaller sets.
In this phase, the aim is to obtain the global and local descriptors of the complex system of retail. For the global level, we define the function connecting the volume of sales of products with the set of customers buying them and the volume of purchases of customers with the set of products they are buying. At the local level we perform an evaluation of how much a product, and a customer’s need, is basic or sophisticated. We start with the global level and then we proceed describing the local level.
3.2.1 Global descriptor
Customer behaviour is not random: as we have seen there are many studies dealing with the problem of finding correlations between products frequently bought together . However, here we strengthen this assumption as follows: these correlations are actually organized following a general function that regulates retail purchases. In other words, we are not dealing with a set of correlations limiting their effects on two or three products. There exists a general pattern, meaning that it is possible to define the set of products bought by a customer as a function of the amount of products she buys. We call this the Purchase Function.
The Purchase Function states that the assortment of products bought by any given customer is determined by that customer’s volume of purchases, and the population of customers that buy any given product is determined by that product’s volume of sales. More precisely, we express it as a function that relates the rank of products with the rank of customers, where the rank i of a product (or j of a customer) means that it is the i-th best-selling product (or the j-th customer by volume of purchases). In practice, looking at the matrix in Figure 6, the function is the equation of the line dividing the area with the high density of ones (colored in red) from the rest.
For any customer we denote and, for any product , . The Purchase Function is assumed to be a decreasing monotonic function, i.e., implies that , which in turn implies that . In other words, if is a customer purchasing more in terms of product quantities than , then it is very likely that buys the same set of products buys, plus something more.
Matrices with triangular structures have already been studied in the ecology literature. In ecosystems, simpler organisms are ubiquitous and more complex organisms appear if and only if simpler organisms are already present. In these works, the authors define nestedness as a measure of how triangular the structure of the matrix representing the connections between species and ecosystems is. The nestedness is calculated by identifying the border dividing the matrix into two areas containing respectively most of the ones and most of the zeroes, which is exactly the role of the Purchase Function. In this literature, this border is known as the isocline.
In the literature there are several algorithms tackling the problem of computing the isocline of a matrix. The general approach usually consists of two steps: a reordering of the rows and columns of the matrix, such that the ones tend to be clustered in the upper-left corner of the matrix; and an estimate of the isocline function on the reordered matrix.
In our framework we implement an alternative way to calculate the Purchase Function (isocline). We have chosen to do so for two reasons. First, all these algorithms explicitly reorder the matrix. We do not want to reorder our matrix, since the order we defined in the pre-process is a fundamental prerequisite for the Purchase Function, which has been defined above to connect the ranks of customers and products calculated on their volumes of purchases and sales, respectively. These ranks are obtained by the matrix ordering during the pre-processing stage and thus this order cannot be modified. Second, the state-of-the-art algorithms are designed to deal with ecology data, with a number of cells in the order of 10^4 or 10^5. Since our matrices have ∼10^9 cells, we need to define a new procedure, enabling the application of our framework to large datasets. We described the specifications of this methodology in a previously published technical report.
The quality of a candidate isocline is measured on the two regions it defines. At the left of the isocline, where we expect to find the ones, we count the cells and the ones among them; at the right of the isocline, where we expect to find the zeroes, we count the cells and the zeroes among them. In practice, we take the average of the one-density at the left and the zero-density at the right of the isocline. We use this measure because simply counting unexpected presences and absences of ones at the right/left of the isocline is not a fair measure, our matrix being very sparse.
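The average of the two densities can be written down directly. The following is a minimal sketch on a toy binary matrix; the representation of the isocline as one cut position per row (`border[r]` = number of leading cells expected to hold ones) and the name `isocline_fitness` are assumptions made for the example.

```python
def isocline_fitness(binary, border):
    """Average of one-density left of the border and zero-density right of it.

    `binary` is a 0/1 matrix; `border[r]` is the number of leading cells of
    row r that lie at the left of the isocline (hypothetical representation).
    """
    ones_left = cells_left = zeros_right = cells_right = 0
    for row, cut in zip(binary, border):
        left, right = row[:cut], row[cut:]
        cells_left += len(left)
        ones_left += sum(left)
        cells_right += len(right)
        zeros_right += len(right) - sum(right)
    one_density = ones_left / cells_left if cells_left else 0.0
    zero_density = zeros_right / cells_right if cells_right else 0.0
    return (one_density + zero_density) / 2


# Toy data (hypothetical): a nearly triangular matrix with one stray one.
toy_matrix = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
toy_border = [3, 2, 1, 0]
fitness = isocline_fitness(toy_matrix, toy_border)
print(round(fitness, 3))
```

Because each side is normalised by its own number of cells, the large sparse region at the right of the isocline cannot dominate the score.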
We now need to find the isocline. To find it, we estimate where the isocline should pass to maximize the division of ones at the left and zeroes at the right. We consider our matrix as a Cartesian space. For each discrete x axis value (customer) we get an estimate of where the isocline should pass (y axis). We do so by summing the ones of the corresponding matrix row. Then, for each discrete y axis value (product) we get an estimate of where the isocline should pass (x axis). We do so by summing the ones of the corresponding matrix column. We average these two values and we obtain a pair of coordinates. This procedure is linear in the number of customers and products and therefore it can scale to very big matrices. We fit these coordinates using a non-linear least squares optimization with the Levenberg-Marquardt algorithm to obtain the best function able to represent the isocline and, therefore, the Purchase Function.
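The two families of estimates described above can be sketched as follows. This is only an illustration of the row-sum and column-sum step on a toy matrix: how the two estimates are averaged and which function is fitted are not fully specified here, so the sketch stops at producing the candidate border points. The function name `border_points` is an assumption.

```python
def border_points(binary):
    """Candidate isocline points in (customer, product) coordinates.

    For each customer x, the number of ones in its row estimates the product
    coordinate y of the border. For each product y, the number of ones in its
    column estimates the customer coordinate x of the border.
    """
    n_rows = len(binary)
    n_cols = len(binary[0]) if binary else 0
    from_rows = [(r, sum(binary[r])) for r in range(n_rows)]
    from_cols = [(sum(binary[r][c] for r in range(n_rows)), c)
                 for c in range(n_cols)]
    return from_rows, from_cols


# Toy data (hypothetical), rows are customers and columns are products.
toy_purchase_matrix = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
row_estimates, column_estimates = border_points(toy_purchase_matrix)
print(row_estimates)
print(column_estimates)
```

The resulting points would then be handed to a non-linear least squares routine; that fitting step is left out because the functional form is not given in this section.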
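Table: Value of the isocline quality measure for the different function shapes tested (only the shape ‘ax + b’ is recoverable).
Table: Value of the isocline quality measure for the different views of the dataset (only the row label ‘Null model average’ is recoverable).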
For the Livorno2007-2009 dataset, the value of the parameters has been estimated as: , , , . The corresponding isocline has been plotted in black in Figure 6. We do not report the values of the parameters for the other datasets as we are not using them in the rest of the paper.
3.2.2 Local descriptor
As for the local descriptor, we quantify the sophistication level of the products sold and of the needs of the customers buying products. The basic intuition is that more sophisticated products are by definition less needed, as they are the expression of a more complex need. One may be tempted to answer this question by trivially returning the products in descending order of their popularity: the more a product is sold, the more basic it is. However, this does not consider an important aspect of the problem: being sold to a large set of customers is a condition to be considered ‘basic’, but it does not fully describe the term. Another condition is that the set of customers buying the product should include the set of customers with the lowest level of sophistication of their needs. The conjunction of the two properties comes closer to defining a product as ‘basic’.
This conjunction is not trivial and it is made possible by the triangular structure of the adjacency matrix. Consider Figure 6: the columns in the right part of the matrix are those customers buying only few products. Those products are more or less bought by everyone. In a world where our theory does not hold, instead of buying the products at the top row of the matrix they would buy random products.
For this reason, we need to evaluate at the same time the level of sophistication of a product and of the needs of a customer using the data in the purchase matrix, and recursively correct the one with the other. We adapt the procedure of , adjusting it for our big data.
We note in the last formulation is satisfied when and this is equal to a certain constant a. This is the eigenvector which is associated with the largest eigenvalue (that is equal to one). Since this eigenvector is a vector composed of the same constant, it is not informative. We look, instead, for the eigenvector associated with the second largest eigenvalue. This is the eigenvector associated with the variance in the system and thus it is the correct estimate of product sophistication.
where is the eigenvector of associated to the second largest eigenvalue, normalized as described above; is its average and its standard deviation. The Customer Sophistication CS is calculated using the very same procedure, by estimating instead of .
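The normalisation by average and standard deviation is an ordinary standardisation (z-score) of the eigenvector entries. A minimal sketch follows, assuming the population standard deviation (the text does not say which variant is used) and hypothetical eigenvector entries.

```python
import statistics


def standardize(scores):
    """Subtract the mean and divide by the standard deviation.

    ASSUMPTION: the population standard deviation (pstdev) is used.
    """
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    return [(s - mean) / std for s in scores]


# Hypothetical entries of the second eigenvector, one per product.
eigenvector_entries = [0.1, 0.2, 0.3, 0.6]
sophistication = standardize(eigenvector_entries)
print([round(v, 3) for v in sophistication])
```

After this step the scores have zero mean and unit standard deviation, so values for products and for customers are expressed on comparable scales.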
The aim is to maximize the impact of countries with low complexity in dragging down the complexity of the products made mostly by them. There are upsides and downsides to each measuring choices, and this case is not different. However, the measure proposed in  is highly correlated with our choice, as shown in . Therefore, in the context of this paper, there is no reason to prefer one measure over the other, and we make the choice of using only one for clarity and readability.
A selection of the more basic products according to their PS values: natural still water; yellow nectarines (peaches); semi-skimmed fresh milk.
A selection of the more sophisticated products according to their PS values: LCD 28”/30” televisions; DVD music compilations.
The triangular structure of the matrix in Figure 6 conveys an important piece of information: a customer that purchased few products is expected to have bought just products that are best sellers. This disagrees with the expected presence of ‘cherry pickers’, i.e. customers that are particularly sensitive and responsive to sales, especially if the sales are placed on expensive goods. Instead, looking at Figure 6, we expect customers to follow a general pattern.
Starting from this consideration, we need to validate the model; in particular, we want to check that the triangular structure is meaningful. We need a null model definition with which to compare our theory. We identify three important features that our null model must have: (1) the purchases are distributed randomly; (2) customers must preserve the total amount of their purchases; and (3) each product must preserve its sale volume on the market. The implementation of the null model is reported in the Appendix.
In the previous section we have defined our methodology to extract the general pattern governing customer behaviour, by analysing the adjacency matrix of the bipartite structure connecting the customers to the products they are buying. In this section, we apply our methodology to real world data. We first describe the nature of our data. We then move on to describe the data selection policy. Then, we apply the framework, obtaining the global descriptor in the form of the Purchase Function, and the local descriptor, i.e. the sophistication levels of customers and products. Finally, we provide our three analyses: the empirical observation of Maslow’s hierarchy of needs; the marketing application for targeted advertising; and finally the evaluation of the predictability of customer movements on the territory.
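One simple way to satisfy the three requirements at once is to treat every purchased unit as a "stub" on the customer side and on the product side, and to pair the two lists at random. The sketch below does this on a toy matrix; it is an illustration of the constraints, not the Appendix implementation, and the function name `null_model` is an assumption.

```python
import random


def null_model(quantities, seed=0):
    """Randomly reassign purchases while preserving row and column totals.

    `quantities` must contain non-negative integers. Each purchased unit is
    represented once on the customer side and once on the product side; the
    product side is shuffled and the two lists are paired position by position.
    """
    rng = random.Random(seed)
    n_rows = len(quantities)
    n_cols = len(quantities[0]) if quantities else 0
    customer_stubs = [r for r in range(n_rows)
                      for _ in range(sum(quantities[r]))]
    product_stubs = [c for c in range(n_cols)
                     for _ in range(sum(quantities[r][c] for r in range(n_rows)))]
    rng.shuffle(product_stubs)
    randomized = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(customer_stubs, product_stubs):
        randomized[r][c] += 1
    return randomized


# Toy data (hypothetical).
observed = [
    [5, 3, 1],
    [2, 1, 0],
    [1, 0, 0],
]
shuffled = null_model(observed, seed=42)
print(shuffled)
```

Comparing a descriptor computed on `observed` with its distribution over many `null_model` draws (different seeds) indicates whether the triangular structure is explained by volumes alone.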
4.1 Data-driven hierarchy of needs
In this section we want to use the information provided by the Product Sophistication index to reconstruct the hierarchy of needs of the supermarket customers, and therefore provide an empirical observation of the theory of Maslow.
Three caveats need to be specified. First: we are not claiming that this hierarchy of needs is universal. The result we are presenting in this paper has been reached with data from one city of Italy (Livorno) and therefore it describes the hierarchy of needs of that particular city. However we showed that the triangular structure of the purchase matrix is present even in different areas of Italy (Figures 7 and 8) and therefore our framework could be applied to different world regions, helping to create a picture of different hierarchies of needs. The comparison of hierarchies of needs of different cities and the evaluation of different cultural perspectives of customers over their needs is left as future development. Further, this hierarchy is a valuable marketing tool for that particular city: products at the basis of the hierarchy are more needed, thus no marketing strategies are required for them as they will be sold anyway.
The second caveat is that we built the hierarchy of needs using the product category classification defined by the supermarket owners. To use this classification introduces the bias of a set of people, with a given culture and marketing aims. We plan to use for future developments standard product classifications.
Lastly, a collection of customers could be buying some classes of products in different shops, thus unfairly pushing up their sophistication. While this effect is considered to be small due to the high customer fidelity and the all around service provided by the supermarket, some of the products at the top of our hierarchy of needs could be over-represented.
With this caution in mind, we now build the hierarchy. To build the hierarchy we need to divide the products into classes according to their PS value. Formally, we need to segment the previously sorted PS values. We decided to perform a one-dimensional clustering using the ck-means algorithm. ck-means is an evolution of the k-means algorithm which guarantees the optimality of the clustering. The k-means problem is to partition data into k groups such that the sum of squared Euclidean distances to each group mean is minimized. ck-means is optimized to operate on one-dimensional data, which is our application setting. In this setting ck-means finds the optimal cluster separation, which is unique and therefore a repeatable result, properties that standard k-means does not have.
Figure 10 is a clear depiction of the priorities in the minds of the customers of Livorno. It shows some expected and some unexpected things. First there are the basic needs: drinking and eating, particularly fruit, vegetables, bread and meat. Then, there are more sophisticated food products and what is needed to take care of body hygiene. At the middle of the hierarchy we start to find products not strictly necessary for survival: house cleaning and simple products for free time. The two most sophisticated levels are, first, schooling, entertainment (both for children and adults) and more complex garnishment; and, climbing to the top of the pyramid, newborn childcare and unnecessary equipment. The basis of the pyramid is expected: the most basic needs are food and personal hygiene. Up until now we have a basic confirmation about human needs. The top of the pyramid is instead telling us something surprising. Traditionally, reproduction is considered one of the most basic needs of any living thing. However, what we see is that in our modern society having a baby ends up being one of the most sophisticated needs, and the first one to be dropped, even before having a pet.
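Optimal one-dimensional k-means can be obtained with dynamic programming, because in one dimension every optimal cluster is a contiguous run of the sorted values. The sketch below is a plain O(k n^2) version of that idea on toy scores; it illustrates the principle and is not the ck-means implementation itself. The function name `optimal_1d_kmeans` is an assumption.

```python
def optimal_1d_kmeans(values, k):
    """Split sorted 1-D values into k contiguous clusters with minimal
    within-cluster sum of squared distances to the cluster mean."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums give the cost of any contiguous block in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def block_cost(i, j):
        """Sum of squared deviations from the mean for xs[i:j], with i < j."""
        s = prefix[j] - prefix[i]
        return prefix_sq[j] - prefix_sq[i] - s * s / (j - i)

    inf = float("inf")
    # cost[m][j]: best cost of splitting the first j values into m clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = cost[m - 1][i] + block_cost(i, j)
                if candidate < cost[m][j]:
                    cost[m][j] = candidate
                    split[m][j] = i
    # Walk back through the stored split points to recover the clusters.
    clusters, j = [], n
    for m in range(k, 0, -1):
        i = split[m][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return clusters


# Hypothetical sophistication scores for seven products.
toy_scores = [1.0, 1.2, 0.9, 5.0, 5.3, 9.8, 10.1]
levels = optimal_1d_kmeans(toy_scores, 3)
print(levels)
```

Unlike standard k-means, this procedure has no random initialisation, so repeated runs on the same scores return the same partition.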
4.2 Data-driven marketing insights
We now describe a possible targeted marketing strategy based on the outputs of our framework. Suppose the supermarket wants to promote a product and wants to limit its target to the smallest subset of customers with the highest probability of buying the product advertised. The Purchase Function can be used in the following way: given the amount of products bought by a customer, we use her index j to obtain the index of the most sophisticated product she is buying. With this information, we can derive the set of products she is expected to buy, defined as all the products with an index lower than or equal to that limit. The same applies when considering a product as input: we obtain the index delimiting the set of customers buying it.
One concern needs to be addressed before continuing: how well is the Purchase Function dividing the ones from the zeros in comparison to what we expect? How much is a customer more likely to buy a product following the Purchase Function evaluated on our real world data () over any random product (P)?
As previously reported, the Livorno2007-2009 matrix contains ∼37 million ones out of ∼1.5 billion cells. This means that, given a random product and a random customer, the baseline probability that the customer buys the product in a significant amount (i.e. that the corresponding cell of the purchase matrix is one) is the ratio of these two numbers, between 2.4% and 2.5%. If we consider only the portion of the matrix at the left of the calculated isocline, i.e. the area of the matrix for which the Purchase Function tells us that the customers are very likely to buy exactly those products, we count 16,748,048 ones and 60,025,000 total cells. Thus, the probability for a customer to buy significant amounts of a product in this area is 27.9%. Using the Purchase Function, we can narrow the set of combinations of products and customers to analyse by a factor of about 25 (60 million cells out of 1.5 billion) and still capture almost half of the significant purchases. In other words, customers are 11.43 times more likely to buy a product if i is lower than, or equal to, the index limit predicted by the Purchase Function. This ratio compares the Purchase Function-based probability of connecting a customer with a product to the baseline probability. We also calculated the same probability at the right side of the isocline, where we expect to find many zeros. The number of ones is 37 million minus 16 million, and it is divided by the number of cells, 1.5 billion minus 60 million. The probability of obtaining a one is 1.39%, less than one twentieth of that at the left side of the isocline.
Now that we have addressed the main concern about the Purchase Function, we can safely assign to a product a corresponding customer index j that is its current ‘border’. All indexes up to the border represent customers who buy the product, while the indexes beyond it represent customers who do not buy it. By definition, the higher the index beyond the border, the less likely the customer is to buy the product. Thus, the set of customers the law suggests targeting is the one immediately after index j. Since the Purchase Function is an interpolation, it is safe to define a threshold. Then, we define the set TC, the target customers set, as the set of all customers whose index lies beyond the border but within the threshold, and who are not already connected to the product in the purchase matrix (the last condition is necessary to exclude from TC all customers who are already buying large quantities of the product, as it is useless to advertise to them).
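The arithmetic behind these figures can be reproduced with a few divisions. The sketch below uses the two exact counts quoted for the left side of the isocline and the rounded totals (∼37 million ones, ∼1.5 billion cells); because the totals are rounded, the ratio and the right-side density come out slightly different from the 11.43 and 1.39% reported from the exact counts.

```python
# Rounded totals as quoted in the text (the exact counts are not given here).
ones_total = 37e6
cells_total = 1.5e9
# Exact counts quoted for the area at the left of the isocline.
ones_left = 16_748_048
cells_left = 60_025_000

baseline = ones_total / cells_total            # chance of a one in a random cell
p_left = ones_left / cells_left                # density of ones left of the isocline
p_right = (ones_total - ones_left) / (cells_total - cells_left)
share_captured = ones_left / ones_total        # fraction of all ones in the left area
area_fraction = cells_left / cells_total       # fraction of all cells in the left area

print(f"baseline probability: {baseline:.2%}")
print(f"left of isocline:     {p_left:.2%}")
print(f"right of isocline:    {p_right:.2%}")
print(f"ratio left/baseline:  {p_left / baseline:.2f}")
print(f"ones captured:        {share_captured:.1%}")
print(f"cells inspected:      {area_fraction:.1%}")
```

The last two lines make the trade-off explicit: about 4% of the cells contain roughly 45% of the significant purchases.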
The probabilities of buying product in general ( ) and given that a customer already buys product ( )
The comparison between the size of the target customer sets identified by the Purchase Function against random target customer sets with the same number of customers likely to buy
4.3 Predicting customer mobility
Explaining customer mobility is one of the successes of this framework. The full study of this application has been published in , without the framework formalization. We report the results here to demonstrate the usefulness of the framework. Customer mobility has been shown to be rather predictable on long time scales . In , the authors show that it is possible to model the overall mobility behaviour of customers. Rather than showing the predictability of customer movements as in , we focus on one of its possible causes.
Products with the same price are bought by customers located at different distances from the shop. Given a price, we average the distance travelled by the customers buying the products with that exact price. By averaging, we lose the ability to describe each single customer and we only describe the behaviour of the system in its entirety. We do so because each single customer is bound to the place where she lives and therefore carries noisy information, which we can make sense of only by looking at the global level.
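The averaging step can be illustrated with a short sketch. The records and the function name `mean_distance_by_price` are hypothetical; the snippet only shows the grouping logic, not the data preparation pipeline actually used.

```python
from collections import defaultdict

def mean_distance_by_price(purchases):
    """Average the distance travelled over all purchases with the same price."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for price, distance in purchases:
        totals[price] += distance
        counts[price] += 1
    return {price: totals[price] / counts[price] for price in sorted(totals)}

# Toy records: (price of the product, distance between customer and shop).
purchases = [(0.99, 1.0), (0.99, 3.0), (4.50, 2.0), (4.50, 6.0), (19.90, 8.0)]
averages = mean_distance_by_price(purchases)
print(averages)
```

Each price ends up with a single average distance, which is the data point that is then plotted and used in the regressions.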
From Figure 11 we can conclude that price plays a role in driving customers' decisions to travel a given distance for a product. The correlation here looks weak, but positive: customers travel more if they need to buy a more expensive product. We calculate a log-linear regression (see footnote 4) using the function . In this regression, (, with ), meaning that we can explain 17.25% of the variance in the distance travelled using the price.
To check whether the frequency of purchase can explain the distance travelled by customers, we repeated the same analysis using the number of purchases of a product instead of the price. The plot is shown in Figure 12. The correlation here is negative: the more frequently a product needs to be bought, the smaller the distance a customer will travel for it. We calculate a regression with the function and we obtain (, with ).
These tests confirm that price plays a small role in predicting the distance a customer will travel to purchase a product: a higher price increases it. The frequency with which a product is needed instead drives the distance down, regardless of the price. However, a large amount of variance remains unexplained.
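A regression of this kind can be sketched with the `leastsq` function of SciPy, which the paper reports using for all its regressions. The functional form is not legible in this version of the text, so the model below, distance = a·log(x) + b, is an assumption, and the data are synthetic. The helper name `fit_log_linear` is illustrative.

```python
import numpy as np
from scipy.optimize import leastsq

def model(params, x):
    # Assumed log-linear form: y = a * log(x) + b.
    a, b = params
    return a * np.log(x) + b

def residuals(params, x, y):
    return y - model(params, x)

def fit_log_linear(x, y):
    """Fit the assumed model and return (parameters, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    params, _ = leastsq(residuals, [1.0, 0.0], args=(x, y))
    ss_res = np.sum(residuals(params, x, y) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return params, 1.0 - ss_res / ss_tot

# Synthetic example: a weak positive log-linear trend plus noise.
rng = np.random.default_rng(0)
price = rng.uniform(0.5, 50.0, size=500)
distance = 2.0 * np.log(price) + 5.0 + rng.normal(0.0, 3.0, size=500)
params, r2 = fit_log_linear(price, distance)
print(params, r2)
```

The returned R² is the share of the variance in the dependent variable explained by the fit, which is the quantity the text reports as a percentage.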
We propose that our Product and Customer Sophistication indexes have, in this case, higher explanatory power. The intuition is that if a product satisfies a more sophisticated need (and the customer has those needs) then the customer is willing to travel farther to purchase the product. To test this hypothesis, we generate the same plots created for price and frequency of purchase, using our computed indexes. The plots are depicted in Figures 13 and 14.
In Figure 13, we test the relationship between the distance travelled and the Customer Sophistication: we plot the average distance travelled by customers to get to the shop (y axis) against their sophistication value (x axis). In this case, the x axis does not have a logarithmic scale. The relationship between distance travelled and customer sophistication looks non-linear: from a sophistication value of 0 to around 0.2 the relationship is negative, while it is clearly positive afterwards. We speculate that this effect could be driven by the fact that customers with lower sophistication may live, on average, farther from the shops for many reasons (they prefer living outside the city, they are in poorer areas of the city, etc.). However, testing this speculation is outside the scope of this paper and we leave it as future work.
For this reason, we move on to plotting the Product Sophistication (x axis) against the average distance travelled by the customers to purchase the given product (y axis) in Figure 14. In this case, the relationship is clear: the more sophisticated a product is, the farther customers will travel to buy it. The product sophistication has a normal distribution, but less sophisticated products sell more, given the triangular shape of the matrix. This explains why most of the data points are in the left part of the plot: most purchases are generated by low sophistication products. We calculated a linear regression, for which (, with ). This is more than twice the value obtained with the purchase frequency, and it explains the variance in the distances travelled by customers much better.
In  we address possible objections such as the relationship between distance and number of products bought, which may invalidate the effect of the Product Sophistication. We also show that the average Sophistication of different shop types (we recall that there are three, in decreasing order of size and Sophistication: iper, super and gestin) influences the average distance travelled by their customers. For compactness, we point to that paper for this additional material and we conclude this section by remarking that the average sophistication of the products in a shop influences customers’ decisions: when they need a more sophisticated product, they tend to go to a larger shop with higher sophistication.
In this section we first place this paper in the context of the marketing research literature, especially in the field of data mining. We then briefly review the strong and weak points of this study. Finally, we conclude the paper, summarizing its contributions and future work.
5.1 Results in context with previous literature
This work is a complementary approach to the classical data mining task of association rule mining. In data mining, association rule mining is a tool developed to find correlations between the appearances of products in shopping carts . Association rule algorithms are able to uncover the most frequent and interesting rules by efficiently cutting the search space (or even without ). Recently, many advances have been proposed in association rule mining, such as mining multidimensional rules . Our work differs from the ones presented as it is not focused on finding all the particular rules in a transactional dataset, but on exploring the general pattern characterizing it as a whole. This pattern can also be used to design better heuristics for the classical association rule mining algorithm, since it unveils novel relationships among products.
There are also works that aim to use association rule mining to obtain a general picture of the system . In this case, too, our approach is different. In , only the associations between products are considered, leaving the customers undescribed. Moreover, the general picture in  is based on the aggregation of local patterns, while in our work we employ a complementary approach, creating the general picture by analysing the entire set of transactions as a complex system, expressing properties at the global level that are not necessarily given by the sum of the properties at the local level. To sum up, while  employs a bottom-up approach, we employ a top-down approach. We employed a similar approach in previous work , by studying the effects of different community discovery approaches in analysing the complex network of product associations.
Other relevant literature dealing with the problem of extracting knowledge from customer behaviour can be found in business intelligence. In this field, many data mining and OLAP techniques have been developed, enriching the analytic tools , , not only for marketing purposes but also to detect frauds , or for public health surveillance . Research on data mining and customer behaviour has also gone one step further, exploiting sentiment analysis as a tool to predict the success or failure of a product .
Our approach combines and extends some tools present in the literature. First, for some specific tasks our framework makes use of the Revealed Comparative Advantage (RCA) measure. The RCA measure has been defined in international economics , but the very same concept has been borrowed in many fields. For example, the RCA measure is equivalent to the lift. Lift (like conviction, collective strength and many others) is one criterion used in association rule mining to evaluate the interestingness of a rule .
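The equivalence between RCA and lift can be checked numerically. The sketch below assumes the standard Balassa form of RCA (the share of a product in a customer's purchases, divided by the share of that product in all purchases) and reads the purchase matrix as a joint frequency table when computing the lift. The toy matrix and function names are illustrative, and the pre-processing applied in the paper may differ in its details.

```python
import numpy as np

def rca(X):
    """Balassa-style RCA for a customer x product purchase matrix."""
    X = np.asarray(X, dtype=float)
    row = X.sum(axis=1, keepdims=True)   # total purchases per customer
    col = X.sum(axis=0, keepdims=True)   # total purchases per product
    return (X / row) / (col / X.sum())

def lift(X):
    """Lift of each (customer, product) pair: P(c, p) / (P(c) * P(p))."""
    P = np.asarray(X, dtype=float) / np.sum(X)
    p_customer = P.sum(axis=1, keepdims=True)
    p_product = P.sum(axis=0, keepdims=True)
    return P / (p_customer * p_product)

# Toy matrix: 3 customers (rows) by 3 products (columns).
X = np.array([[8, 2, 0],
              [4, 4, 2],
              [1, 3, 6]])
print(np.allclose(rca(X), lift(X)))
```

Both functions reduce algebraically to X_cp · X_tot / (X_c · X_p), which is why the two measures coincide cell by cell.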
Second, we make use of concepts from the ecology literature  and macroeconomics , . While using similar techniques (such as the eigenvector factorization of the customer-product matrix to calculate the sophistication levels of both customers and products), our work differs from the ones presented along two axes: quality and quantity of data. As for the quality of the data, we deal with micro purchases instead of macro world trade or ecosystem presence/absence of animal species. As for the quantity of data, we work with matrices with a number of cells ∼10^9, while related works do not scale beyond ∼10^5 and therefore cannot be used in our scenario.
Our analysis of customer mobility has also been designed and performed in a data-mining oriented scenario, in previous work . For that paper, we also publicly released our anonymized data, for result verification purposes (see footnote 5).
5.2 Strength and limitations of this study
To the best of our knowledge, this is the first study applying the logic of complex system theory to the retail market. This is the main strength of the paper, because it empowers researchers and market analysts with a new way of thinking about this field of study. Many classical results from complexity theory can now be applied to this scenario, and the universe of testable hypotheses has been enlarged.
We backed up this claim by showing three applications. We showed that it is possible to have a data-driven large scale observation of the hierarchy of human needs. Previously, this theory could only be tested in very bounded cases. Moreover, we uncovered some aspects of the logic of customer behaviour. We did so limiting our attention to their movements on the territory. As a result of our analytic vantage point, we could discover that their mobility is more predictable than previously thought. We are able to predict part of the variance in their movements just by knowing what types of products are sold in different supermarkets of an area.
There are many limitations in the study presented here. Even if we partially controlled for time and space, by creating alternative views of our dataset from different regions and periods, we still have a biased view of customer behaviour. In fact, our entire study is confined to the cultural environment of Italy. This makes our empirical hierarchy of needs biased towards what the basic and sophisticated needs are for Italian people. Moreover, we used the internal marketing classification of the supermarket under study to draw up our hierarchy. This is another source of bias that can be fixed by using data from other countries, as well as an international standard product classification such as SITC (see footnote 6) or HS (see footnote 7).
As a second limitation, we lack a deeper understanding of the mechanics of the purchase matrix, which is a promising direction for future work. One could define a null model using the Maximum Entropy Principle  and test whether the results shown in the paper still hold.
Thirdly, in the definition of the Purchase Function, we did not consider the number of parameters as a penalty for the functional form. It is no surprise, then, that the function with more parameters fits the data better. As future work, we will include penalties for the number of parameters and test whether the current shape of the function still provides the best results.
Finally, the mobility study is also influenced by the technology level of Italy. Countries with better, or worse, infrastructure might show different patterns.
In this paper we analysed large quantities of data extracted from the retail activity of the customer subset of an Italian supermarket chain. Our aim was to build a framework able to exploit a different vantage point over retail purchase data. We highlighted some properties of retail data, namely uneven distributions of connections in the customer-product bipartite structure and the triangular structure of its adjacency matrix. These properties make association rule mining results incomplete. By looking at the retail data as a complex system, as we did in this paper, we can develop an alternative and complementary methodology to analyse purchase data.
Our thesis is that customers usually buy the same set of basic products and that the more sophisticated products are only bought by customers buying everything, generating a triangular adjacency matrix for the bipartite customer-product network. Our framework is able to analyse this structure as a whole, instead of looking at the local patterns like classical rule mining, uncovering the general pattern of shopping behaviour. Building on this theory, we define the Purchase Function, which can identify the set of customers buying a specific product simply by looking at how much the product is sold (and vice versa), and a way to rank the sophistication level of both products and customer needs. We showed some possible applications of these results: a data-driven empirical observation of Maslow’s theory of needs; an efficient way to identify a small set of potentially very interested customers for a given product; and a way to predict customer mobility on the territory.
Our work opens the way to several future developments. The first one concerns the validation of our observation of the hierarchy of needs, as it is based on a narrow geographical set of people and on a non-standard product category classification. Also, with more data we can extend our pyramid of needs to cover the entire spectrum of human needs. Another interesting research track may be to investigate the minimum time window needed to observe the prerequisites of the Purchase Function, possibly linked with the cyclic behaviour of customers  and/or with the stability of the customer and product ranking order in the matrix . Another application scenario may be to fully exploit the purchase matrix as a complex system: to analyse products not only based on their product sophistication index, but also by looking at the product-product relationship level; or to try to find a way of controlling the complex system .
Appendix 1: Experimental setting
The analyses presented in this paper were performed with regular user-end computers. No mainframes or parallel computing techniques were used. The fit of the Purchase Function, the marketing analysis and the computation of Product and Customer Sophistication via eigenvector calculation were each performed in less than one hour on a Dual Core Intel i7 64 bits @ 2.8 GHz laptop, equipped with 8 GB of RAM and with a kernel Linux 3.0.0-12-generic (Ubuntu 11.10), using a combination of Octave and the Numpy and Scipy Python libraries. The data preparation pipeline, and the null model generation and evaluation, were computed on a Quad Core Intel Pentium III Xeon @ 2 GHz, equipped with 8 GB of RAM and with Windows Server 2003, using Java 1.6. The most memory and time consuming operation was the null model generation: each null model required 6 GB of memory and 4 hours of computing. The conclusion is that our framework is able to scale and to analyse large data quantities.
Appendix 2: Null model
We use two sets (PLeft and CLeft) to keep track of the columns and rows that are not yet full: products that have not yet reached their diffusion among the customers and customers that have not yet reached their amount of products bought. Vector R (C) keeps track in each cell of the respective residual in the row (column). The integer NItemsLeft contains the total number of purchases still to be placed.
We start from an empty matrix, with the same dimensions as our real data matrix and with all cells initialized at 0. We iterate as long as there is a purchase left to place, i.e. as long as NItemsLeft > 0. At each iteration we randomly extract a position from the set of cells that are still increasable (those whose row is in CLeft and whose column is in PLeft). We then increase by 1 the value of the extracted cell, and we decrease the residual of the selected row and column (in R and C) and the total number of purchases left (NItemsLeft). Finally, we check whether the selected column (row) has been filled and, if so, we remove the column (row) index from the set PLeft (CLeft). After building this null adjacency matrix, we calculate the RCA for each cell, applying the pre-processing step of our methodology. We obtain a null matrix that we can then compare with the original one to understand whether they are similar (and therefore whether the shape of the original matrix is meaningful).
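The procedure described above can be sketched in Python as follows. This is an illustrative re-implementation, not the Java code used for the experiments: the function name `null_model` and its arguments are assumptions, and the variable names mirror those in the text. The RCA pre-processing step is not included.

```python
import numpy as np

def null_model(row_totals, col_totals, seed=None):
    """Random count matrix with the given row (customer) and column (product) totals."""
    rng = np.random.default_rng(seed)
    R = np.array(row_totals, dtype=int)   # residual purchases per customer
    C = np.array(col_totals, dtype=int)   # residual purchases per product
    if R.sum() != C.sum():
        raise ValueError("row and column totals must sum to the same value")
    M = np.zeros((len(R), len(C)), dtype=int)
    c_left = [i for i in range(len(R)) if R[i] > 0]   # rows not yet full
    p_left = [j for j in range(len(C)) if C[j] > 0]   # columns not yet full
    n_items_left = int(R.sum())
    while n_items_left > 0:
        # Pick a still-increasable cell uniformly at random.
        a = rng.integers(len(c_left))
        b = rng.integers(len(p_left))
        i, j = c_left[a], p_left[b]
        M[i, j] += 1
        R[i] -= 1
        C[j] -= 1
        n_items_left -= 1
        # Remove a filled row or column (swap with the last element, then pop).
        if R[i] == 0:
            c_left[a] = c_left[-1]
            c_left.pop()
        if C[j] == 0:
            p_left[b] = p_left[-1]
            p_left.pop()
    return M

null = null_model([3, 2, 1], [2, 2, 2], seed=42)
print(null)
```

Because the row and column totals sum to the same value, both sets stay non-empty for as long as purchases remain, so the loop always ends with every residual at zero and the margins of the real matrix preserved.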
The news of the study, in Italian, can be found at http://www.viasarfatti25.unibocconi.it/notizia.php?idArt=6527. The PI of the study can be reached at firstname.lastname@example.org.
Also note that, for some reason, ‘Chemicals’ such as band aids or rat poison are classified under ‘Food’, although we advise not to eat these things.
This happens because the matrix is subject to the Perron-Frobenius theorem. To be applicable, the theorem has two requirements: the matrix must be aperiodic and irreducible. Being symmetric, satisfies the aperiodicity requirement. We also make use only of the largest giant component of , which implies that has only one component too, and thus satisfies the irreducibility requirement.
This and all other regressions have been calculated with the leastsq function of the SciPy module for Python.
We gratefully thank Muhammed Yildirim, César Hidalgo, Jenny Zambon and Sebastian Bustos for their support and useful discussions. We thank the supermarket company Coop and Walter Fabbri for sharing the data with us and allowing us to analyse and to publish the results. This work has been partially supported by the European Commission under the FET-Open Project n. FP7-ICT-270833, DATA SIM.
- Agrawal R, Imielinski T, Swami AN: Mining association rules between sets of items in large databases. SIGMOD international conference, Washington, D.C. 1993, 207–216.
- Sun Y, Aggarwal CC, Han J: Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. Proc VLDB Endow 2012, 5(5):394–405. doi:10.14778/2140436.2140437
- Chaudhuri S, Narasayya VR: New frontiers in business intelligence. Proc VLDB Endow 2011, 4(12):1502–1503.
- Kocakoç ID, Erdem S: Business intelligence applications in retail business: OLAP, data mining & reporting services. J Inf Knowl Manag 2010, 9(2):171–181. doi:10.1142/S0219649210002541
- Brauckhoff D, Dimitropoulos X, Wagner A, Salamatian K: Anomaly extraction in backbone networks using association rules. IEEE/ACM Trans Netw 2012, 20(6):1788–1799. doi:10.1109/TNET.2012.2187306
- Marinica C, Guillet F: Knowledge-based interactive postmining of association rules using ontologies. IEEE Trans Knowl Data Eng 2010, 22(6):784–797. doi:10.1109/TKDE.2010.29
- Montella A: Identifying crash contributory factors at urban roundabouts and using association rules to explore their relationships to different crash types. Accid Anal Prev 2011, 43(4):1451–1463. doi:10.1016/j.aap.2011.02.023
- Hidalgo CA, Klinger B, Barabási AL, Hausmann R: The product space conditions the development of nations. Science 2007, 317(5837):482–487. doi:10.1126/science.1144581
- Hausmann R, Hidalgo C, Bustos S, Coscia M, Chung S, Jimenez J, Simoes A, Yildirim M (2011) The atlas of economic complexity. Boston, USA
- Caldarelli G, Cristelli M, Gabrielli A, Pietronero L, Scala A, Tacchella A (2011) Ranking and clustering countries and their products; a network analysis. [http://arxiv.org/abs/arXiv:1108.2590]
- Davis WL IV, Schwarz P, Terzi E: Finding representative association rules from large rule collections. SDM 2009, 521–532.
- Maslow AH: A theory of human motivation. Psychol Rev 1943, 50(4):370–396. doi:10.1037/h0054346
- Bascompte J, Jordano P, Melián CJ, Olesen JM: The nested assembly of plant-animal mutualistic networks. Proc Natl Acad Sci USA 2003, 100(16):9383–9387. doi:10.1073/pnas.1633576100
- Almeida-Neto M, Guimarães P, Guimarães PR Jr., Loyola RD, Ulrich W: A consistent metric for nestedness analysis in ecological systems: reconciling concept and measurement. Oikos 2008, 117:1227–1239. doi:10.1111/j.0030-1299.2008.16644.x
- Pennacchioli D, Coscia M, Giannotti F, Pedreschi D (2013) Calculating product and customer sophistication on a large transactional dataset. Technical report cnr.isti/2013-TR-004
- Marquardt DW: An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind Appl Math 1963, 11(2):431–441. doi:10.1137/0111030
- Hidalgo CA, Hausmann R: The building blocks of economic complexity. Proc Natl Acad Sci USA 2009, 106(26):10570–10575. doi:10.1073/pnas.0900943106
- Cristelli M, Gabrielli A, Tacchella A, Caldarelli G, Pietronero L: Measuring the intangibles: a metrics for the economic complexity of countries and products. PLoS ONE 2013, 8(8). doi:10.1371/journal.pone.0070726
- Guidotti R (2013) Mobility ranking - human mobility analysis using ranking measures. University of Pisa
- Wang H, Song M: Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming. R J 2011, 3(2):29–33.
- Pennacchioli D, Coscia M, Rinzivillo S, Pedreschi D, Giannotti F: Explaining the product range effect in purchase data. 2013 IEEE international conference on big data 2013, 648–656. doi:10.1109/BigData.2013.6691634
- Krumme C, Llorente A, Cebrián M, Pentland A, Egido EM (2013) The predictability of consumer visitation patterns. CoRR. [http://arxiv.org/abs/arXiv:abs/1305.1120]
- Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman JD, Yang C: Finding interesting associations without support pruning. ICDE 2000, 489–500.
- Nguyen K-N, Cerf L, Plantevit M, Boulicaut J-F: Multidimensional association rules in Boolean tensors. SDM 2011, 570–581.
- Chawla S: Feature selection, association rules network and theory building. J Mach Learn Res 2010, 10:14–21.
- Pennacchioli D, Coscia M, Pedreschi D (2014) Overlap versus partition: marketing classification and customer profiling in complex networks of products. In: Workshop of the international conference of data engineering (ICDE)
- Li H: Applications of data warehousing and data mining in the retail industry. Proceedings of ICSSSM’05: 2005 international conference on services systems and services management 2005.
- Gabbur P, Pankanti S, Fan Q, Trinh H: A pattern discovery approach to retail fraud detection. KDD 2011, 307–315.
- Wagner MM, Robinson JM, Tsui F-C, Espino JU, Hogan WR: Design of a national retail data monitor for public health surveillance. J Am Med Inform Assoc 2003, 10(5):409–418. doi:10.1197/jamia.M1357
- Castellanos M, Dayal U, Hsu M, Ghosh R, Dekhil M, Lu Y, Zhang L, Schreiman M: LCI: a social channel analysis platform for live customer intelligence. SIGMOD conference 2011, 1049–1058.
- Balassa B: Trade liberalization and ‘revealed’ comparative advantage. Manch Sch 1965, 33:99–123. doi:10.1111/j.1467-9957.1965.tb00050.x
- Geng L, Hamilton HJ: Interestingness measures for data mining: a survey. ACM Comput Surv 2006, 38(3). doi:10.1145/1132960.1132963
- Bousquet N: Eliciting vague but proper maximal entropy priors in Bayesian experiments. Stat Pap 2010, 51(3):613–628. doi:10.1007/s00362-008-0149-9
- Shen Z-JM, Su X: Customer behavior modeling in revenue management and auctions: a review and new research opportunities. Prod Oper Manag 2007, 16(6):713–728. doi:10.1111/j.1937-5956.2007.tb00291.x
- Schich M, Lehmann S, Park J: Dissecting the canon: visual subject co-popularity networks in art research. ECCS2008 2008.
- Liu Y-Y, Slotine J-J, Barabási A-L: Controllability of complex networks. Nature 2011, 473(7346):167–173. doi:10.1038/nature10011
- Patefield WM: An efficient method of generating random RxC tables with given row and column totals (algorithm AS 159). J R Stat Soc, Ser C, Appl Stat 1981, 30:91–97.
This article is published under license to BioMed Central Ltd. Open Access: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.