Corporate payments networks and credit risk rating

Letizia, Elisa; Lillo, Fabrizio

doi:10.1140/epjds/s13688-019-0197-5

Regular article
Open access
Published: 01 June 2019

EPJ Data Science volume 8, Article number: 21 (2019)

Abstract

Aggregate and systemic risk in complex systems are emergent phenomena depending on two properties: the idiosyncratic risk of the elements and the topology of the network of interactions among them. While significant attention has been given to aggregate risk assessment and risk propagation once these two properties are given, less is known about how risk is distributed in the network and how it relates to the network topology. We study this problem by investigating a large proprietary dataset of payments among 2.4M Italian firms, whose credit risk rating is known. We document significant correlations between local topological properties of a node (firm) and its risk. Moreover, we show the existence of a homophily of risk, i.e. the tendency of firms with similar risk profiles to be statistically more connected among themselves. This effect is observed when considering both pairs of firms and communities or hierarchies identified in the network. We leverage this knowledge to show that the missing rating of a firm can be predicted using only the network properties of the associated node.

Assessing the aggregate risk emerging in complex systems is of paramount importance in disparate fields, such as economics, finance, epidemiology, and infrastructure engineering. A large body of recent literature has explored, both theoretically and empirically, how risk propagates [1] and how to assess aggregate risk when both the risk of each individual entity and the topology of the network of interactions among them are known [2]. Although both aspects have been shown to be important, their mutual relation has been explored comparatively little. In theoretical studies, one typically assumes independence between idiosyncratic risk and topology, while in empirical studies the correlation is the one present in the investigated dataset.

But what is the relation (if any) between the idiosyncratic risk of a node and its local topological properties (e.g. degree, centrality, community, etc.)? In this paper we answer this question by studying a specific system where the assessment of aggregate risk is particularly important, namely the network of interactions between firms. Assessing the risk of firms is one of the fundamental activities of the credit system. Banks spend a significant amount of resources to scrutinise the balance sheets of firms in order to obtain accurate estimations of their riskiness, the internal rating, and provide credit conditions reflecting both the capability of the firm to repay the loans and its probability of default. The riskiness of a firm depends on many idiosyncratic factors (e.g. balance sheet, structure of management, etc.) as well as on its industrial sector and geographical location [3,4,5]. However, corporate firms do not live in isolation, but interact with each other on a daily basis. The interactions can be of different kinds, including those due to the supply chain, payments, business partnerships, financial contracts, and mutual ownership. The structure of interactions is complex and multifaceted, but its knowledge is critical both for macroeconomists and for the credit and banking industry to understand the dynamics of the economy, the business cycle, the structure of corporate control, and, of course, the risk of firms (in isolation or in aggregation).

Here we study the interplay between the risk of firms and the interlinkages connecting them. The network is built from a large proprietary dataset provided by a major European bank. The dataset contains the payments collected at daily granularity between more than two million Italian firms together with the information on internal risk rating for a large fraction of them. We want to understand whether and to what extent a firm’s role in the network can be informative of its riskiness. This is important for two reasons. First, even if the risk of a firm is not known to all the counterparts, it may affect its ability to interact with other firms. For example, a poor rating (i.e. high riskiness) may prevent access to credit and, as a result, may cause a reduction or delay in payments toward suppliers. If the supplier has high risk, the missing or delayed payment can prevent its own payments, increasing the likelihood of a cascade of missing payments and a propagation of financial distress. The second reason is that, in certain cases, the knowledge of the riskiness of a firm or of a group of firms is lacking or imprecise. In these cases, the existence of a correlation between network properties and risk can allow or improve the assessment of risk. Indeed, in the last part of the paper we will show how network properties of a node can be used to predict the risk of the corresponding firm.

Previous works on networks of firms focussed mainly on ownership relations [6,7,8,9,10], or dealt with the theoretical modelling of other types of relation [11]. Exceptions are the empirical studies on the Japanese economic firm-to-firm network [12, 13], where links represent buyer-supplier relationships. In other cases, as in the seminal paper [14], even if the theoretical framework applies to single firms, the empirical part focuses on the aggregate, sector network, due to the lack of more granular data. The use of payments as a proxy of interactions between economic entities is not new and has been investigated mainly for banks [15,16,17,18,19] in the context of systemic risk studies, where, however, other choices to characterise interactions are possible [20,21,22,23]. Much less appears to be known about the payment network between firms, mostly because of the lack of data. Concerning rating prediction, there is a vast literature mainly considering the problem as a classification task [24]. The idea of employing machine learning techniques in credit rating scoring has been explored before [25, 26], but in these cases the predictors for the rating are all derived from balance sheets, so the results are not comparable with ours. Other works use more heterogeneous information to predict the rating [27,28,29,30,31].

This paper contributes to these streams of literature in several aspects. First, we investigate the topological properties of payment networks by considering standard network metrics, such as degree and strength distribution and components decomposition. We find that the large payment networks investigated in this paper share the properties observed in other complex networks, namely they are sparse but almost entirely made of a single component, and they are scale free and small world. Then, we look into the distribution of risk of firms in the network of payments in order to quantify the dependence between the network property of a node or a group of nodes and the risk of the firm represented by the node(s). The main and most innovative contribution of this paper is to document the existence of such correlations. We find a homophily of risk, i.e. the tendency of a firm to interact with firms with similar risk. This is a two-node property, but a similar behaviour is observed, even more clearly, at larger aggregation scales. Communities of firms, detected by using different methods, often display a statistically significant abundance of firms of a specific risk class, indicating the tendency of firms with similar rating to be linked together through payments. Risk is therefore not spread uniformly on the network, but rather is concentrated in specific areas. This implies that an idiosyncratic shock on a single firm can propagate more or less quickly depending on the local network structure and the community the node belongs to. The last contribution is to exploit this correlation between the risk of a firm and the network characteristics of the corresponding node to predict the risk rating of the firm using network properties alone. To this end, we employ machine learning techniques to build classifiers for risk rating whose inputs are only network properties (e.g. degree, community, etc.). We show that our classification method performs well in terms of both accuracy and recall, and that it significantly outperforms random assignment.

1 The network of payments

1.1 The dataset

The investigated dataset contains information on payments between more than two million Italian firms and is built from transactional data of the payment platform of a major European bank.^{Footnote 1} Transactions are registered with daily granularity for the year 2014, for a total of 47M records, each of which includes the two counterparts involved, date, type, amount, and number of transactions in the same day. Transactions are originally identified by account, but in the case of customers and former customers, multiple accounts associated with the same firm are aggregated into a single entity.^{Footnote 2} This results in a total of 2.4M entities (which will be referred to as firms, for brevity) operating through the platform during the whole investigated period. The firms can be of different types: customers, who have an account in the bank, non-customers, and former customers. There is also a small residual class of NA, which we aggregated with the non-customer class. More information on the frequencies of the different classes is available in Appendix 1.
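The account-to-entity aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the account identifiers, and the `account_to_firm` mapping are hypothetical, and the proprietary dataset's actual schema is not described in the text beyond the fields listed above.

```python
# Hypothetical record layout: (payer_account, payee_account, date, amount).
# The real records also carry a type and a same-day transaction count.
records = [
    ("acc1", "acc3", "2014-01-02", 100.0),
    ("acc2", "acc3", "2014-01-02", 50.0),
    ("acc3", "acc4", "2014-01-03", 70.0),
]

# Hypothetical account -> firm mapping, assumed to be available only for
# customers and former customers. Unmapped accounts stay entities on their own.
account_to_firm = {"acc1": "firmA", "acc2": "firmA", "acc3": "firmB"}


def to_entity(account, mapping):
    """Return the firm owning the account, or the account itself if unmapped."""
    return mapping.get(account, account)


def aggregate_by_entity(records, mapping):
    """Rewrite each record so that both counterparts are entities."""
    return [
        (to_entity(src, mapping), to_entity(dst, mapping), date, amount)
        for src, dst, date, amount in records
    ]


entity_records = aggregate_by_entity(records, account_to_firm)

# Set of distinct entities ("firms", for brevity) appearing in the records.
entities = {e for src, dst, _, _ in entity_records for e in (src, dst)}
```

In this toy example the two accounts `acc1` and `acc2` collapse into the single entity `firmA`, while the unmapped account `acc4` remains an entity of its own.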

In principle, any firm or public body can make use of the platform, but in practice in most transactions at least one of the two counterparts is a customer of the bank. Similar considerations hold for the total amount exchanged: in each month more than 50% of the volume is transferred between customers, and the share rises above 95% when considering transactions with at least one customer involved. More details on the dataset and some descriptive statistics are presented in Appendix 1. Finally, for a large fraction of customers, the dataset contains information on the economic sector and on the internal rating of the firm on a three-value scale: Low (L), Medium (M), and High (H) risk.
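A minimal sketch of how the two volume shares mentioned above could be computed from entity-level records is given below. The record layout and the toy amounts are assumptions made purely for illustration and do not reproduce figures from the dataset.

```python
def volume_shares(records, customers):
    """Return (share between customers, share with at least one customer).

    `records` is an iterable of (source, recipient, amount) tuples and
    `customers` is the set of entities that are customers of the bank.
    """
    total = both = at_least_one = 0.0
    for src, dst, amount in records:
        total += amount
        if src in customers and dst in customers:
            both += amount
        if src in customers or dst in customers:
            at_least_one += amount
    return both / total, at_least_one / total


# Toy data (hypothetical): A and B are customers, X and Y are not.
toy_records = [("A", "B", 50.0), ("A", "X", 25.0), ("X", "Y", 25.0)]
toy_customers = {"A", "B"}

share_both, share_any = volume_shares(toy_records, toy_customers)
```

On the toy data, half of the volume flows between customers and three quarters involves at least one customer.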

1.2 Networks definition and basic metrics

A network, or graph, is identified by two sets: V, the set of nodes with cardinality $\lvert V \rvert =n$, and E, the set of links or edges, with cardinality $\lvert E \rvert =m$. The latter is the collection of ordered pairs of connected nodes. In our case, we also take into account the strength of interactions, so a weight $w_{ij}$ is associated with each link. Starting from transaction data, payment networks are constructed as follows: given a time window, each node represents a firm active in that period; if there is a payment between two firms, a link from the source to the recipient is added, with weight equal to the payment amount. If multiple transactions occur between the same (ordered) pair of nodes, the weight of the link is the sum of the amounts of the payments. Therefore, for each time period we construct a directed and weighted network. The time window of analysis may vary depending on the type of information one wants to extract from the dataset. In the following, the focus will be on monthly networks, for which results are quite stable, at the cost of dealing with fewer and larger graphs. For the period covered by the dataset, each monthly network consists on average of $n=1$M nodes and $m=3.2$M links, with the lowest activity in August and the highest in July (see Appendix A.1). The density $\rho =\frac{m}{n(n-1)}$ is thus small, resulting in a so-called sparse network. Nevertheless, this low density does not imply a disaggregated system. Indeed, for all the monthly networks the diameter is very small compared to the size: on average across the months, starting from a node one has to traverse at most 19 links to reach any other node in the weakly connected component (see Table 1). Thus the networks have the so-called small-world property.
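The construction just described can be sketched in a few lines. The record layout, the node names, and the helper functions below are assumptions made for illustration; the sketch uses plain dictionaries rather than any specific graph library and ignores the possibility of self-payments.

```python
from collections import defaultdict

# Hypothetical entity-level records: (source, recipient, "YYYY-MM-DD", amount).
payments = [
    ("A", "B", "2014-01-03", 100.0),
    ("A", "B", "2014-01-20", 50.0),
    ("B", "C", "2014-01-07", 30.0),
    ("C", "A", "2014-02-11", 10.0),
]


def monthly_networks(payments):
    """Map each month "YYYY-MM" to a dict {(source, recipient): total amount}.

    Each key is a directed link; its value is the link weight w_ij, i.e. the
    sum of the amounts of all payments from i to j within that month.
    """
    networks = defaultdict(lambda: defaultdict(float))
    for src, dst, date, amount in payments:
        month = date[:7]  # time window: calendar month
        networks[month][(src, dst)] += amount
    return {month: dict(links) for month, links in networks.items()}


def density(links):
    """Directed density m / (n (n - 1)), with n the number of active nodes."""
    nodes = {node for link in links for node in link}
    n, m = len(nodes), len(links)
    return m / (n * (n - 1)) if n > 1 else 0.0


nets = monthly_networks(payments)
january = nets["2014-01"]
rho_january = density(january)
```

In the toy January network, the two payments from A to B are merged into a single link of weight 150. Plugging the average monthly figures quoted above ($n=1$M, $m=3.2$M) into the same formula gives a density of about $3.2\times 10^{-6}$.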

Table 1 Basic metrics of the network of payments
