 Regular article
 Open access
Cryptocurrency co-investment network: token returns reflect investment patterns
EPJ Data Science volume 13, Article number: 11 (2024)
Abstract
Since the introduction of Bitcoin in 2009, the dramatic and unsteady evolution of the cryptocurrency market has also been driven by large investments by traditional and cryptocurrency-focused hedge funds. Notwithstanding their critical role, our understanding of the relationship between institutional investments and the evolution of the cryptocurrency market has remained limited, partly due to the lack of comprehensive data describing investments over time. In this study, we present a quantitative analysis of cryptocurrency institutional investments based on a dataset collected from Crunchbase, one of the largest platforms gathering business information, for 1324 currencies over the period 2014–2022. We show that the evolution of the cryptocurrency market capitalization is highly correlated with the size of institutional investments, thus confirming their important role. Further, we find that the market is dominated by a group of prominent investors who tend to specialise by focusing on particular technologies. Finally, studying the co-investment network of currencies that share common investors, we show that assets with shared investors tend to be characterized by similar market behaviour. Our work sheds light on the role played by institutional investors and provides a basis for further research on their influence in the cryptocurrency ecosystem.
1 Introduction
Since the introduction of Bitcoin in 2009 [1], the cryptocurrency market has experienced bewildering growth, surpassing an overall value of one trillion dollars in early 2021. Beyond private investors, the development of the market was fostered by cryptocurrency hedge funds and Venture Capital (VC) funds, with institutional investments in cryptocurrency-related projects reaching an estimated amount of 17 billion US dollars in 2021 [2, 3].
A growing number of traditional financial firms and investment funds in Europe and the U.S. are also exploring avenues for investing in cryptocurrency via different channels, including, but not limited to, adding cryptocurrencies to their portfolios, investing through tokenization in the equity of blockchain companies, and exploiting more regulated instruments such as crypto futures, options, and ETFs [3, 4]. Unfriendly regulations, high volatility, and the lack of reliable valuation tools, amongst other issues, have so far hindered the widespread adoption and institutionalisation of these assets [3, 5, 6]. Most cryptocurrency platforms, for instance, lack regulatory and supervisory oversight concerning trading, disclosure, anti-money laundering, and consumer protection, forming what has been described as a “shadow financial system” [7]. Nonetheless, recent challenging events affecting the economy and markets, e.g., the U.S. elections, Brexit in Europe, and the global pandemic, have gradually accelerated uptake [3]. Despite these developments, the effects of institutional investments on the cryptocurrency market are still poorly understood, partly due to the lack of comprehensive quantitative data.
Moreover, it has recently been flagged that the participation of institutional investors in both crypto and traditional markets might lead to spillovers and increased contagion risks between traditional finance and decentralised finance (DeFi)^{Footnote 1} [4]. Understanding the behaviour of institutional investors and its effect on the structure and evolution of cryptocurrency markets is therefore of paramount importance to quantify the mutual impact between DeFi and traditional entrepreneurial finance [4, 8].
This paper aims to study the link between institutional investments and cryptocurrencies’ market trends systematically and quantitatively, exploiting a novel combination of data sources on a larger sample of cryptocurrencies than in previous work. Our analysis uses network science tools to study the structure and evolution of the co-investment network, i.e., an undirected network in which cryptocurrencies (nodes) are connected if they share a common investor. In particular, we tackle two main research questions: (i) Do connections in the co-investment network reflect intrinsic similarities (e.g., in terms of technology or use cases) between cryptocurrencies? (ii) Is the co-investment network related to cryptocurrencies’ market dynamics? First, we investigate the connection between the co-investment network structure and various features of cryptocurrencies, such as their supported blockchain protocols and use cases. Then, we examine the relation between the co-investment network structure and the market behaviour of pairs of tokens, measured as the correlation of their returns (i.e., the percentage changes in their prices over time).
The article is organised as follows: in Sect. 2, we provide an overview of the relevant literature; in Sect. 3, we describe how the data was collected and integrated, as well as the methodologies and algorithms employed in this study; in Sect. 4.1, we describe the co-investment network and study how cryptocurrency features (e.g., type of blockchain protocol, use case) are related to the network structure; in Sect. 4.2, we study the connection between the structure of the co-investment network and the market properties of different assets. In Sect. 5, we conclude.
2 Related work
Our work contributes to the literature on (i) characterising cryptocurrency market dynamics, (ii) constructing optimal portfolios of currencies, and (iii) quantifying and characterising institutional investments in cryptocurrency-related projects.
A growing body of literature has focused on the properties of the rapidly evolving crypto market ecosystem, shedding light on critical aspects such as market efficiency and maturity [9, 10] and the detection and characterisation of asset pricing bubbles driven by endogenous and exogenous events [11, 12]. The dynamics of competition between currencies [13, 14] and the impact of collective attention [15] have also been closely analysed. Given the digital and decentralised nature of crypto assets, a major focus has been on understanding the drivers of price fluctuations and how to properly value these assets. Empirical studies have sought to understand and predict the price dynamics of cryptocurrencies using machine learning techniques with different input features [15–20]. Socioeconomic signals, such as sentiment indices gathered from social media platforms [21, 22], also appear to be strongly intertwined with price dynamics [23, 24]. Research has also shown that market movements can be tied to macroeconomic indicators, media exposure, and public interest [25, 26], policies and regulations [27], and the behaviour of other financial assets [28].
In the context of institutional investments, the recent growing interest in mixed portfolios of crypto and traditional assets [4] has paved the way for research on optimal portfolio allocation strategies. Studies have focused on the composition of mixed portfolios, i.e., including both traditional (bonds, commodities, etc.) and crypto assets [29, 30], and of crypto-only portfolios [31, 32], testing the performance of different allocation and rebalancing strategies. Specific strategies, e.g., introducing so-called stop-loss rules, have been tested because, by lowering the risks associated with volatility, they could make crypto portfolios more appealing to institutional investors [33].
Concerning the characterisation and quantification of institutional interest and investments in cryptocurrency projects, most of the available research is based on qualitative surveys of investors in Europe and the U.S. conducted by private companies, which aim to identify market trends and issues, e.g., barriers to adoption and current channels of exposure to cryptocurrencies [3, 4]. In Sun, 2021 [34], for instance, the authors surveyed 33 Asian firms to investigate whether price volatility lowers institutional investors’ confidence and to quantify the role played by investors’ familiarity with the technology in the selection of crypto assets. In [35], the authors analysed the connection between investors’ ESG preferences and crypto investment exposure using household-level portfolio data gathered from the Austrian Survey of Financial Literacy (ASFL). Their analysis suggests that crypto investments are more strongly driven by social and ethical preferences than traditional investments (e.g., bonds). In [7], the authors analyse the drivers of crypto adoption and assess institutional investors’ crypto exposure via different channels (e.g., banks, exchanges, etc.). In [36], the authors provide a comprehensive review of typical crypto investors’ behaviour and its effects, including the drivers of investors’ sentiment and attention and the detection of herding behaviour. In [37], the authors provide a first quantitative exploration of the investor network, focusing on investments in ∼300 ERC-20 tokens.^{Footnote 2} Their analysis shows that less central tokens in the investment network also have low market capitalization (i.e., the overall dollar value of all the tokens) and trading volume, poor liquidity, and high volatility. Our analysis builds directly on their approach by considering an extended set of crypto-assets, as well as a novel combination of data that also includes information on the technological features of the assets considered.
3 Data and methods
3.1 Data description
In this paper, we use three main types of data: (i) cryptocurrency price time series, (ii) cryptocurrency metadata describing projects’ technological features and/or their use cases and functionalities, and (iii) data on investment rounds in cryptocurrency projects.
Market data (i) and cryptocurrency metadata (ii) were extracted from the website Coinmarketcap [38]. The data covers 1324 cryptocurrency projects over eight years, spanning from 2014 to 2022. It is important to note that the term ‘cryptocurrency’ here encompasses various types of blockchain-based digital assets. This includes traditional cryptocurrencies like Bitcoin and Litecoin, which are standalone digital currencies operating on their own blockchains, and blockchain-based tokens, such as the previously mentioned ERC-20 tokens on the Ethereum blockchain and analogous tokens on other platforms. These tokens have a range of applications, and they can represent various assets or functionalities within decentralized applications. A notable example within this group is stablecoins, which are typically designed to minimize price volatility by being pegged to more stable assets such as fiat currencies.
Market data consists of each cryptocurrency’s opening price, closing price, and traded volume, sampled weekly.
Coinmarketcap also assigns tags describing the main features of the different cryptocurrencies. These metadata can be broadly classified into three categories. The first is technology-related specifications, which refer to the underlying blockchain technology that the cryptocurrency employs (e.g., Proof-of-Work vs. Proof-of-Stake consensus algorithms, which are different methods used to validate transactions and create new blocks in the blockchain). The second is ecosystem-related information, indicating whether the cryptocurrency operates on an independent blockchain or as part of an existing one, as well as whether it is part of decentralized finance (DeFi) projects. The third category relates to the use case, i.e., the specific purpose and utility of the cryptocurrency (e.g., facilitating distributed storage, serving as a fan token for a particular brand or celebrity, or simply acting as a digital store of value, like digital gold). See Appendix A.5 for a list of the available tags used to categorize these aspects and their respective frequencies. The dataset contains 226 unique tags. Cryptocurrencies’ tags might change over time as, for instance, a project pivots its scope or new categories are introduced. Thus, the data we collected and used in the analysis should be understood as a snapshot of the cryptocurrency environment at the time it was gathered (August 2021).
Coinmarketcap also provides cryptocurrencies’ webpage URLs, which are used to merge market-related data with investment data.
Finally, the investments’ data (iii) is gathered from Crunchbase [39], a commercial database covering worldwide innovative companies and accessed by 75M users each year. The data is sourced through two main channels: an extensive investor network and community contributors. Investors commit to keeping their portfolios updated to get free access to the dataset. More than 600k executives, entrepreneurs, and investors update over 100k company, people, and investor profiles per month. Crunchbase processes the data with machine learning algorithms to ensure accuracy and scan for anomalies, ultimately verified by a team of data experts at Crunchbase. Due to its broad coverage, the data has been used in thousands of scholarly articles and technical reports [39, 40]. Information on Crunchbase includes an overview of the company’s activities, number of employees, and detailed information on funding rounds, including investors and—more rarely—amounts raised. We provide detailed information on the features contained in this dataset in Appendix A.4.
We merged the Crunchbase data on investment rounds with the Coinmarketcap data via the companies’ webpage URLs. After merging, the dataset includes 4395 investments made in 1458 rounds by 1767 investors in 1324 cryptocurrency projects appearing on Crunchbase. The investments amount to \(\$13B\) in total over the period considered (2008–2022). After further merging with the price time series, 624 cryptocurrency projects can still be tracked.
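The URL-based merge can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the paper: the record field `url`, the helper names `normalize_url` and `merge_on_url`, and the normalization rules (lowercased host, leading `www.` and trailing slash dropped) are all hypothetical choices.

```python
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Reduce a homepage URL to a comparable key (hypothetical rules):
    lowercase host without a leading 'www.', plus the path without a
    trailing slash. Scheme, query string and fragment are ignored."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # urlparse only finds the host when a scheme is present
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parsed.path.rstrip("/")


def merge_on_url(market_rows, funding_rows):
    """Inner join of two lists of dicts on their normalized 'url' field.

    Each market row is combined with every funding row sharing its key,
    so one project can match several investment rounds."""
    funding_by_key = {}
    for row in funding_rows:
        funding_by_key.setdefault(normalize_url(row["url"]), []).append(row)
    merged = []
    for row in market_rows:
        for match in funding_by_key.get(normalize_url(row["url"]), []):
            merged.append({**row, **match})  # funding fields win on name clashes
    return merged


# Toy records, purely illustrative.
market = [{"symbol": "AAA", "url": "https://www.aaa.io/"}]
funding = [
    {"investor": "Fund X", "url": "http://aaa.io"},
    {"investor": "Fund Y", "url": "https://bbb.org"},
]
print(merge_on_url(market, funding))
```

Normalizing both sides before joining matters because the two sources may record the same homepage with different schemes, `www.` prefixes, or trailing slashes; the exact rules used in the study are not stated.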
3.2 Methods
In this section, we review the methods used for our analyses. We first describe the co-investment network and the approach we used to cluster its nodes. Later, we explain our analysis of the interplay between the network structure and the market dynamics.
Co-investment network
The main object considered in our study is the cryptocurrencies’ co-investment network. Figure 1A shows how the co-investment network is constructed as a monopartite projection of the bipartite network in which investors are connected to the cryptocurrency projects they have funded at least once. In the resulting co-investment network (Fig. 1B), which is unweighted and undirected, nodes represent different cryptocurrencies, and a link means that the two nodes share at least one common investor. Figure 1C shows the real co-investment network composed of 624 cryptocurrency projects. Node sizes are proportional to their degree, and link widths are proportional to the number of common investors between two cryptocurrencies. In the rest of this paper, the co-investment network is characterised by a binary and symmetric adjacency matrix A, with entries \(a_{ij}\in \{0,1\}\), recording only whether at least one shared investor exists between two cryptocurrencies.
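A minimal sketch of the projection, assuming the investment records are available as (investor, project) pairs; the function name `coinvestment_network` and the input format are hypothetical. The weighted matrix W of shared-investor counts (the quantity behind the link widths in Fig. 1C) is obtained as \(B^{\top}B\) from the investor-by-project incidence matrix B, and A is its binarised version.

```python
import numpy as np


def coinvestment_network(investments, projects):
    """Project an investor-project bipartite graph onto the projects.

    investments: iterable of (investor, project) pairs; repeated pairs
        (several rounds by the same investor) count once.
    projects: list that fixes the node order of the output matrices.
    Returns (W, A): W[i, j] is the number of investors shared by projects
    i and j; A is the binary, symmetric adjacency matrix (a_ij = 1 iff
    W[i, j] > 0), with zeros on the diagonal.
    """
    investments = list(investments)
    col = {p: k for k, p in enumerate(projects)}
    investors = sorted({inv for inv, _ in investments})
    row = {inv: k for k, inv in enumerate(investors)}
    B = np.zeros((len(investors), len(projects)), dtype=int)
    for inv, proj in investments:
        B[row[inv], col[proj]] = 1  # funded at least once -> a single bipartite edge
    W = B.T @ B                     # W[i, j] = number of common investors
    np.fill_diagonal(W, 0)          # a project is not linked to itself
    A = (W > 0).astype(int)
    return W, A


pairs = [("F1", "a"), ("F1", "b"), ("F2", "b"), ("F2", "c"),
         ("F1", "b"), ("F3", "a"), ("F3", "b")]
W, A = coinvestment_network(pairs, ["a", "b", "c", "d"])
```

In this toy example, projects "a" and "b" share two investors (F1 and F3), so W stores a 2 for that pair while A records only a 1; project "d" has no investors and remains an isolated node.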
Clustering algorithm
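We assign a vector \(\mathbf{x}_{i}\) to each cryptocurrency, where, for every tag j, \(x_{i,j} = 1\) if the j-th tag (see Table 6) is assigned to the i-th cryptocurrency, and \(x_{i,j} = 0\) otherwise. We used Ward’s agglomerative clustering algorithm [41] to divide the cryptocurrencies into clusters based on the observations \((\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{n} )\). The algorithm follows a “bottom-up” approach: each observation is initially placed in its own cluster, and clusters are merged sequentially according to a given criterion until the desired number of clusters is reached. Ward’s algorithm prescribes merging, at each iteration, the pair of clusters \(S_{i}\), \(S_{j}\) that minimizes the distance \(\Delta (S_{i}, S_{j} )\), defined as
\[ \Delta (S_{i}, S_{j} ) = \sum_{\mathbf{x} \in S_{i} \cup S_{j}} \Vert \mathbf{x} - \boldsymbol{\mu}_{i+j} \Vert ^{2} - \sum_{\mathbf{x} \in S_{i}} \Vert \mathbf{x} - \boldsymbol{\mu}_{i} \Vert ^{2} - \sum_{\mathbf{x} \in S_{j}} \Vert \mathbf{x} - \boldsymbol{\mu}_{j} \Vert ^{2} = \frac{\vert S_{i} \vert \, \vert S_{j} \vert }{\vert S_{i} \vert + \vert S_{j} \vert } \Vert \boldsymbol{\mu}_{i} - \boldsymbol{\mu}_{j} \Vert ^{2}, \tag{1} \]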
where \(\vert S_{i} \vert \) is the number of observations in cluster \(S_{i}\), \(\boldsymbol{\mu}_{i}\) is the mean of the points in \(S_{i}\), \(\boldsymbol{\mu}_{j}\) is the mean of the points in \(S_{j}\), and \(\boldsymbol{\mu}_{i+j}\) is the mean of the points in \(S_{i} \cup S_{j}\). The number of clusters k is an input of the clustering algorithm; using the elbow method (see Appendix A.1), we set \(k=12\). We opted for Ward’s agglomerative clustering over alternatives such as k-means and k-modes because of its propensity to generate clusters of more equal size [42, 43]: by minimizing the total within-cluster variance, Ward’s method tends to produce clusters of similar variance and thus a more regular partitioning of the data. Since our data is sparse (i.e., each cryptocurrency only has a handful of tags), the alternatives would put most of the cryptocurrencies in a single cluster. Nonetheless, we show in Appendix A.1 that our conclusions are robust to the choice of clustering algorithm.
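To make the merge criterion concrete, the sketch below builds the binary tag vectors \(\mathbf{x}_{i}\) and runs a naive bottom-up Ward procedure, computing the merge cost \(\Delta\) in its closed form and exposing the equivalent increase in within-cluster sum of squares. It illustrates the criterion only and is not the implementation used in the paper; the helper names are hypothetical, and on real data one would more likely call SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=12, criterion="maxclust")`.

```python
import numpy as np


def tag_matrix(tag_lists):
    """Binary vectors x_i: X[i, j] = 1 iff tag j is assigned to coin i."""
    vocab = sorted({t for tags in tag_lists for t in tags})
    index = {t: j for j, t in enumerate(vocab)}
    X = np.zeros((len(tag_lists), len(vocab)))
    for i, tags in enumerate(tag_lists):
        for t in tags:
            X[i, index[t]] = 1.0
    return X, vocab


def ess(X):
    """Within-cluster sum of squared distances to the cluster mean."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


def ward_delta(Xa, Xb):
    """Ward merge cost |a||b|/(|a|+|b|) * ||mu_a - mu_b||^2,
    which equals ess(a U b) - ess(a) - ess(b)."""
    na, nb = len(Xa), len(Xb)
    diff = Xa.mean(axis=0) - Xb.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)


def ward_agglomerative(X, k):
    """Naive O(n^3) bottom-up Ward clustering of the rows of X into k clusters."""
    clusters = [[i] for i in range(len(X))]  # each observation starts alone
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = ward_delta(X[clusters[a]], X[clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the cheapest pair
        del clusters[b]                          # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```

The quadratic scan over cluster pairs keeps the logic transparent; library implementations update merge costs incrementally and scale far better to the 1324 projects considered here.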
Clustering evaluation and benchmarks
We investigate whether the clusters obtained via the previous procedure reflect the underlying network structure by studying the in-density and out-density of links according to the partition defined by the clusters. Given the \(N \times N\) adjacency matrix A of our co-investment network and the clustering \(S^{*}= \{S_{1},\ldots,S_{k} \}\), we define the in-density of a cluster \(S_{i}\) as
\[ \rho ^{i}_{i} = \frac{1}{\vert S_{i} \vert (\vert S_{i} \vert - 1 )} \sum_{j \in S_{i}} \sum_{l \in S_{i}, l \neq j} a_{jl} \tag{2} \]
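and its out-density as
\[ \rho ^{o}_{i} = \frac{1}{\vert S_{i} \vert (N - \vert S_{i} \vert )} \sum_{j \in S_{i}} \sum_{l \notin S_{i}} a_{jl}. \tag{3} \]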
These metrics are used to study whether cryptocurrencies with similar characteristics, clustered according to their Coinmarketcap tags, are more strongly interconnected among themselves in the co-investment network (higher in-cluster density) than with groups of dissimilar cryptocurrencies. We then compare the in-densities and out-densities of the clusters identified by the clustering algorithm with those of random clusters. To generate the random clusters, we simply assign each cryptocurrency to one of the twelve possible clusters with equal probability. In Sect. A.3 of the Appendix, we repeat the analysis with several different node similarity metrics, including the Jaccard index, the cosine similarity (also known as the Salton index), the Adamic–Adar index, and the resource allocation index, showing that our findings are robust to the choice of metric.
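The cluster densities and the random-cluster benchmark can be sketched as below. We assume here that the in-density of Eq. (2) is normalised by the number of possible within-cluster pairs and the out-density of Eq. (3) by the number of possible pairs between the cluster and the rest of the network; the function names are hypothetical.

```python
import numpy as np


def cluster_densities(A, labels):
    """Return {cluster: (in_density, out_density)} for a binary symmetric A.

    in_density  = links inside S / possible pairs inside S
    out_density = links between S and the rest / (|S| * (N - |S|))
    """
    A = np.asarray(A)
    labels = np.asarray(labels)
    N = len(A)
    result = {}
    for c in np.unique(labels):
        inside = labels == c
        n = int(inside.sum())
        # Each internal link appears twice in the block, matching the
        # n(n-1) ordered pairs in the denominator (the diagonal of A is zero).
        internal = A[np.ix_(inside, inside)].sum()
        external = A[np.ix_(inside, ~inside)].sum()
        rho_in = internal / (n * (n - 1)) if n > 1 else float("nan")
        rho_out = external / (n * (N - n)) if n < N else float("nan")
        result[int(c)] = (float(rho_in), float(rho_out))
    return result


def random_labels(n_nodes, k, rng):
    """Benchmark: each node joins one of k clusters with equal probability."""
    return rng.integers(0, k, size=n_nodes)


# Toy network: two triangles joined by the single link 2-3.
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(cluster_densities(A, [0, 0, 0, 1, 1, 1]))
print(cluster_densities(A, random_labels(6, 2, np.random.default_rng(0))))
```

With the "true" partition, each triangle is fully connected internally while only one of the nine possible cross pairs is linked; random labels typically blur this contrast, which is the comparison drawn in Fig. 4.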
Time series processing
We investigate the relationship between the co-investment network and the cryptocurrency market by computing the correlations between cryptocurrencies’ returns. The primary objects of this analysis are the time series of cryptocurrencies’ weekly closing prices \(p_{i} (t ), i=1,\ldots, N\) (the closing price being the final price at which a cryptocurrency is traded during a given week). We compute their log returns as
\[ r_{i} (t ) = \log p_{i} (t ) - \log p_{i} (t-1 ) \tag{4} \]
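and use the leave-one-out rescaling described in [44] to define the rescaled returns,
\[ \tilde{r}_{i} (t ) = \frac{r_{i} (t ) - \mathbb{E}_{t'} [r_{i}(t') ]}{\sqrt{\mathbb{V}_{t'\neq t} [r_{i}(t') ]}}, \tag{5} \]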
where the average of the returns \(\mathbb{E}_{t'} [r_{i}(t') ]\) is computed over all times \(t'\), but the variance \(\mathbb{V}_{t'\neq t} [r_{i}(t') ]\) is computed from the time series where the observation corresponding to \(t'=t\) has been removed. The correlation matrix of the time series \(\tilde{r}_{i}\) is defined as
\[ C_{ij} = \mathbb{E}_{t} [\tilde{r}_{i}(t) \, \tilde{r}_{j}(t) ]. \tag{6} \]
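A compact NumPy sketch of the log returns, the leave-one-out rescaling and the correlation matrix of Eq. (6) follows, taking the correlation as the time average of products of rescaled returns. It assumes prices are stored as an N × T array with no missing weeks (real data would need a policy for gaps); centring the leave-one-out variance on the leave-one-out mean, as NumPy's `std` does here, is also our assumption. Function names are hypothetical.

```python
import numpy as np


def log_returns(prices):
    """r_i(t) = log p_i(t) - log p_i(t-1) for an (N, T) price array."""
    return np.diff(np.log(prices), axis=1)


def loo_rescale(r):
    """Subtract the full-sample mean of each series and divide by the
    standard deviation computed with observation t left out."""
    mean = r.mean(axis=1)
    out = np.empty_like(r)
    for t in range(r.shape[1]):
        rest = np.delete(r, t, axis=1)  # every column except t
        out[:, t] = (r[:, t] - mean) / rest.std(axis=1)
    return out


def correlation_matrix(r_tilde):
    """C_ij = (1/T) sum_t r~_i(t) r~_j(t)."""
    return r_tilde @ r_tilde.T / r_tilde.shape[1]


# Synthetic weekly prices for 3 assets (illustrative only).
rng = np.random.default_rng(1)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.05, size=(3, 501)), axis=1))
C = correlation_matrix(loo_rescale(log_returns(prices)))
```

The diagonal of C is close to, but not exactly, 1, because each observation is scaled by a slightly different leave-one-out variance; the point of the rescaling is to stop a single extreme week from dominating both numerator and denominator.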
Cryptocurrencies’ prices usually move coherently, increasing or decreasing simultaneously [45–47]. This collective behaviour of the market makes returns strongly correlated and hides the more subtle effects we want to highlight. Therefore, we adopt the following strategy to remove from the correlation matrix the so-called market component, which characterises common price co-movements [48]. We first compute the eigenvalues \(\lambda _{1} \geq \cdots \geq \lambda _{N}\) of the correlation matrix, the corresponding orthonormal eigenvectors \(\mathbf{v}_{1}, \ldots, \mathbf{v}_{N}\), and the modes \(m_{i} (t )\), defined as
\[ m_{i} (t ) = \sum_{j=1}^{N} (\mathbf{v}_{i})_{j} \, \tilde{r}_{j} (t ), \tag{7} \]
where \((\mathbf{v}_{i})_{j}\) denotes the j-th component of \(\mathbf{v}_{i}\).
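We call the mode \(m_{1} (t )\) associated with the largest eigenvalue \(\lambda _{1}\) the market mode. Since the eigenvectors are orthonormal, the time series \(\tilde{r}_{i} (t )\) can be written as linear combinations of the modes \(m_{i} (t )\),
\[ \tilde{r}_{i} (t ) = \sum_{j=1}^{N} (\mathbf{v}_{j})_{i} \, m_{j} (t ). \tag{8} \]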
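We can now define the adjusted time series \(r'_{i} (t )\), from which the market mode has been removed,
\[ r'_{i} (t ) = \tilde{r}_{i} (t ) - (\mathbf{v}_{1})_{i} \, m_{1} (t ) = \sum_{j=2}^{N} (\mathbf{v}_{j})_{i} \, m_{j} (t ), \tag{9} \]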
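and the corresponding adjusted correlation matrix \(C'\),
\[ C'_{ij} = \frac{\mathbb{E}_{t} [r'_{i}(t) \, r'_{j}(t) ]}{\sqrt{\mathbb{E}_{t} [r'_{i}(t)^{2} ] \, \mathbb{E}_{t} [r'_{j}(t)^{2} ]}}. \tag{10} \]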
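The market-mode removal can be sketched with an eigendecomposition of the correlation matrix. This is an illustrative NumPy version under our own assumptions: the input is an already rescaled N × T array, C is the time average of products as in Eq. (6), and the adjusted correlation of Eq. (10) is normalised by the residual second moments so that its values lie in [−1, 1]. The function name is hypothetical.

```python
import numpy as np


def remove_market_mode(r_tilde):
    """Project the largest-eigenvalue mode out of rescaled returns.

    r_tilde: (N, T) array of rescaled returns.
    Returns (r_adj, C_adj, V, modes), where row k of V is the eigenvector
    of the k-th largest eigenvalue and modes = V @ r_tilde.
    """
    T = r_tilde.shape[1]
    C = r_tilde @ r_tilde.T / T
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order, orthonormal columns
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    V = eigvecs[:, order].T                # V[k] is the (k+1)-th eigenvector
    modes = V @ r_tilde                    # m_k(t) = sum_j V[k, j] r~_j(t)
    # Because V is orthogonal, r_tilde == V.T @ modes; drop the market mode only.
    r_adj = r_tilde - np.outer(V[0], modes[0])
    std = np.sqrt((r_adj ** 2).mean(axis=1))
    C_adj = (r_adj @ r_adj.T / T) / np.outer(std, std)
    return r_adj, C_adj, V, modes


# Synthetic returns driven by one common "market" factor (illustrative only).
rng = np.random.default_rng(2)
market = rng.normal(size=400)
r = 0.8 * market + 0.6 * rng.normal(size=(5, 400))
r = (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)
r_adj, C_adj, V, modes = remove_market_mode(r)
```

With a strong common factor, the raw off-diagonal correlations are large and positive, while after the projection they drop sharply; any residual structure is what the comparison with the co-investment network then probes.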
Network correlation and random benchmarks
We compute the average value of the raw and adjusted correlations C and \(C'\) (defined in Eqs. (6) and (10), respectively) restricted to the pairs of cryptocurrencies \((i, j )\) that are linked (i.e., share an investor) in the co-investment network. For any binary adjacency matrix M, we define
\[ C_{\mathbf{M}} = \frac{\sum_{i<j} M_{ij} \, C_{ij}}{\sum_{i<j} M_{ij}} \tag{11} \]
and
\[ C'_{\mathbf{M}} = \frac{\sum_{i<j} M_{ij} \, C'_{ij}}{\sum_{i<j} M_{ij}}, \tag{12} \]
where the averages run over all pairs \((i,j )\) of connected nodes. The values of \(C_{\mathbf{M}}\) and \(C'_{\mathbf{M}}\) range from −1 to 1, where −1 indicates a perfect inverse correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation between pairs of cryptocurrencies. High values (close to 1) suggest that linked cryptocurrencies move in tandem, while values around 0 indicate the absence of a significant linear relationship between their returns.
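Computing \(C_{\mathbf{M}}\) amounts to averaging the upper triangle of the correlation matrix over the linked pairs, as in the hypothetical helper below (a minimal sketch; the same function yields \(C'_{\mathbf{M}}\) when passed the adjusted matrix \(C'\)).

```python
import numpy as np


def network_average(C, M):
    """Mean of C_ij over unordered pairs i < j with M_ij = 1."""
    C = np.asarray(C, dtype=float)
    M = np.asarray(M)
    iu = np.triu_indices_from(M, k=1)  # each undirected pair once, no diagonal
    linked = M[iu] > 0
    if not linked.any():
        return float("nan")            # no links: the average is undefined
    return float(C[iu][linked].mean())


C = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, -0.3],
              [0.1, -0.3, 1.0]])
M = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(network_average(C, M))  # averages C_01 and C_12 only
```

Restricting to \(i<j\) counts each undirected link once and excludes the trivial diagonal entries \(C_{ii}\).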
We compute \(C_{\mathbf{A}}\) and \(C'_{\mathbf{A}}\) over the adjacency matrix A of the real co-investment network and compare them with the values obtained on three random network models: the Erdős–Rényi model [49], the Stochastic Block Model [50], and the Configuration Model [51]. To mimic the properties of the real co-investment network, all benchmarks are constructed as undirected and unweighted random networks.
For every model, we sample \(n=1000\) network instances \(\mathbf{R}_{1}, \ldots, \mathbf{R}_{n}\) at random and compute the mean and standard deviation of the sets \(\{C_{\mathbf{R}_{1}}, \ldots, C_{\mathbf{R}_{n}} \}\) and \(\{C'_{\mathbf{R}_{1}}, \ldots, C'_{\mathbf{R}_{n}} \}\). All models are parametrized to match the empirical properties of the co-investment network. The link probability p in the Erdős–Rényi model is set to match the co-investment network’s empirical density,
\[ p = \frac{2}{N (N-1 )} \sum_{i<j} a_{ij}. \tag{13} \]
Blocks in the Stochastic Block Model match the clusters found with the clustering algorithm, and the densities within and across clusters are equal to the empirical values. Finally, the degree sequence in the Configuration Model matches the empirical degree sequence.
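The three null models can be sampled with plain NumPy as sketched below. The paper does not specify its sampling routines, so these are our assumptions: the Erdős–Rényi and stochastic block model draws use independent Bernoulli links, and the configuration model uses stub matching with self-loops and repeated links discarded, which only approximately preserves the degree sequence. All helper names are hypothetical.

```python
import numpy as np


def symmetric_from_upper(U):
    """Binary symmetric matrix from a boolean draw, keeping only i < j."""
    U = np.triu(U, k=1)
    return (U | U.T).astype(int)


def density(A):
    """Empirical link density: links / (N(N-1)/2), used as p in the ER model."""
    N = len(A)
    return A[np.triu_indices(N, k=1)].sum() / (N * (N - 1) / 2)


def erdos_renyi(N, p, rng):
    return symmetric_from_upper(rng.random((N, N)) < p)


def block_densities(A, labels, k):
    """P[a, b]: empirical link density within (a == b) or across clusters."""
    P = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            ia, ib = labels == a, labels == b
            pairs = ia.sum() * (ib.sum() - (a == b))  # ordered pairs, no self-pairs
            P[a, b] = A[np.ix_(ia, ib)].sum() / pairs if pairs else 0.0
    return P


def stochastic_block_model(labels, P, rng):
    probs = P[np.ix_(labels, labels)]  # link probability for every node pair
    return symmetric_from_upper(rng.random(probs.shape) < probs)


def configuration_model(degrees, rng):
    """Stub matching; self-loops and repeated links are dropped."""
    stubs = np.repeat(np.arange(len(degrees)), degrees)
    rng.shuffle(stubs)
    A = np.zeros((len(degrees), len(degrees)), dtype=int)
    for i, j in stubs[: len(stubs) // 2 * 2].reshape(-1, 2):
        if i != j:
            A[i, j] = A[j, i] = 1
    return A


def network_average(C, M):
    iu = np.triu_indices_from(M, k=1)
    linked = M[iu] > 0
    return C[iu][linked].mean() if linked.any() else np.nan


def benchmark(C, sample, n=1000):
    """Mean and std of C_R over n sampled networks R (link-free draws skipped)."""
    values = np.array([network_average(C, sample()) for _ in range(n)])
    return np.nanmean(values), np.nanstd(values)


# Toy inputs (illustrative only): two triangles joined by one link.
rng = np.random.default_rng(3)
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = np.array([0, 0, 0, 1, 1, 1])
C = np.corrcoef(rng.normal(size=(6, 50)))
P = block_densities(A, labels, 2)
for name, sample in [
    ("ER", lambda: erdos_renyi(6, density(A), rng)),
    ("SBM", lambda: stochastic_block_model(labels, P, rng)),
    ("CM", lambda: configuration_model(A.sum(axis=1), rng)),
]:
    print(name, benchmark(C, sample, n=200))
```

Each null model preserves a different property of the real network (overall density, cluster structure, or degrees), so a real-network average that exceeds all three cannot be explained by any single one of those properties alone.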
4 Results
4.1 Structure of the cryptocurrency co-investment network
In this section, we analyze the relationship between institutional investments and the properties of the cryptocurrency market.
We start by quantifying the joint evolution of the number and volume of investments and the growth of the cryptocurrency market. In Fig. 2, we show the evolution of the total raised amount, the number of investments, and the market capitalization^{Footnote 3} of the cryptocurrency ecosystem. Overall, we find that both the number of investments and the amount raised have been growing steadily since 2012. Moreover, we find a positive correlation between the cryptocurrency market capitalization (MC) and both the total volume of investments, i.e., the amount raised in dollars (VI), and the number of investments (NI). The Spearman correlation coefficients are \(\rho _{\mathrm{MC},\mathrm{VI}}=0.79\) and \(\rho _{\mathrm{MC},\mathrm{NI}}=0.81\), respectively, suggesting that the crypto market and the volume of investments have evolved hand in hand.
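Spearman’s coefficient is the Pearson correlation of the rank-transformed series, so it measures monotonic co-movement rather than agreement in raw magnitudes. A minimal NumPy sketch with average ranks for ties is shown below; the helper names are hypothetical, `scipy.stats.spearmanr` computes the same statistic, and the toy inputs are illustrative, not the paper’s data.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed series."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Toy yearly series with a monotonic but non-linear relation:
# Spearman's rho equals 1 up to floating-point rounding.
investments = [1, 2, 3, 4, 5]
market_cap = [1, 4, 9, 16, 25]
print(spearman(investments, market_cap))
```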
Next, we study the evolution of the co-investment network over time (see Fig. 3). We find that, since 2014, the network has grown steadily in terms of both the cumulative number of nodes (panel A), i.e., cryptocurrency projects funded by institutional investors, and the cumulative number of edges (panel B), i.e., pairs of cryptocurrencies sharing at least one investor. The growth is steeper around 2017–2019, consistent with the rapid increase in demand for cryptocurrencies and the rise of Bitcoin’s valuation over those years [52]. Turning our attention to the number of connections per node, we observe that the degree distribution of the co-investment network is heavy-tailed, with most nodes having a single connection and only a few having hundreds of neighbours (see Fig. 1C). Moreover, the shape of the distribution has been relatively stable over time (see Fig. 1C), in line with the findings discussed in Ref. [37], where the authors studied the co-investment network restricted to ERC-20 tokens only.
Which factors may explain the observed structure of the cryptocurrency co-investment network? In the following, we test the hypothesis that the structure of the co-investment network is partly determined by the properties characterising different cryptocurrency projects (e.g., their underlying technology or their purpose) because investors tend to specialize and invest in specific types of cryptocurrencies. More formally, we assess whether two cryptocurrencies with similar properties are also more likely to be connected in the co-investment network compared to any random pair of currencies.
To this end, we assign each cryptocurrency to a cluster based on its properties (see Sect. 3.2 for more details). Then, for each cluster i, we calculate the in-cluster density \(\rho ^{i}_{i}\) and the out-cluster density \(\rho ^{o}_{i}\), as defined in Eqs. (2) and (3), respectively. We then compare the in- and out-cluster densities: if \(\rho ^{i}_{i}\) is significantly higher than \(\rho ^{o}_{i}\), links are denser among cryptocurrencies with similar properties.
Indeed, we observe that the densities inside clusters of similar cryptocurrencies tend to be larger than those across clusters (see Fig. 4), which confirms our hypothesis. In practice, this implies that similar cryptocurrency projects (i.e., those that share a common set of tags) are more likely to share investors than any two randomly chosen projects.
Importantly, when cryptocurrencies are assigned to random clusters, the relation between in- and out-density is significantly different (see the red shaded area in Fig. 4). Thus, our results reveal a nontrivial connection between the topology of the network and the intrinsic features of cryptocurrency projects. In particular, they hint at the presence of specialised investors who do not simply invest across the whole cryptocurrency ecosystem but rather focus on specific technologies and/or use cases.
4.2 Interplay between the co-investment network structure and returns correlations
In this section, we investigate the interplay between the structure of the co-investment network and the cryptocurrency market properties. More specifically, we test whether the price returns of cryptocurrencies that share common investors are more correlated than one would expect by random chance.
To this end, we compute the average returns correlation \(C_{\mathbf{A}}\), defined in Eq. (11), across pairs of cryptocurrencies sharing a link in the real co-investment network (described by its adjacency matrix A). We also compute the average returns correlation of cryptocurrency pairs sharing a link in random network benchmarks, including (i) an Erdős–Rényi network, (ii) a configuration model, and (iii) a stochastic block model, each parametrized to reproduce some of the features of the real network (e.g., number of nodes, number of clusters, degree distribution; see Sect. 3.2).
Figure 5 compares the correlation values for the real co-investment network with those for the benchmarks. The correlation values displayed can be found in Table 1 and Table 2 of the Appendix. In Panel A of Fig. 5, the returns correlation between cryptocurrency pairs is plotted against their network distance, defined as the length of the shortest path between the two nodes in the network. Our findings indicate that the average correlation decreases as the network distance increases. Cryptocurrencies that are “close” in the co-investment network are, on average, more correlated than in the random benchmarks; conversely, pairs of cryptocurrencies that are distant in the network are less correlated than in the benchmarks.
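The distance-resolved curve in Panel A can be sketched as follows: breadth-first search gives the shortest-path length between every pair of nodes in the unweighted network, and correlations are then averaged within each distance. This is a minimal sketch with hypothetical names; pairs in different connected components (infinite distance) are simply skipped here, which is our assumption.

```python
from collections import defaultdict, deque

import numpy as np


def shortest_path_lengths(A, source):
    """BFS hop distances from source in a binary adjacency matrix.
    Unreachable nodes are absent from the returned dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def correlation_by_distance(C, A):
    """Average C_ij over pairs i < j, grouped by network distance d_ij."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(A)):
        for j, d in shortest_path_lengths(A, i).items():
            if j > i:
                sums[d] += C[i][j]
                counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(counts)}


# Path graph 0-1-2-3 with toy correlations that fall off with distance.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
C = np.array([[1.0, 0.6, 0.3, 0.0],
              [0.6, 1.0, 0.6, 0.3],
              [0.3, 0.6, 1.0, 0.6],
              [0.0, 0.3, 0.6, 1.0]])
print(correlation_by_distance(C, A))
```

Running one BFS per node costs O(N(N + E)), which is manageable for a network of a few hundred nodes such as the 624-project co-investment network.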
Panel B of Fig. 5 summarizes the average returns correlation for the real network (blue) and the random networks (green, red, and orange). The lighter shades of colour display the values of the adjusted correlation \(C'_{\mathbf{M}}\), computed on the time series from which the market component has been removed (see Sect. 3.2). Once again, the figure shows that the average correlation on the real network is significantly larger than on all the benchmarks tested, suggesting that the network’s structure may directly impact the cryptocurrencies’ market behaviour. Furthermore, the gap between the real and random correlations widens significantly after the market component is removed from the time series, as discussed in Sect. 3.2.
Overall, our results show that the returns of cryptocurrencies that share a common investor are more strongly correlated than one would expect by random chance, revealing that assets with shared investors tend to be characterized by similar market dynamics.
5 Discussion
In this paper, we have analyzed an ecosystem of 1324 cryptocurrency projects appearing on Crunchbase, which received 4395 investments from 1767 investors for a total amount of $13B. We have built and analysed the co-investment network, in which two cryptocurrencies are linked if they share an investor. We have also clustered cryptocurrency projects based on metadata and tags from the Coinmarketcap website and studied the resulting community structure.
As hinted by previous research and surveys concerning institutional and individual crypto investors’ preferences [3, 4, 37, 53], our results show that investors tend to specialise and focus on particular technologies, use cases, and features of the cryptocurrency projects they decide to include in their portfolio.
We have also analyzed the relationship between the co-investment network and the cryptocurrencies’ market properties. We showed that the presence of a link in the co-investment network is associated with a higher correlation of cryptocurrencies’ returns. This increase in correlation diminishes as the distance between the considered pairs of cryptocurrencies in the co-investment network grows.
Our work has limitations that, hopefully, can be turned into future avenues of research. As stated above, we also provide access to the co-investment network reconstructed from Crunchbase to ease further explorations and extensions of our work. Firstly, our data collection process stopped in the summer of 2021, before the second major cryptocurrency crash and the default of established players such as Terra, Celsius, and FTX. It is legitimate to ask to what extent our results would hold in this new regime, in which the general sentiment towards cryptocurrencies has shifted.
Secondly, some prominent players in the cryptocurrencies’ ecosystem are not associated with a company, but rather with different types of organizations, including Decentralized Autonomous Organizations (DAOs), foundations, or even no legal entity at all. The nature of the investment may also vary substantially. For instance, instead of buying a share of the company, investors may lend money to DeFi protocols in exchange for tokens as rewards (a practice known as liquidity mining [54]). These new organization types and forms of investment are scarcely represented in our dataset; therefore, we can only offer a partial view of the cryptocurrencies’ investment ecosystem. Finally, most of our analysis was performed on a static network. However, how the network grows, which investment strategies different investors adopt, and how these strategies depend on the market are also clearly worth analyzing.
In light of the recent crypto market crashes, from the Terra–Luna stablecoin pair to large exchanges [55–57], understanding crypto market connectedness at the investor level helps shed light on possible contagion channels that threaten the overall stability of the ecosystem.
Notes
The term “decentralised finance” refers to financial services, such as lending or asset trading, provided through decentralized platforms, as opposed to traditional centralized financial institutions.
An ERC20 token is a type of digital asset that runs on the Ethereum blockchain, following a standardized set of rules so it can easily interact with other apps and tokens. Essentially, it is a special type of currency that can be used in a variety of online applications and services.
The market capitalization of a token is the total value of all its units in circulation, calculated by multiplying the current price per token by the number of tokens in circulation.
References
Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf
Kochkodin B Venture Capital Makes a Record $17 Billion Bet on Crypto World. Accessed: 2022-11-20 (2022). https://www.bloomberg.com/news/articles/20210618/venturecapitalmakesarecord17billionbetoncryptoworld?sref=3REHEaVI
Neureuter J (2021) The institutional investor digital assets study. Technical report, Fidelity Digital Assets. https://www.fidelitydigitalassets.com/sites/default/files/documents/2021digitalassetstudy.pdf
Institutionalisation of cryptoassets and DeFi–TradFi interconnectedness. Accessed: 2022-11-20 (2022). https://www.oecd-ilibrary.org/content/paper/5d9dddbe-en
Rauchs M, Blandin A, Bear K, McKeon SB (2019) 2nd global enterprise blockchain benchmarking study. Available at SSRN 3461765
Gurguc Z, Knottenbelt W Cryptocurrencies: overcoming barriers to trust and adoption. Retrieved from Imperial College London website: https://www.imperial.ac.uk/media/imperialcollege/researchcentresandgroups/ic3re/cryptocurrenciesovercomingbarrierstotrustandadoption.pdf (2018)
Auer R, Farag M, Lewrick U, Orazem L, Zoss M (2023) Banking in the shadow of bitcoin? The institutional adoption of cryptocurrencies
Shakhnov K, Zaccaria L (2020) (R)evolution in entrepreneurial finance? The relationship between cryptocurrency and venture capital markets. Technical report, Einaudi Institute for Economics and Finance (EIEF)
Sigaki HY, Perc M, Ribeiro HV (2019) Clustering patterns in efficiency and the coming-of-age of the cryptocurrency market. Sci Rep 9(1):1–9
Vidal-Tomás D, Ibañez A (2018) Semi-strong efficiency of bitcoin. Finance Res Lett 27:259–265
Chen CYH, Hafner CM (2019) Sentiment-induced bubbles in the cryptocurrency market. J Financ Risk Manag 12(2):53
Vidal-Tomás D, Bartolucci S (2023) Artificial intelligence and digital economy: divergent realities. Available at SSRN 4589333
Dowd K, Greenaway D (1993) Currency competition, network externalities and switching costs: towards an alternative view of optimum currency areas. Econ J 103(420):1180–1189
Luther WJ (2016) Cryptocurrencies, network effects, and switching costs. Contemp Econ Policy 34(3):553–571
ElBahrawy A, Alessandretti L, Baronchelli A (2019) Wikipedia and cryptocurrencies: interplay between collective attention and market performance. Front Blockchain 2:12
Alessandretti L, ElBahrawy A, Aiello LM, Baronchelli A (2018) Machine learning the cryptocurrency market. Complexity 2018
Walther T, Klein T, Bouri E (2019) Exogenous drivers of bitcoin and cryptocurrency volatility–a mixed data sampling approach to forecasting. University of St. Gallen. Research Paper (2018/19)
McNally S, Roche J, Caton S (2018) Predicting the price of bitcoin using machine learning. In: 2018 26th euromicro international conference on parallel, distributed and network-based processing (PDP). IEEE Press, New York, pp 339–343. https://ieeexplore.ieee.org/abstract/document/8374483
Chen Z, Li C, Sun W (2020) Bitcoin price prediction using machine learning: an approach to sample dimension engineering. J Comput Appl Math 365:112395
Akyildirim E, Goncu A, Sensoy A (2020) Prediction of cryptocurrency returns using machine learning. Ann Oper Res: 1–34
Garcia D, Tessone CJ, Mavrodiev P, Perony N (2014) The digital traces of bubbles: feedback cycles between socioeconomic signals in the bitcoin economy. J R Soc Interface 11(99):20140623. https://doi.org/10.1098/rsif.2014.0623
Aste T (2018) Cryptocurrency market structure: connecting emotions and economics. special issue of digital finance on cryptocurrencies. Digit Finance
Ortu M, Uras N, Conversano C, Bartolucci S, Destefanis G (2022) On technical trading and social media indicators for cryptocurrency price classification through deep learning. Expert Syst Appl 198:116804
Lucchini L, Alessandretti L, Lepri B, Gallo A, Baronchelli A (2020) From code to market: network of developers and correlated returns of cryptocurrencies. Sci Adv 6(51). https://doi.org/10.1126/sciadv.abd2204
Lyócsa Š, Molnár P, Plíhal T, Širaňová M (2020) Impact of macroeconomic news, regulation and hacking exchange markets on the volatility of bitcoin. J Econ Dyn Control 119:103980
Corbet S, Larkin C, Lucey BM, Meegan A, Yarovaya L (2020) The impact of macroeconomic news on bitcoin returns. Eur J Finance 26(14):1396–1416
Borri N, Shakhnov K (2020) Regulation spillovers across cryptocurrency markets. Finance Res Lett 36:101333
Nguyen KQ (2022) The correlation between the stock market and bitcoin during Covid-19 and other uncertainty periods. Finance Res Lett 46:102284
Koutsouri A, Poli F, Alfieri E, Petch M, Distaso W, Knottenbelt WJ (2020) Balancing cryptoassets and gold: a weighted-risk-contribution index for the alternative asset space. In: Mathematical research for blockchain economy. Springer, Berlin, pp 217–232
Platanakis E, Urquhart A (2020) Should investors include bitcoin in their portfolios? A portfolio theory approach. Br Account Rev 52(4):100837
Hu Y, Rachev ST, Fabozzi FJ (2019) Modelling crypto asset price dynamics, optimal crypto portfolio, and crypto option valuation. ArXiv preprint. arXiv:1908.05419
Ahelegbey DF, Giudici P, Mojtahedi F (2021) Crypto asset portfolio selection. Available at SSRN 3892999
Białkowski J (2020) Cryptocurrencies in institutional investors’ portfolios: evidence from industry stop-loss rules. Econ Lett 191:108834
Sun W, Dedahanov AT, Shin HY, Li WP (2021) Factors affecting institutional investors to add cryptocurrency to asset portfolios. N Am J Econ Finance 58:101499
Ciaian P, Cupak A, Fessler P, Kancs D (2022) Environmental-social-governance preferences and investments in cryptoassets. ArXiv preprint. arXiv:2206.14548
Almeida J, Gonçalves TC (2023) A systematic literature review of investor behavior in the cryptocurrency markets. J Behav Exp Finance 100785
Liu SH, Liu XF (2021) Coinvestment network of ERC20 tokens: network structure versus market performance. Front Phys 9:55
Coinmarketcap. https://coinmarketcap.com/. Accessed: 2022-07-16
Dalle J-M, den Besten M, Menon C (2017) Using crunchbase for economic and managerial research. Technical report, OECD. https://doi.org/10.1787/6c418d60-en. https://www.oecd-ilibrary.org/content/paper/6c418d60-en
den Besten ML (2020) Crunchbase research: monitoring entrepreneurship research in the age of big data. Available at SSRN 3724395
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
Everitt BS, Landau S, Leese M, Stahl D (2011) Cluster analysis. Wiley, New York
Murtagh F, Legendre P (2014) Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J Classif 31:274–295
Bouchaud J-P, Potters M (2003) Theory of financial risk and derivative pricing. Cambridge University Press, Cambridge. https://doi.org/10.1017/cbo9780511753893
Katsiampa P, Corbet S, Lucey B (2019) High frequency volatility comovements in cryptocurrency markets. J Int Financ Mark Inst Money 62:35–52
Koutmos D (2018) Return and volatility spillovers among cryptocurrencies. Econ Lett 173:122–127
Stosic D, Stosic D, Ludermir TB, Stosic T (2018) Collective behavior of cryptocurrency price changes. Phys A, Stat Mech Appl 507:499–509
Laloux L, Cizeau P, Bouchaud J-P, Potters M (1999) Noise dressing of financial correlation matrices. Phys Rev Lett 83(7):1467
Erdős P, Rényi A (1959) On random graphs I. Publ Math (Debr) 6:290–297
Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83:016107. https://doi.org/10.1103/PhysRevE.83.016107
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256. https://doi.org/10.1137/S003614450342480
Institutional investors will bet big on cryptocurrencies in 2018 (2018) Technical report, Cointelegraph. https://cointelegraph.com/news/institutionalinvestorswillbetbigoncryptocurrenciesin2018
Ciaian P, Rajcaniova M, Kancs D (2016) The economics of bitcoin price formation. Appl Econ 48(19):1799–1815
Fan S, Min T, Wu X, Wei C (2022) Towards understanding governance tokens in liquidity mining: a case study of decentralized exchanges. World Wide Web: 1–20
Briola A, Vidal-Tomás D, Wang Y, Aste T (2022) Anatomy of a stablecoin’s failure: the Terra-Luna case. Finance Res Lett 103358
Hermans L, Ianiro A, Kochanska U, Törmälehto V-M, van der Kraaij A, Simón JMV et al (2022) Decrypting financial stability risks in cryptoasset markets. Financ Stab Rev 1
Chipolina S (2022) FT Cryptofinance: crypto’s Lehman moment. Accessed: 2022-11-30. https://www.ft.com/content/a4d31278-a5d9-4a02-a169-4996f4a8e8f8
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137. https://doi.org/10.1109/TIT.1982.1056489
Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304. https://doi.org/10.1023/A:1009769707641
Lü L, Zhou T (2011) Link prediction in complex networks: a survey. Phys A, Stat Mech Appl 390(6):1150–1170. https://doi.org/10.1016/j.physa.2010.11.027
Acknowledgements
The authors acknowledge Crunchbase for easing data access.
Author information
Contributions
LM collected the data and performed the analysis. All authors analysed the results and wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A.1 Methods
We tested four algorithms to cluster our set of cryptocurrencies based on their Coinmarketcap tags: Ward’s hierarchical clustering, k-means [58], k-modes [59], and an agglomerative clustering algorithm based on the cosine distance between data points. We eventually settled on Ward’s algorithm because of its propensity to generate clusters of more similar size [42, 43]. Nevertheless, the other algorithms produced similar, non-random partitions of the cryptocurrencies into clusters, as shown in Fig. 6.
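A Ward clustering of binary tag vectors can be sketched with SciPy’s hierarchical-clustering routines, as below. This is an illustration under stated assumptions: each cryptocurrency is represented as a 0/1 vector over Coinmarketcap tags, and the dendrogram is cut into at most k clusters; the helper name `ward_tag_clusters` and the toy data are hypothetical, and the paper’s exact preprocessing is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_tag_clusters(tags, k):
    """Cluster cryptocurrencies by their binary tag vectors with Ward's method.

    tags: (n_coins, n_tags) array of 0/1 tag indicators (hypothetical input).
    k: maximum number of clusters kept when cutting the dendrogram.
    Returns an array of cluster labels in 1..k.
    """
    # SciPy's Ward linkage works on Euclidean distances between the rows.
    Z = linkage(np.asarray(tags, dtype=float), method="ward")
    # Cut the tree so that at most k flat clusters remain.
    return fcluster(Z, t=k, criterion="maxclust")


# Toy data: two groups of coins with disjoint sets of tags.
tags = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
labels = ward_tag_clusters(tags, k=2)
print(labels)
```

Swapping `method="ward"` for another linkage rule is one way to compare alternative agglomerative partitions on the same tag matrix.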
However, our choice of algorithm might not be optimal, and more sophisticated clustering algorithms could lead to more insightful partitions of our data. In particular, both Ward’s algorithm and k-means rely on Euclidean distances to divide data points into clusters, which is arguably not the most appropriate way of measuring distances between binary data.
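The concern about Euclidean distances on binary data can be made concrete with a small example. For 0/1 tag vectors, the squared Euclidean distance simply counts the tags on which two coins disagree, so it ignores how many tags they share; a set-based measure such as the Jaccard distance, used here only as an illustrative alternative and not as the method of the paper, does take shared tags into account. The vectors below are hypothetical.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance; for 0/1 vectors it equals sqrt(number of mismatching tags)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of the sets of tags switched on in a and b."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # two coins without any tag are treated as identical
    return float(1.0 - np.logical_and(a, b).sum() / union)


# Pair 1 shares no tag; pair 2 shares four tags.
p1 = ([1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0])
p2 = ([1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 0, 1])

# Both pairs are sqrt(2) apart in Euclidean terms...
print(euclidean(*p1), euclidean(*p2))
# ...while the Jaccard distance separates them (1.0 versus 1/3).
print(jaccard_distance(*p1), jaccard_distance(*p2))
```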
To select the total number of cryptocurrency clusters, we employ the elbow method. For each possible partition \(\mathbf{S} = \{S_{1}, \ldots, S_{k}\}\) of the dataset, we define a loss function \(L(\mathbf{S})\) as
\[ L(\mathbf{S}) = \sum_{i=1}^{k} \sum_{\mathbf{x}_{j} \in S_{i}} \lVert \mathbf{x}_{j} - \boldsymbol{\mu}_{i} \rVert^{2}, \]
where \(\mathbf{x}_{j}\) is the vector of tag observations for a cryptocurrency j belonging to cluster \(S_{i}\), and \(\boldsymbol{\mu}_{i}\) is the mean of the vectors in \(S_{i}\). We ran the clustering algorithm for different values of k and computed the value of the loss function for the set of optimal partitions \(\{S^{*}_{k=1}, S^{*}_{k=2}, \ldots, S^{*}_{k=N} \}\), where N is the total number of cryptocurrencies considered in our study.
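The loss function is the within-cluster sum of squared Euclidean distances. A direct NumPy transcription is sketched below; it assumes the tag vectors are stacked as the rows of a matrix `X` and that `labels` assigns each row to a cluster, and the function name is hypothetical.

```python
import numpy as np


def wcss_loss(X, labels):
    """L(S): sum over clusters of squared distances of members to the cluster mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]       # vectors x_j belonging to cluster S_c
        mu = members.mean(axis=0)      # cluster mean mu_c
        total += float(((members - mu) ** 2).sum())
    return total


X = [[0, 0], [0, 2], [4, 0], [4, 2]]
print(wcss_loss(X, [0, 0, 1, 1]))  # two tight clusters
print(wcss_loss(X, [0, 0, 0, 0]))  # everything in one cluster
```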
The elbow method prescribes choosing the largest number of clusters before the curve flattens out. Intuitively, it recommends stopping at the point beyond which the marginal decrease in the loss function is not worth the additional cost of creating another cluster. Figure 7 shows that a value around \(k=12\) is compatible with the elbow method in our case.
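The elbow in Fig. 7 appears to be read off the loss curve by inspection. One common way to automate that judgment, shown here purely as an illustrative assumption rather than the procedure used in the paper, is to pick the k whose point on the normalized loss curve lies farthest from the straight line joining the first and last points. The sketch builds the curve from a single Ward dendrogram, whose cuts stand in for the optimal partitions \(S^{*}_{k}\); all function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def wcss(X, labels):
    """Within-cluster sum of squared Euclidean distances, L(S)."""
    return sum(float(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum())
               for c in np.unique(labels))


def loss_curve(X, k_max):
    """L(S_k) for k = 1..k_max, cutting one Ward dendrogram at each k."""
    X = np.asarray(X, dtype=float)
    Z = linkage(X, method="ward")
    return [wcss(X, fcluster(Z, t=k, criterion="maxclust")) for k in range(1, k_max + 1)]


def elbow_k(losses):
    """Return the k (1-based) farthest from the chord joining the curve's endpoints."""
    y = np.asarray(losses, dtype=float)
    if y.size < 3 or y.max() == y.min():
        return 1  # a flat or very short curve has no identifiable elbow
    x = np.linspace(0.0, 1.0, y.size)          # rescaled k axis
    y = (y - y.min()) / (y.max() - y.min())    # rescaled loss axis
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    # Perpendicular distance of each point from the line through the endpoints.
    dist = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / np.hypot(y1 - y0, x1 - x0)
    return int(np.argmax(dist)) + 1


print(elbow_k([100, 50, 20, 18, 17, 16]))
```

Normalizing both axes first keeps the choice from depending on the units of the loss.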
A.2 Further analysis
Tables 1 and 2 report the results used to build Fig. 5. In particular, we show the mean correlation defined in Eq. (12) and its variance, computed over 1000 realizations of the random networks and on the real coinvestment network (Eq. (11)). Table 1 reports the results as a function of network distance, while Table 2 reports them computed over all pairs of cryptocurrencies. The tables include both the raw correlation values and the correlations computed on ‘cleaned data’, obtained by removing the market mode (see Eq. (10)) and rescaling the correlation to lie in the range \([0,1]\), which are the values included in the figure.
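Because Eq. (10) and the exact null model are defined in the main text, the following is only a hedged sketch of the two ingredients described above. It assumes that removing the market mode means subtracting the contribution of the largest eigenvalue of the return correlation matrix (a standard approach for financial correlation matrices), and that the random benchmark draws networks with the same number of links uniformly at random (an Erdős–Rényi G(n, m) ensemble). The rescaling to \([0,1]\) is omitted, and all function names and the synthetic returns are hypothetical.

```python
import numpy as np


def remove_market_mode(returns):
    """Correlation matrix with the top eigenmode (assumed market mode) removed.

    returns: (T, n) array, one column per cryptocurrency, with T > n.
    The residual matrix is renormalised to have a unit diagonal.
    """
    C = np.corrcoef(returns, rowvar=False)
    eigval, eigvec = np.linalg.eigh(C)        # eigenvalues in ascending order
    v = eigvec[:, -1]                         # eigenvector of the largest eigenvalue
    C_res = C - eigval[-1] * np.outer(v, v)   # subtract the market-mode contribution
    d = np.sqrt(np.diag(C_res))
    return C_res / np.outer(d, d)


def mean_link_correlation(corr, edges):
    """Mean correlation over the linked pairs (i, j) of a network."""
    i, j = np.array(edges).T
    return float(corr[i, j].mean())


def random_network_benchmark(corr, n_edges, n_real=1000, seed=0):
    """Mean and variance of the link-averaged correlation over G(n, m) random networks."""
    rng = np.random.default_rng(seed)
    iu, ju = np.triu_indices(corr.shape[0], k=1)  # all unordered pairs
    samples = np.empty(n_real)
    for r in range(n_real):
        pick = rng.choice(iu.size, size=n_edges, replace=False)
        samples[r] = corr[iu[pick], ju[pick]].mean()
    return float(samples.mean()), float(samples.var())


# Synthetic returns: a common market factor plus two group factors.
rng = np.random.default_rng(1)
T = 500
market = rng.normal(size=(T, 1))
group = rng.normal(size=(T, 2))
noise = rng.normal(size=(T, 10))
returns = np.hstack([market + group[:, [0]] + noise[:, :5],
                     market + group[:, [1]] + noise[:, 5:]])
C_clean = remove_market_mode(returns)
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]  # links within group 0
print(mean_link_correlation(C_clean, edges), random_network_benchmark(C_clean, len(edges)))
```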
A.3 Cluster analysis
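To better characterise the similarity between nodes belonging to the same cluster, as defined in Sect. A.1, we compute four well-known similarity measures [60]: the Jaccard index, the cosine similarity (also known as the Salton index), the Adamic–Adar index, and the resource allocation index. The Jaccard index measures the similarity between the neighbour sets of two nodes and is defined as the size of their intersection divided by the size of their union. The cosine similarity counts the number of common neighbours but normalises it by the geometric mean of the two nodes’ degrees, thus penalising pairs of high-degree nodes. The Adamic–Adar index and the resource allocation index also count the common neighbours, but they assign a lower weight to common neighbours that have a high degree. Denoting by \(\Gamma(i)\) the set of neighbours of node i, these measures are defined as
\[ d^{\mathrm{J}}_{ij} = \frac{|\Gamma(i) \cap \Gamma(j)|}{|\Gamma(i) \cup \Gamma(j)|}, \qquad d^{\mathrm{S}}_{ij} = \frac{|\Gamma(i) \cap \Gamma(j)|}{\sqrt{|\Gamma(i)|\,|\Gamma(j)|}}, \]
\[ d^{\mathrm{AA}}_{ij} = \sum_{z \in \Gamma(i) \cap \Gamma(j)} \frac{1}{\log |\Gamma(z)|}, \qquad d^{\mathrm{RA}}_{ij} = \sum_{z \in \Gamma(i) \cap \Gamma(j)} \frac{1}{|\Gamma(z)|}. \]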
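The four measures can be computed directly from neighbour sets. The sketch below assumes the coinvestment network is stored as a dictionary mapping each node to the set of its neighbours; the function name and the toy graph are hypothetical.

```python
import math


def similarity_indices(adj, i, j):
    """Jaccard, Salton (cosine), Adamic-Adar and resource-allocation indices.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    i, j: two distinct nodes.
    """
    common = adj[i] & adj[j]
    union = adj[i] | adj[j]
    k_i, k_j = len(adj[i]), len(adj[j])
    return {
        "jaccard": len(common) / len(union) if union else 0.0,
        "salton": len(common) / math.sqrt(k_i * k_j) if k_i and k_j else 0.0,
        # A common neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common),
        "resource_allocation": sum(1.0 / len(adj[z]) for z in common),
    }


adj = {0: {2, 3}, 1: {2, 3, 4}, 2: {0, 1}, 3: {0, 1, 5}, 4: {1}, 5: {3}}
print(similarity_indices(adj, 0, 1))
```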
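For each cluster \(S_{k}\), we compute the average value of each metric within and outside the cluster. The average similarity inside the cluster is
\[ \bar{d}^{\,\mathrm{in}}_{k} = \frac{1}{|S_{k}|\,(|S_{k}|-1)} \sum_{i \in S_{k}} \sum_{j \in S_{k},\, j \neq i} d_{ij}, \]
and the average similarity outside the cluster is
\[ \bar{d}^{\,\mathrm{out}}_{k} = \frac{1}{|S_{k}|\,(N-|S_{k}|)} \sum_{i \in S_{k}} \sum_{j \notin S_{k}} d_{ij}, \]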
where \(d_{ij}\) represents one of the four metrics defined above. Figure 8 shows the values of the in- and out-average similarity metrics for the 12 cryptocurrency clusters described in Sect. 4 and compares them with those obtained for 1000 random cluster assignments. Nodes belonging to the same cluster tend to be more similar to each other, to an extent that is not compatible with the random benchmark.
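The within- and between-cluster averages and the random benchmark can be sketched as follows, assuming the pairwise similarities \(d_{ij}\) are stored in a symmetric NumPy matrix `D` and the cluster assignment in a label array. Random clusterings are generated here by permuting the labels, which preserves cluster sizes; this is one plausible reading of "random cluster assignments", not necessarily the authors’ exact procedure, and the names are hypothetical.

```python
import numpy as np


def in_out_average(D, labels, k):
    """Mean similarity within cluster k and between cluster k and the rest.

    D: (n, n) symmetric similarity matrix; labels: cluster label of each node.
    Assumes cluster k has at least two members and at least one outsider.
    """
    inside = np.asarray(labels) == k
    m = inside.sum()
    block_in = D[np.ix_(inside, inside)]
    d_in = (block_in.sum() - np.trace(block_in)) / (m * (m - 1))  # pairs with i != j
    d_out = D[np.ix_(inside, ~inside)].mean()  # i in cluster k, j outside it
    return float(d_in), float(d_out)


def random_assignment_benchmark(D, labels, k, n_perm=1000, seed=0):
    """Average (in, out) values over random label permutations (cluster sizes kept)."""
    rng = np.random.default_rng(seed)
    vals = np.array([in_out_average(D, rng.permutation(labels), k) for _ in range(n_perm)])
    return vals.mean(axis=0)


# Toy similarity matrix with two perfectly separated blocks.
D = np.kron(np.eye(2), np.ones((3, 3)))
labels = np.array([0, 0, 0, 1, 1, 1])
print(in_out_average(D, labels, 0), random_assignment_benchmark(D, labels, 0))
```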
A.4 Crunchbase dataset
Crunchbase provides information on innovative companies worldwide. The dataset covers several aspects of these companies, ranging from a basic description of the business to their financial status, board composition, and even media exposure. The dataset is organized into bundles that reflect these different types of information. The bundles are:

Company-related: organizations (including information on parent companies, organization descriptions, and their classification into categories) and investment funds.

Investment-related: funding rounds (groups of investments in a single company), investments (specific investor-to-company transactions), investors, acquisitions, and IPOs.

People-related: people covered in the dataset, the jobs they hold, and the degrees they have earned, with a focus on investment partners.

Event-related: descriptions of events and appearances of specific companies at events.
For the purposes of this paper, the relevant bundles are those concerning organizations, funding rounds, and investments. We detail their content in Tables 3, 4, and 5.
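Once investments are available as investor-to-company transactions, a coinvestment network of the kind studied here (cryptocurrencies linked when they share investors) can be derived by projecting the investor–project bipartite graph onto projects. The sketch below assumes the records have already been mapped to (investor, cryptocurrency) pairs; the record layout, function name, and toy records are hypothetical, and counting shared investors as an edge weight is an illustrative choice.

```python
from collections import defaultdict
from itertools import combinations


def coinvestment_edges(investments):
    """Project (investor, coin) records onto coins.

    investments: iterable of (investor_id, coin_id) pairs (hypothetical schema).
    Returns {(coin_a, coin_b): number of shared investors} with coin_a < coin_b.
    """
    portfolio = defaultdict(set)
    for investor, coin in investments:
        portfolio[investor].add(coin)  # repeated rounds in the same coin count once
    weights = defaultdict(int)
    for coins in portfolio.values():
        for a, b in combinations(sorted(coins), 2):
            weights[(a, b)] += 1
    return dict(weights)


records = [("fund_1", "coin_a"), ("fund_1", "coin_b"), ("fund_2", "coin_b"),
           ("fund_2", "coin_a"), ("fund_2", "coin_c"), ("fund_3", "coin_c"),
           ("fund_1", "coin_b")]
print(coinvestment_edges(records))
```

Keeping only the keys of the returned dictionary gives the unweighted network in which a link simply marks at least one shared investor.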
A.5 Coinmarketcap cryptocurrency tags
Table 6 lists the tags gathered from Coinmarketcap for all the cryptocurrency projects analysed in this paper, together with their respective frequencies. Given the heterogeneity of the cryptocurrency market in terms of use cases and supporting technologies, the tags created by Coinmarketcap help label and distinguish the different types of cryptocurrencies based on ‘intrinsic’ features related to the nature of each project.
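Tag frequencies and the binary tag vectors used for clustering in Sect. A.1 can both be derived from per-project tag lists. A minimal sketch, assuming the tags are available as a dictionary from project to a list of tag strings; the project and tag names below are hypothetical placeholders, and the function name is illustrative.

```python
from collections import Counter

import numpy as np


def tag_matrix(coin_tags):
    """Build tag frequencies and a binary (projects x tags) indicator matrix.

    coin_tags: dict mapping each project to its list of tags.
    Returns (coins, tags, X, freq), where freq[t] is the number of projects tagged t.
    """
    freq = Counter(t for tags in coin_tags.values() for t in set(tags))
    tags = sorted(freq)
    coins = sorted(coin_tags)
    col = {t: c for c, t in enumerate(tags)}
    X = np.zeros((len(coins), len(tags)), dtype=np.int8)
    for r, coin in enumerate(coins):
        for t in coin_tags[coin]:
            X[r, col[t]] = 1
    return coins, tags, X, freq


coin_tags = {"coin_a": ["defi", "governance"], "coin_b": ["defi"], "coin_c": ["memes", "memes"]}
coins, tags, X, freq = tag_matrix(coin_tags)
print(tags, dict(freq))
print(X)
```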
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mungo, L., Bartolucci, S. & Alessandretti, L. Cryptocurrency coinvestment network: token returns reflect investment patterns. EPJ Data Sci. 13, 11 (2024). https://doi.org/10.1140/epjds/s13688-023-00446-x
DOI: https://doi.org/10.1140/epjds/s13688-023-00446-x