Analysis of the Bitcoin blockchain: Socio-economic factors behind the adoption

As the first decentralized digital currency, introduced in 2009 together with the blockchain, Bitcoin offers new opportunities for both developed and developing countries. Bitcoin peer-to-peer transactions are independent of the banking system, thus facilitating foreign exchanges such as remittances, with low transaction fees and a high degree of anonymity. These opportunities, together with other key factors, made Bitcoin extremely popular and made its price skyrocket during 2017. However, while the Bitcoin blockchain attracts a lot of attention, the pseudo-anonymity of the system makes it difficult to investigate where this attention comes from, and consequently to appreciate its social impact. Here we attempt to characterize the adoption of the Bitcoin blockchain by country. In the first part of the work we show that information about the number of Bitcoin software client downloads, the IP addresses that act as relays for the transactions, and the Internet searches about Bitcoin together provide a coherent picture of the system's evolution in different countries. Using these quantities as a proxy for user adoption, we identify several socio-economic indexes, such as the GDP per capita, freedom of trade and Internet penetration, as key variables correlated with the degree of user adoption. In the second part of the work, we build a network of Bitcoin transactions between countries using the IP addresses of nodes relaying transactions, and we develop an augmented version of the gravity model of trade in order to identify socio-economic factors linked to the flow of bitcoins between countries. In a nutshell, our study provides new insight into Bitcoin adoption by country and into the potential socio-economic drivers of the international Bitcoin flow.


Introduction
Bitcoin is a digital currency created in 2009 as an alternative to the banking system. Not only does it offer a payment mechanism without any centralized control (i.e., by institutions, governments, or banks), but it has also introduced the revolutionary concept of the blockchain. After continuous growth over the last years, Bitcoin has now become a solid reality and a fascinating object of study. The possible future applications of the blockchain and of cryptocurrencies in general appear very promising, even if this technology is relatively new and at the first stage of its evolution. Studying the Bitcoin system, as the most significant implementation of a blockchain cryptocurrency, is an important challenge for understanding how this decentralized model behaves in the real world. In fact, recent literature abounds with lines of research linked to the Bitcoin blockchain. A large part of the effort is devoted to the study of the blockchain technology itself, in particular to its development [1,2,3] and to its application to other domains [4]. Another undeniably important line of research concerns the financial and economic aspects, where the main questions are related to the evolution of prices [5,6,7,8] and to issues concerning regulatory institutions and policy [1,9]. From a social point of view, the study of the uptake of Bitcoin proves to be a challenging task due to the pseudo-anonymity of the system. Digital cryptocurrencies such as Bitcoin can have a significant social impact, as they allow for fast transactions at low costs, offering a solution for tips, donations, and micropayments without the need for a banking system, paving the way for their wide adoption. However, as users can generate as many pseudonyms as they want, this impact is difficult to quantify.
In the direction of investigating the social impact of Bitcoin, previous studies have used either external data, such as the number of Bitcoin client software downloads by country and the amount of each fiat currency involved in bitcoin transactions on exchanges [10], or bitcoin transaction data [11,12]. To exploit the bitcoin transaction data, a crucial step is the process of deanonymization, which consists in grouping pseudonyms belonging to the same users. This technique serves both as a way to evaluate the level of privacy of the Bitcoin system [13] and to characterize the type of usage [12,14,15].
Here we propose to combine Bitcoin transaction data and external data sources to quantify Bitcoin adoption by country. We underline the main factors that might represent a motivation or a deterrent for Bitcoin adoption, and we explore how adoption might have evolved over time given the data we have. Moreover, with the introduction of specific metrics, we build and model an international Bitcoin flow network, and from this model we extract the socio-economic indexes playing a main role in the dynamics of transactions.
We organize the rest of this paper as follows. Section 2 provides an overview of the datasets that we used and a description of the pre-processing stage. We analyze three different external data sources to investigate how relevant they are as proxies to evaluate Bitcoin user adoption. In Section 3 we characterize Bitcoin adoption per country, underlining the relevance of various socio-economic factors and analyzing the adoption trends. In Section 4 we use deanonymization heuristics on the Bitcoin transaction ledger to build a transaction network of users, to which we assign countries based on the Internet addresses (IPs) of the nodes that relay their transactions. We finally model the international Bitcoin flow network using an augmented version of the gravity model of trade, and we explore the socio-economic indexes that are correlated with these flows. Section 5 summarizes and discusses our results.

Data collection and pre-processing
As we intend to investigate Bitcoin adoption per country, besides the Bitcoin transactional data that can be directly extracted from the Bitcoin blockchain using a block explorer service, we gathered three additional sources of information. From the Bitcoin system we extracted the IP address of the first node that relayed each transaction (available through the API at blockchain.info [16]) and the number of downloads for Bitcoin Core, one of the major Bitcoin clients. Finally, we used information from Google Trends to quantify the collective attention towards Bitcoin. Some details about these datasets are reported in Table 2.

Bitcoin blockchain
The full Bitcoin blockchain database is freely accessible from the Internet; we collected the list of bitcoin transactions using the API from blockchain.info [16] over a period extending from 2009-01-09 to 2016-02-25.
In order to send and receive bitcoin, users need to create Bitcoin addresses. For each transaction we gathered the input and output Bitcoin addresses of the users involved as well as the amounts transferred, the fees, the block height and the position relative to the block. Some general information about the Bitcoin blockchain dataset we collected is reported in Table 2. We have used as timestamp for each transaction the Unix timestamp of the creation of the block in which it is contained. In fact, the blockchain does not provide any time information for the transactions, but it contains the timestamp of block creation [18]. Considering that several blocks are mined each hour, the block timestamp is a good proxy for our study.
Regarding the transaction amounts, we converted them from BTC (Bitcoin currency) to USD, using a daily exchange rate, as the Bitcoin price has drastically changed over the years (see Appendix A.1).
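The conversion described above can be sketched as follows. This is a minimal illustration, assuming that each transaction carries the Unix timestamp of its block and that the daily exchange rates are available as a dictionary keyed by UTC date; the function name `to_usd` and the rates below are hypothetical and are not taken from the dataset.

```python
from datetime import datetime, timezone

def to_usd(amount_btc, block_timestamp, daily_rate_usd):
    """Convert a BTC amount to USD using the rate of the block's UTC day.

    daily_rate_usd maps ISO dates ("YYYY-MM-DD") to a USD price per BTC
    (hypothetical input format, chosen for illustration).
    """
    day = datetime.fromtimestamp(block_timestamp, tz=timezone.utc).date().isoformat()
    return amount_btc * daily_rate_usd[day]

# Made-up rates, for illustration only.
rates = {"2013-04-01": 100.0, "2013-04-02": 110.0}
usd = to_usd(0.5, 1364860800, rates)  # block created on 2013-04-02 00:00:00 UTC
```

Using the block timestamp rather than a per-transaction time mirrors the choice made above, since the blockchain only records the time of block creation.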

Internet Protocol addresses
To get an insight about users and their geolocation, we consider the IP addresses of the nodes which relay the transactions in the Bitcoin network. Bitcoin indeed uses a gossip protocol in which users communicate their new transactions to all their connected peers across the network, and some studies have shown that, when connecting to a substantial part of the network, the first node/IP that communicates a transaction is likely to be its creator [19,20,21]. We thus downloaded from blockchain.info the IP addresses of the first nodes that act as relayers in each transaction, with the time resolution of the block creation (≈ 10 minutes). As our goal is to perform a socio-economic analysis at the country level, we mapped the IPs to their corresponding countries. This process is described in Appendix A.3. Moreover, we are aware that some users use TOR in order to increase their anonymity in the network. TOR is an Internet protocol which reroutes connections through a virtual circuit so that the IP address is hidden from the rest of the network. During the geo-localization process we thus filtered out the transactions relayed by TOR exit nodes (see Appendix A.2), which represent less than 0.001% of the total number of transactions.
Table 2: General statistics about the blockchain dataset collected. Google Trends assigns a score to countries based on relative in-country queries; the data are normalized between 0 and 100. (*timestamp of the block creation; **effective coverage period shorter)
One quantity of interest for studying Bitcoin adoption is the number of such relay node IP addresses that appeared at least once in the Bitcoin system. Indeed, this gives us an idea of the popularity of Bitcoin in the different countries, as shown in Figure 1, where we report the number of such IPs (each new IP being counted only once) by country over our period of study, for a selection of countries with enough activity in the Bitcoin system. In Section 3 we explain how we selected these countries.
Looking at the evolution over time of the number of new IPs appearing in the system (as IPs of relay nodes) in Figure 4, we observe a drop in the recorded IP activity, so we restricted the analysis to the time interval from March 2012 to May 2014.

Bitcoin Client
To better assess the Bitcoin uptake we also consider the number of Bitcoin client downloads. Generally speaking, a Bitcoin client is a piece of software used to manage and store Bitcoin addresses and to make transactions on the Bitcoin network. The official Bitcoin client is called Bitcoin Core, and it is available from sourceforge.net [22]. SourceForge provides some statistics about the downloads, including the total number of downloads, aggregated daily by country, as shown in Figure 2. As other clients exist and some users perform transactions through web-based services, the data from

Google Trends
Here we use Google Trends as a proxy for the collective attention on the subject, as already proposed in [23]. Figure 3 provides, for each country, the evolution of the number of queries relative to the total number of queries done, with a weekly resolution, for a specific keyword that here we simply set as "Bitcoin". Besides, we extracted Google's interest by region, which uses the country's relative number of queries. The scale goes from 0 to 100, with 100 being assigned to the country with the highest relative number of searches on Bitcoin.

Country Socio-Economic indexes
In order to characterize the adopters of Bitcoin, we gathered datasets about socio-economic indexes at the country level, with the aim of exploring the relationship between these indexes and Bitcoin adoption. We mainly focused on indexes that distinguish the most developed and wealthiest countries from the less developed ones. Table 2 summarizes the indexes that we used.

Bitcoin adoption at the country level
With the goal of appreciating the adoption at the country level, we have identified Bitcoin client downloads, IPs of relay nodes and Google Trends as relevant sources of information. Here, we show that these quantities provide a similar and consistent picture of the users, and thus we choose to use them as proxies to study the adoption process. This preliminary step paves the way for two types of analysis. First, we show how countries with different development indexes have different trends of adoption. Second, we explore how country socio-economic indexes are linked to Bitcoin adoption.

A coherent picture about the users
The numbers of relay node IPs and client downloads are measurements directly related to the blockchain, so both give direct information about Bitcoin usage, even if neither can provide a complete picture of the users. In particular, the number of IP addresses does not account for users that do not run a node and thus do not appear as an IP in the network. On the other hand, the number of client downloads only provides information about users of this specific client. Because of these limitations, we cannot identify the exact number of users per country, but only a trend of evolution. To compare the information given by the numbers of relay node IPs and client downloads, we first select countries whose activity level permits the analysis. In order to make the selection based on activity, we computed, on moving time windows, the medians among all countries of the number of client downloads and of the number of different relay node IPs, and we repeatedly filtered countries using these two medians as thresholds. The moving windows are one year wide with a step of one month, and they cover the period from 2012-03-01 to 2014-05-01. At the end of the filtering process, we select a group of 72 countries, listed in Table 8.
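One possible reading of this filtering procedure is sketched below. The exact rule is not fully specified in the text, so the sketch makes explicit assumptions: the inputs are monthly counts per country (new relay IPs being counted once, in the month in which they first appear), the medians are taken among all countries in each window, and a country is kept only if it reaches both medians in every window. The function names are illustrative.

```python
from statistics import median

def moving_windows(n_months, width=12, step=1):
    """Start/end month indexes of windows `width` months wide, moved by `step`."""
    return [(s, s + width) for s in range(0, n_months - width + 1, step)]

def select_active_countries(downloads, new_ips, width=12, step=1):
    """Keep countries at or above both cross-country medians in every window.

    downloads, new_ips: {country: [monthly counts]} (assumed input format).
    """
    countries = set(downloads) & set(new_ips)
    n_months = min(min(len(downloads[c]), len(new_ips[c])) for c in countries)
    selected = set(countries)
    for start, end in moving_windows(n_months, width, step):
        dl = {c: sum(downloads[c][start:end]) for c in countries}
        ip = {c: sum(new_ips[c][start:end]) for c in countries}
        dl_median, ip_median = median(dl.values()), median(ip.values())
        # Assumption: a country failing either threshold in any window is dropped.
        selected = {c for c in selected if dl[c] >= dl_median and ip[c] >= ip_median}
    return selected

# Toy data: 4 countries, 3 months, windows of 2 months.
toy_downloads = {"A": [10, 10, 10], "B": [5, 5, 5], "C": [1, 1, 1], "D": [0, 0, 0]}
toy_new_ips = {"A": [8, 8, 8], "B": [1, 1, 9], "C": [2, 2, 2], "D": [0, 0, 0]}
active = select_active_countries(toy_downloads, toy_new_ips, width=2)
```

In the toy data, country B passes the download threshold but falls below the IP median in the first window, so only A is retained.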
On this selection, we explore the relationship between the time series of the numbers of different IPs and client downloads. We compute the Pearson correlation coefficient between the time series of the number of unique IPs appearing in the Bitcoin system and the number of Bitcoin client downloads, both at the worldwide and at the country level. The time series were first cleaned of small fluctuations by applying a moving-window average one month long, with an offset of one day. The results, reported in Table 3, indicate high correlations, which confirms that the number of unique IPs and the number of client downloads together give a coherent picture of the trend of adoption of each country. This supports the use of both quantities to study country adoption. Additionally, we compute the Spearman correlation coefficient between the rankings of countries given by IP addresses and by client downloads in three different years, arriving at the same conclusion.
We also compared the Google Trends time series with the numbers of unique IPs and client downloads, computing the pairwise Pearson correlations to see whether the three data sources give a consistent picture of the users. Given the high correlations shown in Table 3, we assume that the Google Trends time series is also a good indicator of country Bitcoin adoption, and we suppose that this assumption holds beyond the timespan of validity of the IP and client download data, which allows us to discuss long-term adoption trends of the selected countries. To assess the relevance of the Bitcoin search time series for comparing country adoption, we also measured the pairwise Spearman correlations between the rankings of countries by Bitcoin searches, by number of Bitcoin client downloads and by new IPs appearing. Correlations are high, except for the year 2012, where the signal about Bitcoin searches is too low to allow comparison between countries. Moreover, the country ranking based on Google queries heavily depends on Google's usage by country, which can be very heterogeneous. For this reason we will not use the rank provided by Google to extract the socio-economic indexes possibly linked to user adoption, but we can use the country Google time series to explore the long-term trends.
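The two correlation measures used in this comparison, together with the moving-average smoothing, can be sketched in plain Python as follows. This is an illustrative implementation of the standard definitions (Spearman being the Pearson correlation of the ranks, with ties sharing their mean rank), not the code used for the tables; the function names are chosen for this sketch.

```python
from math import sqrt

def moving_average(series, window):
    """Mean over a sliding window, moved by one step at a time."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

For daily series, `moving_average(series, 30)` corresponds to the one-month window with a one-day offset mentioned above, under the assumption that a month is approximated by 30 days.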

Adoption trends: developing versus developed countries
Using the data from Google Trends, we studied the evolution of the collective attention by country from 2009 to early 2017. As we are interested in the long-term trends, we smoothed the Bitcoin search time series by country using a digital low-pass filter to focus on variations on a time scale of 3 years. To study the main trends present in the time series, we built a matrix $A \in \mathbb{R}^{n \times m}$ (where n represents the number of countries and m is the number of points in the time series), and we approximated it through non-negative matrix factorization by a product of matrices $W \cdot H$, with $W \in \mathbb{R}^{n \times k}$ and $H \in \mathbb{R}^{k \times m}$. With such an approximation, each country's Bitcoin search time series can be represented as a linear combination of k components, stored as the rows of the matrix H, with the coefficients stored in W. The number of components has been chosen to be k = 4 using the bi-cross-validation method [24]. In Figure 6 we show the approximated trends for the smoothed time series of 6 countries, and the shape of the 4 components is shown in Figure 5: we identified three components that fluctuate over time, and one component that has a clear increasing trend starting from the middle of 2015. Looking at the coefficient matrix W, we separated the countries into 2 groups: those having their highest coefficient for the clearly increasing component, which we consider as the new adopters of Bitcoin, and the others, whose time series are mainly composed of the fluctuating components. As shown in Table 3, grouping countries by development indexes we observed that most of the developed countries are among the early adopters of Bitcoin, i.e. the attention was already notable in the early years of Bitcoin. On the other hand, a large part of the developing countries show a recent high interest in Bitcoin.
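The factorization step can be illustrated with a small, dependency-free sketch. The text does not state which NMF algorithm was used; the code below uses the classical multiplicative update rules for the Frobenius norm, which is an assumption made only for illustration. The toy matrix is made up, and `dominant_component` is a hypothetical helper that mimics the grouping of countries by their largest coefficient in W.

```python
import random

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frobenius_error(A, W, H):
    """Frobenius norm of A - W.H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative n x m matrix A by W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

def dominant_component(W):
    """Index of the component with the highest coefficient, per country (row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]

# Toy matrix: 3 "countries" x 4 time points (made-up values).
A = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
W, H = nmf(A, k=2)
groups = dominant_component(W)
```

Because both factors stay non-negative, each row of H can be read as a trend and each row of W as the weights of those trends for one country, which is what makes the grouping by dominant component meaningful.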
different socio-economic indexes. In the results, reported in Table 6, we observe a high positive correlation with Internet penetration, GDP per capita (PPP) and HDI, and a weak negative correlation with inflation. The general picture that emerges is that socio-economic welfare, as present in most developed countries, has stimulated Bitcoin adoption, at least for the years 2011, 2012, 2013 and 2014, for which we could carry out this analysis. Besides some expected correlations, like the one with Internet penetration, which represents an essential condition to participate in the Bitcoin network, the results obtained for overall freedom and trade freedom are especially interesting. The two indexes provide a measure of economic freedom: trade freedom measures the presence of barriers that affect imports and exports of goods and services, while the overall economic freedom index takes a comprehensive view of the country's interactions with the rest of the world and of the economic and financial policies within the country. The good correlation measured suggests that, even though Bitcoin was born with the intention of removing obstacles to the way people can exchange money, the presence of policies that promote economic freedom represents a fundamental element in favour of Bitcoin adoption.
Table 6: Spearman's correlations between the ranks of countries obtained using the number of unique IP addresses and each socio-economic index collected, and between the ranks obtained using the number of Bitcoin client downloads and each socio-economic index collected. IP data are not available for 2011.
Beyond considering the drivers behind country adoption, in this second part we attempt to identify the key socio-economic indexes related to the international Bitcoin flow. The process that leads to the estimation of the Bitcoin flow network consists first of a clustering of Bitcoin addresses into users, through a deanonymization process, and then of a mapping that assigns users to countries.

Identification of users -clustering of addresses
Bitcoin transactions are based on the use of Bitcoin addresses, which are the result of applying a hashing function to some input string. Moreover, users can create new Bitcoin addresses without limitation in order to hold, receive and send bitcoins; this is computationally cheap and has no cost for them. This anonymizes the users' activities, as we cannot know a priori which users are involved in a transaction, nor which set of Bitcoin addresses belongs to the same user. However, a partial deanonymization method exists that reveals the group of Bitcoin addresses likely owned by a single user. This method is based on two heuristics that take inspiration from the underlying functioning of the Bitcoin transaction system [20,25,26,27,28,29]. In particular, we start from the definitions reported in "Characterizing Payments Among Men with No Names" [26]. Satoshi Nakamoto, the creator of the Bitcoin system, suggests in his original paper the first heuristic, which deals with input addresses [30]. It is based on the fact that the sum of the bitcoins held in the input addresses of a transaction must be entirely spent and sent to the output addresses. As a consequence, a user that holds more than one Bitcoin address can provide several input addresses in order to reach the amount they want to spend. Because of this, the same user is likely to hold all the input addresses of a transaction. Calling t a transaction and input(t) the set of all its input addresses, we summarize the first heuristic as: if two (or more) Bitcoin addresses are inputs to the same transaction, they are controlled by the same user.

• For a transaction t all input(t) are controlled by the same user
On the other hand, the second heuristic uses the definition of shadow addresses. As described before, the sum of the bitcoins contained in the input addresses has to be entirely spent. As a consequence, the fraction of the amount that exceeds the value that the sender wants to spend is usually sent to a new Bitcoin address. The latter is called shadow address and is created by the sender to collect back the change. The assumption is then that one of the output addresses can be the shadow address.
Calling $A_i$ a Bitcoin address, we focus on the set of output addresses $\{A_i\}_{i \in [[1,n]]}$ = output(t) of a transaction t, and we denote by $n^{o}_{A_i}$ the number of times the address $A_i$ is used as output of a transaction. The shadow address $A_i$ ∈ output(t), if it exists, is controlled by the same user that controls input(t). We identify it when the following conditions hold:
• n ≥ 2: the transaction has at least 2 output addresses.
• $n^{o}_{A_i} = 1$: the Bitcoin address $A_i$ appears only once as output of a transaction, and there is no other output address $A_j$ of t that satisfies the same condition.
• ∀i ∈ [[1, n]], $A_i$ ∉ input(t): there is no explicit self-change address, in the sense that no Bitcoin address is present both as an input and as an output of the same transaction.
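The shadow-address conditions listed above can be sketched as a small function. This is an illustration under stated assumptions: transactions are represented as dictionaries with hypothetical "inputs" and "outputs" keys, and the number of times an address is used as output is counted over the whole set of transactions passed in.

```python
from collections import Counter

def output_counts(transactions):
    """Number of transactions in which each address appears as an output."""
    counts = Counter()
    for tx in transactions:
        counts.update(set(tx["outputs"]))
    return counts

def shadow_address(tx, counts):
    """Return the shadow (change) address of tx, or None if the heuristic does not apply."""
    outputs = set(tx["outputs"])
    if len(outputs) < 2:
        return None  # at least two output addresses are required
    if outputs & set(tx["inputs"]):
        return None  # an address is both input and output of the transaction
    candidates = [a for a in outputs if counts[a] == 1]
    # Exactly one output address must have been used only once as output.
    return candidates[0] if len(candidates) == 1 else None

# Toy transactions with made-up addresses.
txs = [
    {"inputs": ["A", "B"], "outputs": ["C", "D"]},
    {"inputs": ["D"], "outputs": ["C", "E"]},
    {"inputs": ["E"], "outputs": ["F", "G"]},
    {"inputs": ["H"], "outputs": ["H", "I"]},
]
counts = output_counts(txs)
shadows = [shadow_address(tx, counts) for tx in txs]
```

Returning None when two outputs both look fresh reflects the conservative choice described in the text: an ambiguous transaction yields no link rather than a possibly wrong one.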
After applying the two heuristics, we do not directly obtain clusters of users, but only a partial aggregation at the transaction level. For instance, let us assume that transactions involving the addresses A, B, C, D, E result in three groups, {A, B, C}, {A, D}, and {D, E}, after the deanonymization process. Then {A, B, C, D, E} should be seen as the Bitcoin addresses of the same user. This process of grouping, which can seem straightforward, turns out to be challenging, considering that the number of incomplete groups scales with the number of transactions considered. We solve this problem by building a network in which Bitcoin addresses are the nodes, linked together if they belong to the same partial group. Then, to merge all the incomplete groups, we extract the connected components of the network. Each connected component represents the complete group of a user's addresses.
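The merging step can be sketched with a union-find (disjoint-set) structure, which yields the same result as extracting the connected components of the address network without materializing the graph. This is one possible implementation, not necessarily the one used here; the function name `merge_groups` is chosen for this sketch, and the example reuses the groups {A, B, C}, {A, D} and {D, E} from the text.

```python
def merge_groups(groups):
    """Merge partial address groups that share at least one address.

    Equivalent to taking the connected components of the graph in which
    addresses of the same partial group are linked.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        members = list(group)
        if not members:
            continue
        find(members[0])  # registers single-address groups as well
        for other in members[1:]:
            union(members[0], other)

    clusters = {}
    for address in parent:
        clusters.setdefault(find(address), set()).add(address)
    return list(clusters.values())

users = merge_groups([{"A", "B", "C"}, {"A", "D"}, {"D", "E"}])
```

Union-find processes the partial groups in a single pass, which matters when the number of partial groups grows with the number of transactions.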
The whole deanonymization process is highly sensitive to mistakes made in the application of the heuristics. For some transactions, the principles on which the heuristics are defined may not be valid, leading to a wrong grouping of Bitcoin addresses. This could collapse the Bitcoin addresses of different users into a single entity, with the risk of creating users that seem to control a huge number of Bitcoin addresses. Being aware of this problem, we tried to use the safest heuristics possible, even at the expense of discarding some true links between Bitcoin addresses. As some false links could occur anyway, the timespan we use for the deanonymization plays a key role: the longer the period of analysis, the higher the probability that errors cause the appearance of big clusters of Bitcoin addresses. Conversely, reducing the interval of the analysis might lead to the identification of a large number of small groups of addresses; in other terms, the same user might still be split into several groups of addresses.
The results shown in this section are based on a deanonymization process that takes into account all the transactions that occurred in the year 2013 (i.e., the only year for which we have complete IP information). In order to be confident that the results obtained do not heavily depend on the timespan considered for the deanonymization, we carried out the whole modeling analysis, described below, applying the deanonymization on different time intervals. In particular, we used the period between block 1 and block 400000 (the last in our database), and the one between block 180000 and block 300000 (which corresponds to the period for which we have the IP information). In both cases the results are similar and lead to the identification of the same socio-economic factors that can explain the international Bitcoin flow.
Finally, after running the deanonymization, we can build the transaction network between users, identifying the shadow addresses in the transactions.

Country Association
Thanks to the deanonymization procedure, we can identify the transactions in which a specific user appears as sender or creator. Assuming that the first node/IP that relays a transaction is its creator, we can associate to each user the list of IPs used to send bitcoins. Here we describe how we associate countries to users using the IP geo-localization. A quick look at the users' IP addresses reveals that we are far from an ideal situation in which every user operates with a single IP that is not used by anyone else. Bitcoin services (i.e., the infrastructures that allow users to transact without being a node of the Bitcoin network) partially create this problem, as users are seen as using the IP address that belongs to the service. Moreover, a user who does not use services might also use several IP addresses. To balance the presence of services in the IP address usage, we build a metric that has the same form as the TFIDF (term frequency-inverse document frequency) metric [31], commonly used to reflect the importance of words inside documents. This metric respects three main principles that we consider crucial for the discrimination:
1. The metric assigns a score to all the possible countries of a user, instead of assigning a score to each IP address.
2. The score rewards the IP usages that are close to the ideal situation, in which an IP address is used by only one user, who uses only that IP address.
3. Being aware that users can use different IP addresses, this metric takes into consideration the ratio between the user's usage of an IP and the overall user activity (measured as the number of IP addresses).
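A TFIDF-like score that follows the three principles above could look like the sketch below. The actual formula is the one given in Appendix B.1 and is not reproduced here; everything in this example, including the function name `assign_countries`, the input format and the exact weighting, is a hypothetical illustration. The "term frequency" is the share of a user's relayed transactions going through an IP, and the "inverse document frequency" penalizes IPs shared by many users, such as those of services.

```python
from collections import defaultdict
from math import log

def assign_countries(relays, ip_country):
    """Assign to each user the country with the highest TFIDF-like score.

    relays: list of (user, ip) pairs, one per relayed transaction (assumed format).
    ip_country: {ip: country} from the IP geo-localization.
    """
    uses = defaultdict(lambda: defaultdict(int))
    users_per_ip = defaultdict(set)
    for user, ip in relays:
        uses[user][ip] += 1
        users_per_ip[ip].add(user)
    n_users = len(uses)

    assignment = {}
    for user, ip_counts in uses.items():
        total = sum(ip_counts.values())
        scores = defaultdict(float)
        for ip, count in ip_counts.items():
            tf = count / total                           # share of the user's activity
            idf = log(n_users / len(users_per_ip[ip]))   # 0 for an IP used by everyone
            scores[ip_country[ip]] += tf * idf           # scores are per country
        best = max(scores, key=scores.get)
        # No country is assigned when only fully shared IPs were used.
        assignment[user] = best if scores[best] > 0 else None
    return assignment

# Toy example: one service IP shared by all users, plus two personal IPs.
ip_country = {"ip_service": "US", "ip_home_1": "DE", "ip_home_2": "FR"}
relays = ([("u1", "ip_service")] * 3 + [("u1", "ip_home_1")]
          + [("u2", "ip_service")] * 2 + [("u2", "ip_home_2")]
          + [("u3", "ip_service")])
countries = assign_countries(relays, ip_country)
```

In the toy example the service IP contributes nothing to any score, so the users are located through their personal IPs, and the user seen only through the service remains unassigned.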
The formula used to geo-localize the users is reported in Appendix B.1, together with an alternative version, based on similar principles, created to test the robustness of the assignment. As the metric uses the IP information, due to the time limitations shown in Figure 4 we carry out this analysis for the restricted timespan from March 2012 to May 2014. The geo-localization process leads to the identification of the destination and origin of 79% of the transactions in 2013.
In order to test the robustness of the assignment of countries, we compare the results of the 2 versions of the metric, finding that 98% of users received the same association. One of the users classified differently is a very active user in 2013: the TFIDF-based method classifies it as from the United States and the other metric as from Germany. This results in differences in the international flow, but as the United States and Germany are both developed countries with similar socio-economic indexes, this does not change the interpretation of the results in the modeling part.

Flow network
After assigning a country to each user, we created the Bitcoin trade network, in which the nodes represent countries and the weighted links represent the amount of bitcoins exchanged, converted to dollars. From now on, we focus on transactions carried out in 2013 and work with the restricted group of countries analyzed in the first part of the work. A visualization of the international Bitcoin flow network is displayed in Figure 7.

Flow modeling
To understand which socio-economic indexes are potentially explanatory of the Bitcoin flow, we build a model using as a starting point the gravity model, introduced by Jan Tinbergen in 1962 [32] and used to model the bilateral trade flows of different goods and services between countries. The basic form of the model is similar to Newton's law of gravitation: it uses socio-economic indexes that represent the economic mass of a country a, $M_a$, which make the interactions stronger, and a variable representing the distance between countries, $D_{ab}$, which decreases the strength of the interactions. Adding a constant G, this model takes the form:
$$F_{ab} = G \, \frac{M_a^{\beta_1} M_b^{\beta_2}}{D_{ab}^{\beta_3}}$$
where $F_{ab}$ represents the flow between countries a and b, and $\beta_1$, $\beta_2$, $\beta_3$ are coefficients that take real values. The traditional approach for fitting the model consists in taking logarithms of both sides, leading to a log-log model in which it is possible to perform a linear regression [33] (the constant G becomes $\beta_0$).
Here we use an augmented gravity model [34,35,36], which means that we consider additional variables. Calling $\{X^{ab}_i\}_{i \in [[1,n]]}$ the n variables, which might be either single-country quantities (e.g. the masses $M_a$ and $M_b$) or quantities related to the pair of countries (a, b) (e.g. the distance $D_{ab}$), the model can be written in the same log-linear form, with one coefficient $\beta_i$ per variable $X^{ab}_i$. Positive $\beta_i$ are associated with variables $X^{ab}_i$ that contribute to the mass of countries, while negative values represent variables that act like distances. However, this approach cannot model the zero observations, and the estimation of the log-linearized equation by ordinary least squares (OLS) can lead to significant biases under heteroskedasticity [37]. As an alternative, it is possible to work with its multiplicative form, as shown in Equation 4, replacing the linear regression by a Poisson regression.
The vector $\beta = [\beta_0 \ldots \beta_n]$ is estimated by maximizing the Poisson likelihood of F given X, where F is a vector containing the Bitcoin flows between m pairs of countries and X is an m × (n + 1) matrix, in which each row is given by a vector $x_{ab}$ whose values are the variables $\{X^{ab}_i\}_{i \in [[1,n]]}$ concatenated to a 1 that is introduced to take into account the constant term $\beta_0$.
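For reference, a Poisson regression of this kind is commonly written as follows. This is a sketch of the standard formulation and not necessarily the exact equations of this work: it assumes that the expected flow is the exponential of the linear predictor built from $x_{ab}$, and it leaves open whether individual variables enter in logarithmic form.

```latex
% Assumed multiplicative form: the expected flow is the exponential of the linear predictor,
% with x_{ab} = (1, X^{ab}_1, \dots, X^{ab}_n) and \beta = (\beta_0, \dots, \beta_n).
\mathbb{E}\left[ F_{ab} \mid x_{ab} \right] = \exp\left( x_{ab}^{\top} \beta \right)

% Poisson log-likelihood over the m pairs of countries (standard form).
\ell(\beta \mid X, F) = \sum_{(a,b)}
  \left[ F_{ab} \, x_{ab}^{\top} \beta - \exp\left( x_{ab}^{\top} \beta \right) - \ln\left( F_{ab}! \right) \right]
```

The last term does not depend on β, so it can be dropped during the maximization; this is why the same estimator remains usable as a pseudo-likelihood when the flows are not integer counts, for instance when they are expressed in millions of dollars. Pairs with zero flow contribute to the sum without any special treatment, which is the advantage over the log-linearized regression mentioned above.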
Here we use the following group of variables frequently encountered in the literature on trade: population, distance, GDP per capita, and interaction variables that identify countries with a common language or geographic border. Besides, we consider Freedom to Trade, Overall Freedom, and Internet Penetration, as we observed (see Table 6) that they are linked to Bitcoin adoption. In addition to the datasets described before, we downloaded datasets containing information about countries that share a geographic border or a language [38]. Finally, we used a database that reports the distance between each pair of countries, measured using city-level data to account for the geographic distribution of the population inside each nation [38]. As a preprocessing step, the variables are standardized, and the Bitcoin flow is expressed in millions of dollars. We then model the flow network by maximizing the likelihood introduced above with all the variables mentioned. Despite the heterogeneity of countries in terms of trends of adoption, the model achieves an $R^2$ score of 0.68. This indicates that the socio-economic indexes taken into consideration are good indicators of the international Bitcoin flow.
Figure 7: Visualization of the international Bitcoin flow for 2013. The size of each ribbon is proportional to the amount of Bitcoin, expressed in dollars, exchanged between 2 countries (the colour of a ribbon identifies the sender country). On the external circle we show the repartition of the flow in terms of sending (external bar) and receiving (external bar) for each country (or group of countries). The groups 1 to 3 have been formed by ranking countries by decreasing size and putting together the ones with similar amounts. The representation is done using Circos [42].
In order to identify the main drivers of the Bitcoin flow among these socio-economic indexes, we perform a variable selection. To this aim, we introduce $L_1$ regularization into the model. In practice, we estimate the coefficients $\beta$ that minimize the negative log-likelihood augmented with the penalty term $\lambda \lVert \beta \rVert_1$, where $\lambda$ controls the importance of the regularization term. We repeat this process increasing the value of $\lambda$ from $10^{-3}$ to $10^{1}$. This drives to zero the coefficients of the variables that contribute less to the flow. Here we use 10-fold cross-validation to set the value of $\lambda$, and we use the mean squared error averaged over the folds as the metric to compare the models' performance. Each of the 10 folds consists of a list of pairs of countries chosen at random. We use as test set the $k$-th fold, which contains $m_k$ pairs of countries $(ab)_k$. Calling $f^{-k}_{\lambda}$ the model with regularization parameter $\lambda$ trained excluding the $k$-th fold, we compute the cross-validation error $CV_k$ as the mean squared error on the test set:

$$CV_k(\lambda) = \frac{1}{m_k} \sum_{(ab) \in (ab)_k} \left( F_{ab} - f^{-k}_{\lambda}(x_{ab}) \right)^2$$

Then, we compute the mean of $CV_k(\lambda)$, the standard deviation (SD) and the standard error (SE) as:

$$\overline{CV}(\lambda) = \frac{1}{K} \sum_{k=1}^{K} CV_k(\lambda), \qquad SD(\lambda) = \sqrt{\mathrm{Var}\left( CV_1(\lambda), \ldots, CV_K(\lambda) \right)}, \qquad SE(\lambda) = \frac{SD(\lambda)}{\sqrt{K}}$$

with $K = 10$ folds. In Figure 8 we show the mean squared error for each value of $\lambda$ tested. As the fluctuations of the cross-validation error are small over a large range of $\lambda$ values, instead of choosing the model with the value $\lambda_{\min}$ that minimizes the error, we apply the one-standard-error rule. This means that we set $\lambda = \hat{\lambda}$, where $\hat{\lambda}$ is the largest value such that:

$$\overline{CV}(\hat{\lambda}) \leq \overline{CV}(\lambda_{\min}) + SE(\lambda_{\min})$$

Fitting the flow with the model described, with $\lambda = \hat{\lambda} = 0.3$, we identify the main variables (among all those selected for the study) that explain the flow; their coefficients are reported in Table 7. In that case the adjusted $R^2$ is equal to 0.57, even though some variables have been dropped.
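The selection of $\lambda$ with the one-standard-error rule can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name is hypothetical, the per-fold errors are invented numbers, and the use of the sample standard deviation is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def one_standard_error_rule(lambdas, fold_errors):
    """Return (lambda_min, lambda_hat) from per-fold cross-validation errors.

    lambdas: candidate regularization values, in increasing order.
    fold_errors: fold_errors[k][j] is the error CV_k for lambdas[j].
    """
    n_folds = len(fold_errors)
    per_lambda = list(zip(*fold_errors))  # regroup the errors by lambda
    cv_mean = [mean(errs) for errs in per_lambda]
    # Sample standard deviation (assumption), divided by sqrt(K) for the SE.
    cv_se = [stdev(errs) / sqrt(n_folds) for errs in per_lambda]

    # Index of the lambda that minimizes the mean cross-validation error.
    j_min = min(range(len(lambdas)), key=cv_mean.__getitem__)
    threshold = cv_mean[j_min] + cv_se[j_min]

    # Largest lambda (sparsest model) whose mean error is within one SE
    # of the minimum.
    j_hat = max(j for j in range(len(lambdas)) if cv_mean[j] <= threshold)
    return lambdas[j_min], lambdas[j_hat]

# Hypothetical errors for 3 folds and 5 values of lambda (not study results).
lambdas = [0.001, 0.01, 0.1, 1.0, 10.0]
fold_errors = [
    [1.00, 0.97, 1.00, 1.30, 2.00],
    [1.10, 1.05, 1.08, 1.40, 2.10],
    [0.90, 0.89, 0.92, 1.20, 1.90],
]
lambda_min, lambda_hat = one_standard_error_rule(lambdas, fold_errors)
```

In this toy example the mean error is lowest at $\lambda = 0.01$, but the rule selects the larger value $\lambda = 0.1$, whose mean error is still within one standard error of the minimum, which favours a model with fewer non-zero coefficients.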
On the one hand, the coefficient of the overall economic freedom index drops to 0 due to the variable selection, meaning that even though this index takes a comprehensive view of the economic freedom of a country, it turns out not to be a key factor in describing the flow. On the other hand, the more specific trade freedom index appears to be, after population, one of the most important variables to describe the flow. The geographic distance appears as an impediment to the flow. In a nutshell, internet penetration and trade freedom, together with GDP and population, turn out to be the main potential drivers of the Bitcoin flow.

Conclusion
The blockchain infrastructure offered by cryptocurrencies like Bitcoin is attracting interest from a variety of areas such as trade, finance, government and policy. However, quantifying this interest and the adoption by country remains a challenging task.
In this work we aimed at understanding the main factors that pushed the adoption of Bitcoin, the first blockchain technology, in many countries. In order to do this, we applied different techniques for deanonymizing and geolocating the users. Due to the partial anonymity offered by the blockchain, discovering the location of Bitcoin users is a challenging task; we tackled this problem by combining a series of proxies with the transactional data coming from the Bitcoin public ledger. In the first part of the work we showed that the number of IP addresses associated with the relay nodes of the transactions, the number of Bitcoin client downloads, and the interest measured by Google Trends all give a coherent picture of user adoption by country, even though each of them provides only a partial view of the Bitcoin system. Relying on this result, we analyzed the Bitcoin search time series to explore the evolution of attention by country, and we observed a net increasing trend of attention from 2015 to 2017, coming mostly from developing countries. Besides, considering the Bitcoin client downloads and IP addresses as proxies for user adoption, we have seen that adoption is highly correlated with the population, the GDP per capita, the freedom of trade and the Internet penetration for the years 2012, 2013 and 2014. Overall, we also confirm that Bitcoin adoption trends have not been homogeneous around the world: since its introduction, Bitcoin has grown fast in many developed countries, while its adoption in developing countries has increased very slowly.
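In the second part of the work, we focused on the Bitcoin flow, which is still little explored in the literature, in particular due to the issues related to deanonymization. We built a network of Bitcoin transactions between countries using the IP addresses of the nodes relaying transactions, modeled it with an augmented version of the gravity model of trade, and observed that freedom of trade, GDP and population appear as key variables to explain the Bitcoin flow.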
While this work gives a hint of the socio-economic indexes linked with Bitcoin adoption, it relies on the use of the IP addresses of relay nodes, which are available only for a restricted time period. As future work, the exploration of other data sources besides blockchain.info could provide IP information for a different period. Another interesting path to overcome this problem would be to model the behavior observed in the transactions with respect to the currently accessible distribution of IP usage, in order to infer the international Bitcoin flows for longer periods of time.
Though we consider here the total flow generated by users and business services (i.e., web-based services such as gambling, exchanges, markets, mining, clients, etc.), a separate analysis of these types of flows and activities could also help to understand how Bitcoin is currently being used.