The geography and carbon footprint of mobile phone use in Cote d'Ivoire

The newly released Orange D4D mobile phone data base provides new insights into the use of mobile technology in a developing country. Here we perform a series of spatial data analyses that reveal important geographic aspects of mobile phone use in Cote d'Ivoire. We first map the locations of base stations with respect to the population distribution and the number and duration of calls at each base station. On this basis, we estimate the energy consumed by the mobile phone network. Finally, we perform an analysis of inter-city mobility, and identify high-traffic roads in the country.


Introduction
lives of the rural population (e.g. by access to banking and real-time information about agricultural commodity prices), one development objective must be to provide a roughly equal per-capita number of base stations for the entire population of Cote d'Ivoire.
We map the 1238 base station coordinates given in the D4D file ANT_POS.TSV on a standard latitude-longitude projection (left map in Fig. 1). Since Cote d'Ivoire is close to the equator, such a projection is nearly distance-preserving. The base stations (coloured dots on the map) are spatially very unevenly distributed: in some parts of Abidjan there are more than ten base stations per square kilometre, whereas some subprefectures in the north of the country have no base station at all. That there should be many base stations in Abidjan is unsurprising because ≈ 20% of all citizens live in the country's most populous city. However, whether the number of base stations in a region is proportional to the local population is not immediately apparent from the latitude-longitude projection.
We will thus have to combine the base station coordinates with information about the population distribution. Here we use census estimates from the AfriPop project (http://www.afripop.org) [14]. Based on these numbers, we project the map of Cote d'Ivoire so that each region of the country is represented by an area proportional to its population [15]. Such a density-equalising map (also known as a cartogram) has become a popular tool to visualise inequality and development challenges [16]. Plotting the base station locations on the cartogram (right map in Fig. 1) reveals a nuanced picture. On one hand, the point distribution is much less aggregated on the cartogram and thus is indeed largely proportional to population. On the other hand, the points are far from a homogeneous pattern. In Abidjan, in particular, a dense cluster of base stations remains clearly visible, indicating a disproportionately high per-capita connectivity there.
We confirm this observation by calculating the population in the base stations' Voronoi cells (Fig. 2a).
(The Voronoi cell of a given base station is the polygon that contains the area closer to this base station than to any other.) A population-proportional base station distribution would result in an equal population inside each Voronoi cell. A rank plot of population numbers (Fig. 2b), however, has a clear S-shape: although most cells have a population of around 15 000 (mean 15 500, median 12 897), there are outliers in both directions. Interestingly, the 16 lowest ranked cells are all in Abidjan, making it by far the region with the highest per-capita base station density. By contrast, the Voronoi cells with the largest populations are in rural areas near inland borders (e.g. the second ranked base station at 7.267° N, 8.160° W is 20 km east of the Liberian border and the fifth ranked at 9.803° N, 3.303° W is 6 km south of the border with Burkina Faso) or near smaller cities (the top and third ranked base stations are only a few kilometres outside Bouaké and the fourth and sixth ranked are near Korhogo, the country's third and seventh largest cities respectively). Because many facility location models suggest that a fair distribution of resources should intentionally be skewed in favor of less populated areas [17,18], our finding suggests these regions as targets for a future expansion of the network.
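The Voronoi-cell population count described above can be sketched as a nearest-station assignment of gridded population values. This is only an illustration under stated assumptions: the function names, the toy coordinates and the toy population values are hypothetical, and planar distances in degrees are used because the text notes that the latitude-longitude projection is nearly distance-preserving for Cote d'Ivoire.

```python
import math

def nearest_station(point, stations):
    """Index of the station closest to `point` (planar distance in degrees)."""
    return min(range(len(stations)), key=lambda i: math.dist(point, stations[i]))

def voronoi_populations(stations, grid_cells):
    """Sum gridded population over each station's Voronoi cell.

    stations   : list of (lon, lat) tuples
    grid_cells : iterable of ((lon, lat), population) for each grid-cell centre
    A grid cell belongs to the Voronoi cell of its nearest station.
    """
    totals = [0.0] * len(stations)
    for centre, population in grid_cells:
        totals[nearest_station(centre, stations)] += population
    return totals

# Toy data (hypothetical, not taken from the D4D or AfriPop files).
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
grid_cells = [((0.1, 0.1), 100), ((0.9, 0.1), 250), ((0.2, 0.8), 50), ((0.4, 0.1), 30)]

totals = voronoi_populations(stations, grid_cells)
ranked = sorted(totals, reverse=True)  # input for a rank plot such as Fig. 2b
print(totals, ranked)
```

For 1238 stations and a fine population grid, a spatial index would be preferable to the brute-force search used here; the brute-force version is kept only because it makes the definition of a Voronoi cell explicit.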

Spatial correlation between the population density and the number of calls
Recent studies of mobile phone records in developed countries [19] have argued that the number of human interactions in cities increases faster than linearly with the city population. This poses the question: does the number of calls in Cote d'Ivoire depend similarly on population density? We count the population and the number of calls on a square grid. We investigate squares of size 5 km × 5 km, 10 km × 10 km and 20 km × 20 km. We generally find that the number of calls is less strongly correlated with population in sparsely populated squares than in densely populated ones, so we divide the data into two distinct regimes: one for sparsely and another for densely populated squares. We show the results for ordinary least-squares fits of the form log(number of calls) = a log(population) + b in Fig. 3a for the 5 km × 5 km grid (see footnote 1). In the densely populated regime (population > 10 000), regression yields a slope a = 0.87 with a 95% confidence interval [0.70, 1.03] (see footnote 2; the formula for computing confidence intervals can be found, for example, in [20]). For larger sizes (10 km × 10 km, 20 km × 20 km) the least-squares exponent for dense populations increases, but all 95% confidence intervals include 1, the dividing line between sub- and superlinear scaling (see table 1). This finding remains true even if the call intensity is measured by the total duration rather than the number of calls. Hence, the available data give sufficient evidence neither for nor against superlinear scaling for large populations.
For small populations, however, superlinear scaling can be firmly ruled out. In this regime, the least-squares exponents for the 5 km × 5 km and the 10 km × 10 km grids are not even significantly different from zero, so that population hardly influences the call intensity at all. The explanation lies in infrastructure located away from population centres. Among the 5 km × 5 km squares with a population below 10 000, three of the ten squares with the largest number of calls are near the Buyo hydroelectric plant (6°14′ N, 7°3′ W). The other seven squares in the top ten are near major highways (San Pedro-Betia Road, San Pedro-Tabou Road, A4, A6, A7, A8 and A100). These locations are in zones with low population density, but the local infrastructure generates a relatively high call intensity.
1 Five kilometres is approximately the reception radius of a base station, which is the relevant length scale in this problem. However, we also state the results for the other grids in table 1.
2 Because the logarithm of zero is undefined, the regression is calculated by ignoring cells where there were no calls.
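The log-log regression and its confidence interval can be sketched as follows. This is a minimal illustration, not the computation used for Fig. 3a: the function name and the toy data are hypothetical, and the interval uses the normal quantile 1.96 as an approximation, whereas the formula referenced in [20] may use a Student-t quantile. Squares without calls are skipped, as stated in footnote 2.

```python
import math

def loglog_fit(populations, calls, z=1.96):
    """Fit log(calls) = a * log(population) + b by ordinary least squares.

    Returns (a, b, (lower, upper)), where the interval is an approximate 95%
    confidence interval for the slope a. Using the normal quantile z instead
    of a Student-t quantile is an assumption of this sketch.
    """
    # Skip squares with zero calls or zero population: log(0) is undefined.
    pts = [(math.log(p), math.log(c))
           for p, c in zip(populations, calls) if p > 0 and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((y - (a * x + b)) ** 2 for x, y in pts)
    se_a = math.sqrt(rss / (n - 2) / sxx)
    return a, b, (a - z * se_a, a + z * se_a)

# Toy data (hypothetical): an exact power law with exponent 0.87.
populations = [12_000, 20_000, 35_000, 60_000, 150_000, 18_000]
calls = [2.0 * p ** 0.87 for p in populations]
calls[-1] = 0.0  # a square without calls is ignored by the fit

a, b, (lower, upper) = loglog_fit(populations, calls)
print(a, b, lower, upper)
```

Scaling is called superlinear only if the whole interval lies above 1; an interval such as [0.70, 1.03] contains 1 and therefore decides neither way.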
Despite the weak correlation between calls and population size, the spatial distribution of calls is far from random. In Fig. 3b we plot the spatial autocorrelation function C_call on the 5 km × 5 km grid. Although C_call decays quickly, the correlation is nevertheless > 0.1 up to a distance of ≈ 15 km. For comparison, we also plot the autocorrelation C_pop of the population. C_pop is generally a little larger than C_call, but it decreases at a similar rate. It remains an intriguing question for future research whether both autocorrelations are generated by similar social mechanisms. In particular, an analysis based on a more careful socio-economic definition of "city size" [21] may still unearth more details.
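One common way to estimate a spatial autocorrelation function on a square grid is to average the product of deviations from the mean over all pairs of squares in the same distance bin and to divide by the variance. The sketch below uses that estimator; the text does not state which estimator was used for Fig. 3b, so the estimator, the binning by whole grid steps and the function name are all assumptions.

```python
import math
from collections import defaultdict

def spatial_autocorrelation(grid, cell_km=5.0):
    """Distance-binned autocorrelation of values on a square grid.

    grid    : list of rows, each a list of numbers (e.g. calls per square)
    cell_km : side length of a grid square in kilometres
    Returns {distance_km: correlation}; distances are rounded to whole grid steps.
    """
    cells = [(r, c, v) for r, row in enumerate(grid) for c, v in enumerate(row)]
    n = len(cells)
    mean = sum(v for _, _, v in cells) / n
    var = sum((v - mean) ** 2 for _, _, v in cells) / n
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(n):
        ri, ci, vi = cells[i]
        for j in range(i + 1, n):
            rj, cj, vj = cells[j]
            step = round(math.hypot(ri - rj, ci - cj))  # distance bin in grid steps
            sums[step] += (vi - mean) * (vj - mean)
            counts[step] += 1
    return {step * cell_km: sums[step] / (counts[step] * var) for step in sorted(counts)}

# Toy example (hypothetical values): a single row of three squares.
line = [[0.0, 1.0, 2.0]]
corr = spatial_autocorrelation(line)
print(corr)
```

The pairwise loop is quadratic in the number of squares, which is acceptable for a sketch but would need a faster method on a national 5 km × 5 km grid.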

Energy and carbon footprint of wireless cellular networks in Cote d'Ivoire
In this section we estimate the energy consumption and the greenhouse gas (GHG) emissions, which contribute to climate change, of the mobile network in Cote d'Ivoire. Much research has shown that mobile technologies are an important instrument of current information and communication technologies for development (ICT4D) strategies, for example [22]. On the other hand, the growing deployment of these technologies can result in rising GHG emissions, sometimes labelled "footprint", a topic that has recently received increasing interest from the community of ICT4D researchers [23,24]. It is our aim to contribute to a more informed discussion by providing quantitative estimates of energy consumption and GHG emissions. We want to precede this analysis with a qualification: in or outside of a development context, the analysis of the environmental impact of a technical system and its results can stand separately from the interpretation of these results for policy decisions. In this text we estimate the annual GHG emissions of the mobile network in Cote d'Ivoire and suggest directions for existing or future development of these networks from the perspective of their technical operation. However, this analysis provides only an incomplete basis for policy making towards a development strategy, as it does not include an analysis of the social or economic impacts and benefits of the wireless network.
The goal of our assessment is to estimate the national energy consumption and GHG emissions using the number of base stations as an input parameter. This requires an estimate of the power consumption per base station and the overhead from the remaining parts of the network. Depending on its type, the power consumption of a base station can vary between 800 and 2800 W (estimates presented in [25]). Without additional information about the specific types of base stations, the OCI data can only be parameterised with average data. Additionally, an assessment of the energy consumption of a mobile network should include all relevant system parts in order to enable greater transferability of results. We assume the following composition of the wireless network: the base stations, which house the antennas, amplifiers and auxiliary equipment for cooling and power transformation, provide the radio signal to subscribers.
They are controlled by several base station controllers and a few mobile service centres to which they are connected via a radio or fixed network. This network also provides connectivity with the Internet or networks of other operators. In our estimate of the GHG emissions we had to make some simplifying assumptions about the network infrastructure. We estimated the energy consumption for a single base station (including overhead for other system parts such as base station controllers) of around 2100 W based on similar assumptions made in [25] and [26] that are based on publicly available data by Vodafone.
This value is a top-down estimate based on the total energy consumption of the network and the total number of base stations. The corporate responsibility report of the Vodafone Group states that in 2011 the company operated 224 000 base stations globally and that the energy consumption was 4117 GWh [27].
This value does not account for energy consumption in offices. Given that the average power consumption per base station is around 1.5 kW, the resulting value of 2100 W per base station is plausible and further corroborated by other studies such as [28] who state that the energy consumption of the base stations constitutes 60-80% of the total energy consumption of the network.
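The top-down figure can be checked with a short calculation that spreads the reported annual network energy evenly over all base stations and all hours of the year. The function name is hypothetical and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

def average_power_watts(annual_energy_gwh, n_base_stations):
    """Average power per base station implied by a yearly network energy total."""
    watt_hours = annual_energy_gwh * 1e9
    return watt_hours / HOURS_PER_YEAR / n_base_stations

# Vodafone Group figures for 2011 quoted in the text [27].
power_w = average_power_watts(4117, 224_000)

# Share of that figure explained by the base station itself (around 1.5 kW).
station_share = 1500 / power_w

print(round(power_w), round(station_share, 2))
```

The result is close to 2100 W, and the implied base-station share of roughly 0.7 falls inside the 60-80% range cited from [28].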
An estimate of the contribution of the remaining parts of a mobile operator's organisation to energy and carbon footprint can, for example, be based on corporate social responsibility reports by Vodafone and O2, which state that the network accounts for around 80% to 90% of an operator's energy consumption [29,30] and constitutes a similar portion of its GHG emissions [31]. The GSMA Mobile Green Manifesto report [32] makes similar assumptions. We assume that these ratios also apply to the OCI network and networks of other operators in Cote d'Ivoire.
Based on the data inventory we have a precise count of mobile base stations (1238). In order to estimate the total annual national energy consumption by mobile networks we had to also estimate the number of  [38].
In [32] it is found that, on global average, mobile networks account for 0.2% of all GHG emissions. Based on the Vodafone data, however, the share of national GHG emissions attributable to German mobile networks is only around 0.1%.
Given the lack of data on the power consumption by each base station, there is a relatively high uncertainty to the estimate of the total annual energy consumption by all networks. The estimate of the carbon emissions is further affected by uncertainty in the parameter for the carbon intensity of electricity.
In OECD countries, base stations are typically operated with energy from the electrical grid. In developing countries, however, electrical energy is possibly supplied by diesel generators to a significant degree. Diesel generators result in a greater carbon intensity per generated kWh of electricity (0.788 kgCO2-eq/kWh [39], as compared to 0.426 kgCO2-eq/kWh of the average intensity of grid electricity).
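The sensitivity of the emission estimate to the electricity source can be sketched by mixing the two carbon intensities quoted above. The function names and the diesel fraction parameter are hypothetical, and the calculation covers only the 1238 base stations listed in the data set at the assumed 2100 W each, so it is not the national total for all operators.

```python
HOURS_PER_YEAR = 365 * 24          # assumption: non-leap year
GRID_KG_PER_KWH = 0.426            # average grid intensity quoted in the text
DIESEL_KG_PER_KWH = 0.788          # diesel generator intensity quoted in the text [39]

def annual_energy_kwh(n_base_stations, watts_per_station):
    """Yearly energy of a set of base stations running continuously."""
    return n_base_stations * watts_per_station * HOURS_PER_YEAR / 1000.0

def annual_emissions_kg(energy_kwh, diesel_fraction):
    """Emissions when a fraction of the energy comes from diesel generators."""
    intensity = (diesel_fraction * DIESEL_KG_PER_KWH
                 + (1.0 - diesel_fraction) * GRID_KG_PER_KWH)
    return energy_kwh * intensity

energy = annual_energy_kwh(1238, 2100)
grid_only = annual_emissions_kg(energy, 0.0)
half_diesel = annual_emissions_kg(energy, 0.5)
diesel_only = annual_emissions_kg(energy, 1.0)
print(energy, grid_only, half_diesel, diesel_only)
```

Because the intensity is linear in the diesel fraction, the grid-only and diesel-only cases bound every intermediate scenario.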
In  Fig. 1 together with incoming calls are only a possible proxy to overall demand of voice traffic.
Data services and number of calls at peak time must both be considered to estimate the minimum capacity of a base station. We believe that the results of such a scenario would have too much uncertainty to bring significant value for our discussion.
Given this sensitivity analysis, it remains clear that mobile networks in Cote d'Ivoire contribute a greater share of the country's total GHG emissions than mobile networks do in Germany. One of the main reasons for this difference is likely to be the contrasting structure of the German and Ivorian economies, in particular the energy-intensive manufacturing industry in Germany. This assumption is also supported by a comparison with street lighting as another energy-consuming infrastructure. A report by the World Bank mentions in passing that 400 000 public street lights are operated in Cote d'Ivoire [40].
Assuming that street lights have a power consumption between 35 and 400 W [41] each, they constitute a share of the total energy consumption in Cote d'Ivoire between 1.4 and 16 percent. In contrast, the street lighting in Germany constitutes only 0.56% of the total energy consumption [42].
Interestingly, if apportioned to each subscriber, the annual energy consumption of the OCI mobile network is 3.83 kWh/sub which is much lower than the same metric for customers of Vodafone Germany (16.5 kWh/sub). The value is also a lot lower than the values reported in [43] (values between 7 and 34 kWh/sub with an average of 16.7 kWh/sub). In the case of Cote d'Ivoire, this is likely to be partly the result of a sparser deployment of base stations, in particular outside of Abidjan as we illustrate in Sec. 2.
One contributing factor to this sparser deployment is likely to be the lower degree of urbanisation (52% compared to 74% in Germany [35]). Another factor is the delayed introduction of data services to Cote d'Ivoire.

Detecting important routes for inter-city mobility
CDRs provide a cheap and efficient source of data to study human mobility patterns at a large scale [1].
Yet they suffer from limitations that need to be carefully considered, and in some cases dealt with, to ensure the validity of the observations. A key limitation is the sparse and heterogeneous sampling of the trajectories, as the location is not continuously provided but only when the phone engages in a phone call or a text message exchange. Moreover, the spatial accuracy of the data is determined by the local density of base stations. When estimating mobility from CDRs, different approaches have been developed in the literature (see Fig. 4 for illustration).
First, researchers interested in statistical models of human mobility have adopted a Brownian motion approach [46], where each individual is considered as a particle randomly moving in its environment.
Mobility is considered as a path between positions at successive position measurements. Authors have observed statistical properties reminiscent of Levy flights, together with a high degree of regularity. Yet, the usefulness of these observations is limited by the bursty nature of phone activity, as burstiness is expected to alter basic statistical properties of the jumps, such as their distance distribution (see Fig. 5).
Even in studies where the positions are evaluated at regular intervals, the nature of the jumps remains unclear, as the method tends to detect short trips due to localisation errors, and is blind to the type of the places sampled from the real trajectory. As a side note, let us mention recent work using geo-localised web services, such as Foursquare, where users voluntarily check in at places [47,48]. Foursquare check-ins are also characterised by bursty behaviour, but they provide GPS accuracy and semantic information (at the office, travelling, etc.) that might solve the aforementioned problems.
The second approach relies on the idea that mobility consists of moving from one place to another. The observation of mobility patterns thus requires one to define and identify important locations. A trajectory is seen as a set of consecutive locations visited by the user. Important locations can be defined as places where a user spends a significant amount of time, which he visits frequently, or where he has stopped for a sufficiently long time [1,49-51]. This approach provides a more intuitive picture of mobility, where the sampling is determined by the periods of rest of the user. However, it is blind to the multi-scale nature of human mobility, as it requires the parametrisation of thresholds in time and in space to identify important locations. The value of the threshold and the corresponding granularity of the places depend on the system under scrutiny, say cities for international mobility or rooms for human mobility inside hospitals [52].
When measuring human mobility from CDRs, it is important to remember that mobility is about space and time. Both aspects must be carefully considered to provide a faithful description of human trajectories, especially in situations where the sampling of the data is heterogeneous. For this reason, each transition should be treated as a jump in space over an interval in time and, if possible, be put in relation to the previous and following transitions. Contrary to the universality viewpoint of [46], not all transitions are alike. Rather, it is possible to extract different information and different types of mobility patterns by focusing on different regions in space-time. This filtering has been adopted in various studies, but usually either in space or in time. Let us mention [51], where transitions between identified places are considered only if they are registered within two hours of each other; in [50] the daily range of mobility is calculated, and in [8] a trip is defined as a displacement between two distinct base stations occurring within one hour in each time period. More complex filters can be defined on so-called handoff patterns, that is, sequences of cell towers that a moving phone uses while engaged in one voice call, e.g. in [49] where only sequences of more than 5 cell towers are included. Let us note that a filtering in space and in time allows for the selection of a characteristic velocity and, if needed, the removal of noisy transitions occurring at a small spatial scale, e.g. transitions between neighbouring cells of a static user, or at a long temporal scale, e.g. transitions over several days where several intermediate steps are expected to be missing.
This overview of recent research suggests direct applications that would be of particular interest in a developing country, where empirical data on human mobility tend to be lacking. Using the aforementioned methodologies, it would be possible, for instance, to identify and map nationwide commuting patterns.
Traffic tracking and route classification would also be possible after additional data is collected from test drives or signal strength data collected by high-resolution scanners [49]. In this work, we illustrate the potential benefits of a CDR analysis by focusing on the detection of high-traffic roads between cities. Such a detection might help deploy new infrastructure where the population actually needs it, e.g. in regions where mobility is high but the infrastructure is poor. Finding high-traffic roads requires one to filter transitions in the two-dimensional space of Fig. 5. To do so, we apply the following procedure. We consider only transitions in a certain velocity range and occurring in less than a predefined time interval. Our choice of velocity range for car mobility is [15, 150] km/h, in order to discard pedestrian motion and noisy points, i.e. points due to antenna switching instability. For our analysis, we have used the data from the POS_SAMPLE_X.TSV source files, which contain separate users' traces in the form of a list of user-antenna-timestamp records for each call or SMS, together with the antenna positions from the ANT_POS.TSV file. The lower bound for the time interval between two points has been set as the minimal value between two actions. For the upper bound, a one-hour limit has been chosen in order to balance between sufficient data points and accuracy. Moreover, to remove noisy connections and to identify persistent motion, we have removed weak transitions between antennas, i.e. those occurring fewer than 10 times. This operation leads to a fragmentation of the network into connected components, which we further exploit by keeping only components composed of at least ten vertices. This operation has the advantage of removing undesired connections that are due to antenna switching and not associated with motion. Our results are robust under variations of the above parameters.
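The filtering procedure described above can be sketched in a few steps: keep consecutive events of a user whose implied speed lies in the chosen range and whose time gap is at most one hour, count how often each antenna pair occurs, drop pairs seen fewer than 10 times, and keep connected components with at least ten antennas. The function names, the in-memory data layout, the use of great-circle distances and the treatment of transitions as undirected are assumptions of this sketch, not details confirmed by the text.

```python
import math
from collections import Counter, defaultdict

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def count_transitions(traces, antenna_pos, v_min=15.0, v_max=150.0, max_hours=1.0):
    """Count antenna-to-antenna transitions within the velocity and time window.

    traces      : {user_id: [(timestamp_seconds, antenna_id), ...]}
    antenna_pos : {antenna_id: (lat, lon)}
    """
    counts = Counter()
    for events in traces.values():
        events = sorted(events)
        for (t0, a0), (t1, a1) in zip(events, events[1:]):
            hours = (t1 - t0) / 3600.0
            if a0 == a1 or not 0 < hours <= max_hours:
                continue
            speed = haversine_km(antenna_pos[a0], antenna_pos[a1]) / hours
            if v_min <= speed <= v_max:
                counts[frozenset((a0, a1))] += 1  # undirected edge (assumption)
    return counts

def persistent_components(counts, min_count=10, min_vertices=10):
    """Connected components of the graph of sufficiently frequent transitions."""
    adjacency = defaultdict(set)
    for edge, n in counts.items():
        if n >= min_count:
            a, b = tuple(edge)
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first search from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_vertices:
            components.append(component)
    return components

# Toy data (hypothetical): 12 antennas spaced 0.1 degrees of latitude apart
# (about 11 km) and 10 users driving along them with one event every 10 minutes.
antenna_pos = {i: (5.0 + 0.1 * i, -4.0) for i in range(12)}
antenna_pos[100] = (5.0, -4.005)  # close neighbour of antenna 0
traces = {u: [(600 * i, i) for i in range(12)] for u in range(10)}
# A static user whose phone switches between two neighbouring antennas.
traces["static"] = [(1800 * k, 0 if k % 2 == 0 else 100) for k in range(30)]

counts = count_transitions(traces, antenna_pos)
components = persistent_components(counts)
print(components)
```

In the toy data the drivers produce one component covering all twelve antennas along the road, while the static user's antenna switching is discarded by the lower velocity bound.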
The described technique gives a good approximation of the most important human migration pathways (see Fig. 6) and thus can be used to plan the construction of alternative roads or the improvement of existing ones. Interestingly, it also allowed us to identify unknown roads, which we could validate a posteriori. Examples are shown in Fig. 7 and Fig. 8, where roads that were absent in Microsoft maps but detected by our algorithm appear in maps provided by OpenStreetMap and Yahoo respectively. This can be of particular interest for a semi-automated map improvement technique: if a strong connection is found from the CDR information, but there is no road on the map, it should be analysed carefully whether an existing road has so far been overlooked.

Conclusion
In this article we have presented how an analysis of the Orange D4D mobile phone data base reveals important patterns of communication infrastructure and mobile phone use in Cote d'Ivoire. The placement of base stations is biased towards Abidjan so that one development goal is an enhancement of the network in smaller cities and rural regions. We estimate that the network currently consumes between 2.88 and 3.83 kWh of energy annually per subscriber. Although this figure is less than in an industrial country such as Germany, the fraction of the national energy consumption spent on mobile telephony (estimated between 0.95% and 1.90%) is actually higher. Finally, we argued that mobility data from CDRs need further filtering to extract truly meaningful commuting patterns. We used the mobility traces that were part of the Orange D4D database to demonstrate how the main roads in Cote d'Ivoire can be identified.

Fig. 1 (caption fragment): ... where even on the cartogram the dots are noticeably aggregated. The colours of the dots do not exhibit any clearly visible large-scale trends. However, a more careful statistical analysis shows that a significant correlation between traffic at nearby base stations exists (see Fig. 3b).

Fig. 4 (caption): To illustrate the different ways to uncover mobility patterns from CDRs, let us focus on the motion of an individual in Brussels, as measured by his GPS. The user took his car in Watermael and went to two shops, one in Auderghem and one in Waterloo. The three locations are plotted in red. Three phone calls were made: one at home, one on the highway, and one in Waterloo. An approach where a path is composed of successive position measurements is shown in pink. In contrast, an approach where paths are based on important locations would detect the stop in Waterloo, rightly discard the one on the highway, but would still be blind to the location in Auderghem.