Evolution of urban forms observed from space

Multiple driving forces shape cities. These forces include the costs of transporting goods and people, the types of predominant local industries, and the policies that govern urban planning. Here, we examine how agglomeration and dispersion change with increasing population and population density. We study the patterns in the evolution of urban forms and analyze the differences between developed and developing countries. We analyze agglomeration across 233 European and 258 Chinese cities using nighttime luminosity data. We find a universal inverted U-shaped curve for the agglomeration metric (the Lorenz asymmetry coefficient, Lasym). Cities attain their maximum agglomeration level at an intermediate density, above which dispersion increases. Our findings may guide strategic urban planning for the timely adoption of appropriate development policies.


Introduction
Urbanization is driven by the material and social benefits that arise from the size and population density of cities [1][2][3]. The tremendous growth of cities may enable poorer countries to develop, eventually reaching the wealth of richer nations. At the same time, the urbanization process causes congestion, environmental pollution, and regional economic inequality. By 2050, nearly 90% of the projected 2.5 billion increase in urban population will be concentrated in Asia and Africa [4]. This makes the sustainable growth of cities in the developing world an issue of significant global importance. Appropriate planning policies based on an in-depth understanding of the evolution of urban forms are needed to alleviate the negative effects of rapid growth and maximize its economic benefits. Urban form generally refers to the spatial distribution of human activities within a city, such as places of work and housing density. Far-reaching low-density sprawl and compact high-density cities are two extreme urban forms that reflect the dominant effects of dispersion and agglomeration, respectively.
Krugman's core-periphery model [5] and the classic Alonso-Mills-Muth monocentric urban model [6][7][8][9][10] both attribute the agglomeration of firms and population to the costs of transporting goods and to economies of scale. These two primary factors theoretically explain the transition from isolated small settlements to a concentrated core urban area (central business district, or CBD) and its periphery. Original core-periphery models and their derivatives [11] suggest an inverted U-shaped curve for the level of agglomeration as a function of transportation costs: both low and high costs result in high dispersion, whereas intermediate costs result in agglomeration (low dispersion). Given that the transportation costs for goods have been declining dramatically since the 1960s [12], this inverted U-shaped relationship would imply that dispersion should have prevailed, reducing agglomeration and increasing the sprawl of cities [13]. However, contrary to the model predictions, many cities have developed a polycentric urban form with a dominant CBD and sub-centres (the number of which scales sub-linearly with population size [14,15]), suggesting that agglomeration may be prevailing. This should not come as a complete surprise, because the service sector has largely replaced manufacturing as the major industry in cities [12,16], and transportation costs for goods are becoming insignificant in the service economy (3% of GDP in the 1990s compared to 8% of GDP in 1929) [12]. At present, dispersion is likely hindered by the costs of moving people rather than goods [17], because service firms derive economic benefits from face-to-face, high-information-throughput contacts [18]. Indeed, the costs of commuting have been increasing dramatically with increasing density (see S1 in the supplementary information (SI, Additional files 1, 2), Table 2 of [12], and panel d in Fig. 3 of [16]).
In this new context of the service economy and increasing costs of moving people, the two competing forces of agglomeration and dispersion are still trying to find their balance point.
The primary purposes of this paper are to understand how agglomeration and dispersion are manifested with increasing population and population density, to study the patterns in the evolution of urban forms, and to analyze whether such patterns differ between developed and developing countries. If there is a universal evolution pattern for urban forms, then appropriate and timely planning policies to alleviate some of the negative side effects of urban growth could be developed and applied in developing economies.

Characterising urban forms
Understanding urban forms usually involves the collection and analysis of high-resolution socio-economic [19,20] and telecommunication data [14], which are costly to acquire and may be outdated. In many developing countries, such data are often coarsely aggregated, infrequently sampled, or unavailable to the public. As an alternative, recent advances in the application of nighttime luminosity (NTL) data provide a free and timely method for the analysis of urban forms. Empirical studies have shown that NTL is capable of mapping local economic activity [21][22][23], so the spatial distribution of luminosity across a whole city can reveal its underlying urban form. In this study, we use NTL collected by the Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS). The data have global coverage with a spatial resolution of 30 arc seconds (approximately 1 km at the equator) and have been released annually since 1992 [21].
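The nominal 30-arc-second grid corresponds to roughly 1 km only near the equator; the east-west extent of a cell shrinks with the cosine of latitude, which matters for European cities at 40-60°N. The sketch below is a spherical-Earth approximation that assumes a mean Earth radius of 6371 km (a value not taken from the text); the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def pixel_size_km(lat_deg, arcsec=30.0):
    """Return (north-south, east-west) extent in km of a grid cell of `arcsec` arc seconds."""
    angle_rad = math.radians(arcsec / 3600.0)  # cell size as an angle in radians
    ns = EARTH_RADIUS_KM * angle_rad  # arc length along a meridian
    ew = ns * math.cos(math.radians(lat_deg))  # parallels shrink with cos(latitude)
    return ns, ew


ns, ew = pixel_size_km(50.85)  # roughly the latitude of Brussels
print(f"{ns:.3f} km x {ew:.3f} km, area {ns * ew:.3f} km^2")
```

At Brussels' latitude this gives about 0.93 km north-south by 0.59 km east-west, so each pixel covers a little over half a square kilometre rather than a full one.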
The dispersion and agglomeration level of cities can be quantified by the geometric characteristics of a Lorenz curve, a method widely used in economics to quantify inequality in wealth distribution. The curve relates a cumulative share of a variable (income, wealth, or in our case luminosity) to the cumulative share of entities that are characterized by this variable (individuals sorted in the order of increasing income, or geographical areas sorted in the order of increasing luminosity). One of the characteristics derived from the Lorenz curve shape, the Gini coefficient or index, is often used to measure the degree of inequality of urban development [24,25].
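A minimal sketch of the Lorenz curve and Gini index for a list of pixel luminosities follows. It assumes the common rank-based sample Gini estimator, since the text does not state which estimator was used; the function names are illustrative. The Gini equals one minus twice the area under the Lorenz curve, and the trapezoidal area of the curve's points reproduces the rank formula exactly.

```python
def lorenz_curve(values):
    """Lorenz curve points (p_k, L_k), k = 0..n, for values sorted in increasing order."""
    x = sorted(values)
    n, total = len(x), sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    points, running = [(0.0, 0.0)], 0.0
    for k, xi in enumerate(x, start=1):
        running += xi
        # cumulative share of pixels vs cumulative share of luminosity
        points.append((k / n, running / total))
    return points


def gini(values):
    """Rank-based sample Gini: 2*sum(i*x_(i)) / (n*sum(x)) - (n + 1)/n."""
    x = sorted(values)
    n, total = len(x), sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, four pixels of equal brightness give a Gini of 0, while `[0, 0, 0, 1]` gives 0.75, the maximum value (n - 1)/n attainable with four pixels.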
However, two Lorenz curves with similar Gini may describe two different types of inequality: one stemming from a few entities of extremely high value (curve skewed above the reverse diagonal, see panel a of S2 in SI), and the other from a large number of entities with values just above the average (curve skewed below the reverse diagonal, see panel b of S2 in SI). To distinguish such situations, a complementary measure that identifies in which direction the curve is skewed was proposed in [26,27]: the Lorenz asymmetry coefficient (Lasym). When applied to the analysis of urban forms, high Lasym (> 1) suggests that agglomeration dominates dispersion, and low Lasym (< 1) suggests the opposite. A symmetric Lorenz curve (Lasym = 1) suggests that spatial human activities are lognormally distributed. This study adopts the definition of [26], where Lasym ∈ (0, 2).
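The asymmetry coefficient can be computed from sorted pixel values as S = F(μ) + L(μ), where F(μ) is the fraction of pixels darker than the mean luminosity μ, L(μ) is the share of total luminosity those pixels emit, and both are linearly interpolated between the two pixels that bracket μ. The sketch below assumes that this standard discrete estimator matches the definition of [26] adopted in the study; the function name is hypothetical, and ties exactly at the mean are handled in the simplest possible way.

```python
def lorenz_asymmetry(values):
    """Lorenz asymmetry coefficient S = F(mu) + L(mu), with S in (0, 2)."""
    x = sorted(values)
    n, total = len(x), sum(x)
    mu = total / n
    m = sum(1 for xi in x if xi < mu)  # number of pixels strictly below the mean
    if m == 0:
        # all values equal (or all zero): the Lorenz curve is the diagonal, S undefined
        raise ValueError("Lorenz asymmetry is undefined for constant data")
    x_m, x_next = x[m - 1], x[m]  # the two values bracketing the mean
    delta = (mu - x_m) / (x_next - x_m)  # interpolation weight
    cum_m = sum(x[:m])  # luminosity of the m darkest pixels
    f_mu = (m + delta) / n  # interpolated share of pixels below mu
    l_mu = (cum_m + delta * x_next) / total  # interpolated share of luminosity below mu
    return f_mu + l_mu
```

The two toy inputs mirror the two cases in S2: `[0, 0, 0, 1]` (one very bright pixel) gives S = 1.0625 > 1, while `[0, 1, 1, 1]` (many pixels just above the mean) gives S = 0.6875 < 1.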
It should be noted that Gini by itself is not a measure of dispersion. Instead, it is a measure of decentralization [28], because a low Gini can indicate two kinds of urban development: 1) decentralization with intensive agglomeration at the local level, which leads to polycentric urban forms (see panel a of S3), and 2) decentralization with weak agglomeration, which leads to more generally dispersed urban forms without significant sub-centers (see panel d of S3). We retain Gini in our analysis, since the decentralization level might be of significance for wealth creation. In this study, we compute both Gini and Lasym for 233 European and 258 Chinese cities, and test whether a universal evolution path exists.

Urban forms observed from space
The Lorenz curve and its characteristic coefficients (Gini and Lasym) are constructed over a specific set of entities (here, the smallest areas resolved in a satellite image, i.e., pixels with measured luminosity) and are therefore highly sensitive to the definition of a city boundary. When the boundary is too large, the vast dark areas surrounding densely populated regions are included, and both Lasym and Gini may be overestimated; when it is too small (encompassing only brightly lit areas), the Gini may be so low that the city is mistakenly regarded as uniformly developed.
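The effect of an oversized boundary on Gini can be made exact under one assumption: that the rank-based sample estimator below is used (the text does not name its estimator). Let a city have n_0 pixels with Gini G_0, and let the boundary be enlarged by k zero-luminosity pixels, so that n = n_0 + k. Prepending k zeros shifts the rank of every lit pixel by k, which adds k times the total luminosity to the weighted sum, and substitution gives G' below.

```latex
G = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n\sum_{i=1}^{n} x_{(i)}} - \frac{n+1}{n},
\qquad
G' = \frac{n_0 G_0 + k}{n_0 + k} = G_0 + \frac{k\,(1 - G_0)}{n_0 + k}
```

Here x_{(i)} denotes luminosities sorted in increasing order. Because G_0 ≤ (n_0 - 1)/n_0 < 1 for non-negative data, G' > G_0 for any k > 0; for example, a city with G_0 = 0.3 on 1000 pixels padded with 1000 dark pixels reaches G' = 0.65.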
The boundary definition problem is common in urban science, especially in studies of the scaling behaviour of various socio-economic indicators [2,29]. Generally, two types of boundaries are employed in urban studies: morphological, based on a certain threshold in population density, and functional, based on the density considerations and economic links (approximated by commuter fluxes) between regions unified into a single labor market.
We tested the application of morphological boundaries (in the form of Urban Areas defined by the Census Bureau in the United States) for cities with populations larger than 50 000. The Gini indices for most of these cities are close to zero, indicating uniform development. Therefore, similar to [30], we use a functional definition of a city as a unified labor market, comprising dense urban cores and all suburban areas that have substantial fractions of workers commuting to them. The US, the Organisation for Economic Cooperation and Development (OECD), and the European Union (EU) have all employed the functional approach to define their city boundaries. However, only the EU boundaries …
Figure 1 illustrates NTL distributions and their characteristics for two cities of similar population density within their functional boundaries and with similar Gini indices (additional examples can be found in S3 and S4 of SI). As evident from the luminosity maps, both Brussels and Kirklees have developed multiple sub-centers. However, Brussels's CBD is much brighter than any area of Kirklees, and the fall-off in luminosity is sharper. The Lasym for Brussels is 1.11, markedly higher than 0.90 for Kirklees, indicating Brussels's higher agglomeration level.

Universal evolution of urban forms
The evolution of urban forms, or changes in the balance between agglomeration and dispersion, can be observed through Lorenz curve indices expressed as functions of population density. The relatively short time span of the available NTL data dictates a cross-sectional approach, in which we analyze Gini and Lasym indices for all European Union and Chinese cities (see Fig. 2).
In agreement with [28], decentralization (decreasing Gini) invariably accompanies increases in density, in both developed and developing economies. However, the Gini indices of Chinese cities are generally higher than those of European cities at the same density, which means that decentralization in the EU sets in at lower densities than in China. This observation is explained by the simulation model presented later in the paper (see the "Transportation model" section).
The agglomeration level (approximated by Lasym) exhibits an inverted U-shaped profile for cities in both developed and developing economies (Fig. 2). In developed countries, the peak of agglomeration occurs at a density of 189.9 people/km² in 2006, while in developing ones it is delayed until the much higher density of 1025 people/km² in 2000. In agreement with the core-periphery (Krugman [5]) and monocentric (Alonso-Mills-Muth [6][7][8][9][10]) models, agglomeration is dominant at a certain level of transportation cost, proxied by population density in this study. As illustrated in supplementary S1, the data on congestion as a function of density in EU cities show a clear positive correlation between the two, with the median congestion level observed at a density of 240 people/km². Notably, this density is close to where the agglomeration peak measured by Lasym occurs. Unfortunately, due to insufficient traffic data for Chinese cities, the density corresponding to the median transportation cost cannot be reliably estimated.
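The text reports the densities at which Lasym peaks but does not state how the peak is located. One plausible approach, shown here purely as an assumption, is to fit Lasym as a quadratic in log-density by ordinary least squares and take the vertex of the parabola. The helper names are hypothetical, and the synthetic data in the check are not the study's data.

```python
import math


def _solve3(m):
    """Solve a 3x3 augmented system [A | rhs] by Gaussian elimination with partial pivoting."""
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coef[r] = (m[r][n] - sum(m[r][k] * coef[k] for k in range(r + 1, n))) / m[r][r]
    return coef


def peak_density(densities, lasym):
    """Density at the vertex of a least-squares fit Lasym = a + b*x + c*x**2, x = centred ln(density)."""
    logs = [math.log(d) for d in densities]
    x_bar = sum(logs) / len(logs)
    xs = [v - x_bar for v in logs]  # centring improves numerical conditioning
    s = [sum(x ** p for x in xs) for p in range(5)]  # power sums for the normal equations
    t = [sum(y * x ** p for x, y in zip(xs, lasym)) for p in range(3)]
    a, b, c = _solve3([[s[i + j] for j in range(3)] + [t[i]] for i in range(3)])
    if c >= 0:
        raise ValueError("fitted curve is not an inverted U (c >= 0)")
    return math.exp(x_bar - b / (2 * c))  # undo centring and the log transform
```

Fitting in log-density reflects the wide spread of city densities, from tens to thousands of people/km²; a binned median or a nonparametric smoother would be equally defensible choices for locating the peak.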
A similar inverted U-shaped relationship exists between Lasym and congestion level (as measured by TomTom) in EU cities (see S5). Gini as a function of congestion level is relatively stable (see S5b), in contrast to the decreasing Gini as a function of density (S5a). This behaviour may be partly attributable to differences in the definition of metropolitan area boundaries (the TomTom dataset is proprietary, and no information on its boundaries is available). For example, Dublin ranks 59th in population out of 233 EU cities but is classified as a 'small' city by TomTom, while Lyon, ranked 62nd, is classified as a 'medium' city.
We also explore whether cities are going through the peak of their U-shaped evolution path during the period covered in the study, i.e., between the two snapshots available in the NTL data. We classify cities as pre- or post-peak (density lower or higher than a threshold) based on the density at which the Lasym maximum occurs in the first snapshot (2006 for EU cities and 2000 for Chinese cities). Specifically, cities from developed and developing economies are divided as follows: 1) pre-peak cities, whose densities are below 189.9 people/km² for EU cities and 1025 people/km² for Chinese cities, and 2) post-peak cities, whose densities are above these thresholds. Accordingly, we produce a scatter plot of the differential changes in their Gini and Lasym indices, presented in S6. Most cities have decreasing Gini, which confirms the universal trend of decentralization. However, we do not investigate whether decentralization occurs with or without agglomeration, which would result in two distinct urban forms: polycentric cities and extensive sprawl. Additionally, a significant fraction of Chinese post-peak cities have decreasing Lasym indices, while pre-peak Chinese cities have slightly increasing Lasym indices. This suggests that Chinese cities are currently going through, or have recently gone through, the peak of their U-shaped evolution path. In EU cities, on the other hand, no such phenomenon can be observed. This might be caused by the relatively short time span between the two observations for EU cities (4 years, compared to 10 years for Chinese cities). To illustrate how cities evolve along their U-shaped path, we randomly select ten Chinese cities from the pre-peak and post-peak groups and show the changes in their Gini and Lasym indices over time in Fig. 3. In agreement with our findings, the Gini indices of both groups decreased between 1996 and 2010 (see Fig. 3c and d), except for a few cities such as Tongchuan, Langfang and Shanwei. However, the Lasym indices of pre-peak cities show heterogeneous trends (Fig. 3a): the Lasym indices of seven cities increased over time, while those of the others stayed constant or decreased. Importantly, the cities with increasing Lasym expressed a clear inverted U-shaped curve, even though the time at which the peak emerged varied. For the post-peak group (Fig. 3b), the Lasym of all cities except Guilin decreased steadily over time. The Lasym graphs suggest that most of the cities considered have gone through their inverted U-shaped evolution paths, supporting our findings. However, some have not, which warrants further investigation in future studies.
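The pre-/post-peak grouping and the differential changes plotted in S6 can be sketched as follows. The record layout, the function name, and the choice to place a city whose density exactly equals the threshold in the post-peak group are all assumptions; the text leaves that boundary case unspecified and only gives the thresholds (189.9 people/km² for EU cities, 1025 people/km² for Chinese cities).

```python
def classify_and_diff(cities, threshold):
    """Split cities at a peak density and return per-city (dGini, dLasym) by group.

    `cities` maps a (hypothetical) city name to a dict with keys
    "density" (first-snapshot density, people/km^2),
    "gini" = (first, second snapshot) and "lasym" = (first, second snapshot).
    """
    result = {"pre-peak": {}, "post-peak": {}}
    for name, c in cities.items():
        # ASSUMPTION: density equal to the threshold counts as post-peak
        group = "pre-peak" if c["density"] < threshold else "post-peak"
        d_gini = c["gini"][1] - c["gini"][0]  # change between the two snapshots
        d_lasym = c["lasym"][1] - c["lasym"][0]
        result[group][name] = (d_gini, d_lasym)
    return result
```

Each group's (ΔGini, ΔLasym) pairs are the coordinates of one scatter plot; a post-peak group dominated by negative ΔLasym together with a pre-peak group leaning towards positive ΔLasym is the pattern described above for Chinese cities.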

Transportation model
The common trend for agglomeration to attain its peak value at a certain intermediate density (Fig. 2) is observed in cities from both developed and developing economies. However, the density at which the agglomeration peak occurs differs between them. The difference is likely caused by the transportation infrastructure that determines the costs of moving people. Indeed, China's considerable investments in transportation infrastructure over the past decades have substantially alleviated congestion [31]. These transport improvements were found to effectively reduce commuting costs [32] and allow further agglomeration of employment in the CBD. To rationalise the agglomeration trends observed empirically through nighttime luminosity, we employ the stochastic city model of Louf and Barthelemy [15], developed to explain the polycentric transition that cities undergo as they grow. As in the original model, the city is reduced to an out-of-equilibrium system with two types of agents: households and activity centres. Activity centres are randomly distributed within a unit disc in the plane, and households are added to the system one by one at random locations. Each household selects, among the potential employment centres, the one that maximises its utility, i.e., the attainable wage less the costs of commuting. The model explains, both qualitatively and quantitatively, the emergence of a CBD and the transition from a monocentric to a polycentric urban structure as the population grows. Moreover, it explains how some sub-centres eventually lose attractiveness due to the congestion caused by incoming traffic.
To account for the role of infrastructure in shaping urban forms, we modify the city-growth model of Louf and Barthelemy [15] and qualitatively explain how agglomeration is affected by the resilience of the transportation infrastructure. Following [15], the utility that every additional household maximises is determined as follows: where $Z_{ij}$ is the net income of a worker living at location $i$ and choosing to work at $j$. The first term, $(T(j)+1)^{\beta_j}$, is the maximum attainable average wage paid by firms at $j$, where $T(j)$ is the incoming traffic to location $j$ (that is, the number of workers choosing to work at $j$), and $\beta_j \in (0, 1)$ is a scaling factor drawn from a gamma distribution ($k \in [0.5, 1]$, $\theta \in [1, 2]$) with mean ≈ 0.12, implying that wages at most sub-centers have an elasticity of about 0.12 and only a few areas have extremely high wages [30]. The second term is the transportation cost: $d_{ij}$ is the Euclidean distance between locations $i$ and $j$, $l$ is the maximum effective commuting distance that residents can financially withstand, $c$ is the road network capacity, and $\nu \in \{1.4, 1.5, \dots, 1.8\}$ is the citywide infrastructure fragility level (the higher the value, the worse the city's infrastructure). The combined value $\nu + \beta_j$ is the fragility of location $j$ to congestion (see supplementary note 1 for more details of the model). The citywide fragility parameter $\nu$ effectively sets the sensitivity of sub-centers to congestion-induced increases in commuting costs and the associated loss of attractiveness. It controls the capability of a city to sustain agglomeration of employment.
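The household-allocation loop described above can be sketched as a toy simulation. The exact utility function is not reproduced in this text, so the commuting-cost term below, (d_ij / l)(1 + (T(j)/c)^ν), is a hypothetical stand-in that merely combines the quantities defined above; the gamma parameters for β_j (shape 1, scale 0.12, capped below 1) are likewise illustrative choices, and the function name is hypothetical. This is not the authors' implementation, which is described in supplementary note 2.

```python
import math
import random


def simulate_city(n_households, n_centres, nu, capacity, l, seed=0):
    """Toy allocation loop loosely following the Louf-Barthelemy set-up.

    Returns T(j), the number of workers employed at each centre j.
    The cost form is an ASSUMPTION for illustration only.
    """
    rng = random.Random(seed)

    def random_point_in_unit_disc():
        # uniform sampling in the unit disc: radius = sqrt(U), angle = 2*pi*U
        r = math.sqrt(rng.random())
        phi = 2 * math.pi * rng.random()
        return (r * math.cos(phi), r * math.sin(phi))

    centres = [random_point_in_unit_disc() for _ in range(n_centres)]
    # hypothetical wage exponents: gamma(shape=1, scale=0.12) has mean 0.12; capped below 1
    betas = [min(rng.gammavariate(1.0, 0.12), 0.99) for _ in range(n_centres)]
    traffic = [0] * n_centres  # T(j): workers already commuting to centre j

    for _ in range(n_households):
        home = random_point_in_unit_disc()
        best_j, best_z = 0, -math.inf
        for j, centre in enumerate(centres):
            d = math.dist(home, centre)  # Euclidean distance d_ij
            wage = (traffic[j] + 1) ** betas[j]  # first term, (T(j) + 1)^beta_j
            # ASSUMED congestion-dependent cost: distance scaled by l, inflated by (T/c)^nu
            cost = (d / l) * (1 + (traffic[j] / capacity) ** nu)
            z = wage - cost  # net income Z_ij
            if z > best_z:
                best_j, best_z = j, z
        traffic[best_j] += 1  # the new household commits to its best centre
    return traffic
```

Running the loop with the same seed for several values of `nu` and comparing the concentration of the resulting T(j) is one way to explore, qualitatively, how infrastructure fragility shifts where employment agglomerates.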
The simulation results are summarised in Fig. 4, with the output of the original Louf and Barthelemy model highlighted in black (the original model deviates from the empirical observations for the Lasym index, which motivated the modifications detailed in supplementary note 1). In agreement with the observations in Fig. 2, decentralization occurs with growing density, and inverted U-shaped curves emerge. The lower the citywide fragility parameter ν (i.e., the more resilient the road network), the later the peak in agglomeration occurs. Cities with high ν have their Lasym decreasing faster after crossing the peak, while the rest maintain their Lasym at relatively high values. As China has invested significantly in transportation infrastructure, we assume that Chinese cities have lower ν, so the simulation explains why agglomeration in Chinese cities peaks at higher densities than in EU cities. The model also suggests that agglomeration in China can be sustained across a wider range of population densities. Notably, the Gini of cities with lower ν is considerably higher than that of cities with higher ν at the same density.
Figure 4 Gini and Lasym as a function of density obtained from the simulation of 1000 systems. Supplementary note 2 discusses the implementation details of the simulation.

Economic effects of agglomeration on wealth creation
The transportation model presented in this study generates the inverted U-shaped curves observed in the real world. The fragility of road networks, ν, affects the decentralization level, the density at which the agglomeration peak emerges, and the sustainability of agglomeration with increasing density. Given the differences in agglomeration levels observed in cities from developed and developing economies, we test how agglomeration and decentralization affect wealth creation while controlling for both density and geographical division (eastern EU cities, western EU cities, and Chinese cities).
The linear regression model with GDP per capita as the dependent variable (see Table 1) suggests that both density and agglomeration have a significant positive impact on wealth creation, whereas decentralization does not (similar to the findings of [19,33]). Since agglomeration and dispersion are two ends of a continuum, and density and agglomeration are both important for wealth creation, policy makers in developing economies have an incentive to sustain agglomeration at the highest attainable density, which, according to our model, makes investment in resilient road networks crucial.

Discussion
The emergence of prosperous cities outside the developed world has allowed developing economies to narrow economic and social gaps, even though this fast growth poses significant environmental [34] and social challenges [35]. Policy makers in developing countries are learning the lessons that come with explosive growth, where a poor understanding of the evolution of urban forms can lead to extreme outcomes such as failed 'ghost cities' [35]. In our study we provide insight into the factors that affect the morphology of cities and govern their evolution. We use NTL to study urban forms in both developed (European Union) and developing (China) economies and conduct a spatial analysis of luminosity distributions via the Gini inequality and Lorenz asymmetry indices. Our results suggest that there is an inverted U-shaped relationship between agglomeration and population density. Cities attain their maximum agglomeration level when their density reaches a certain intermediate level; above this level, they experience increasing dispersion. Importantly, this peak agglomeration density varies between types of economies: 172.6 people/km² for developed European cities and 863.3 people/km² for developing Chinese cities in 2010. We argue, and support with modelling, that this is a direct result of the significant investment of Chinese cities in public transport infrastructure, which increases transportation network capacity and resilience. An empirical study shows that the intensive development of the subway network plays an important role in Beijing's urban agglomeration and new firm formation [36]. Additional 'last mile' initiatives (such as dockless bike-sharing schemes) augment the existing subway network. These combined measures eventually alleviate road congestion and help sustain the high-density city core [37].
The emergence of a peak in the agglomeration level, and the differences in the position of this peak with respect to population density in different types of economies, suggest that cities may follow a universal path in the evolution of their urban form. Given that high density confers demonstrable benefits of increased productivity, creativity, and wealth (as supported by our regression model), it is important for city planners to understand the evolutionary stage their city is experiencing. Development policies should be timely and should consider whether the proposed investment will facilitate the transition to, or allow the maintenance of, the optimum morphology. Conversely, aggressive land development and urbanization well before the agglomeration peak could lead to failure. It is therefore not surprising that Ordos, the famous 'ghost city' in China that grew much faster than any contemporary and comparable city, is now facing long-term demographic and financial trouble. Considerable investment in building a planned subcenter [38] when its population density was only 270 people/km² (well below the typical peak agglomeration density for China) may not have been justified. It will be of vital importance to review construction investments in cities of the developing world to understand to what extent they might succeed, and to suggest a more natural form of growth where current development plans allow.
On a practical level, the novelty of our approach is the idea that transport network resilience and capacity are what define the "peak agglomeration density" (also supported by the EU-China differences in density levels). Additional studies are needed to establish quantitatively the link between investment in infrastructure and sustainable/attainable density. If such a quantitative relationship can be determined, it would allow urban planners to judge whether an investment is warranted, premature, or critically needed.
Even though our study provides empirical evidence of the inverted U-shaped relationship between the agglomeration and population density, we found that Gini and Lasym coefficients were both sensitive to the fraction of zero-luminosity pixels, which generally represent the uninhabited areas, such as water bodies. This study did not exclude NTL data from those uninhabited areas. Therefore, a future study might benefit from using