Does noise affect housing prices? A case study in the urban area of Thessaloniki

Kamtziridis, Georgios; Vrakas, Dimitris; Tsoumakas, Grigorios

doi:10.1140/epjds/s13688-023-00424-3

Regular article
Open access
Published: 17 October 2023

Georgios Kamtziridis ORCID: orcid.org/0000-0002-6097-0506¹,
Dimitris Vrakas¹ &
Grigorios Tsoumakas¹

EPJ Data Science volume 12, Article number: 50 (2023)

Abstract

Real estate markets depend on various methods to predict housing prices, including models that have been trained on datasets of residential or commercial properties. Most studies endeavor to create more accurate machine learning models by utilizing data such as basic property characteristics as well as urban features like distances from amenities and road accessibility. Even though environmental factors like noise pollution can potentially affect prices, the research around this topic is limited. One of the reasons is the lack of data. In this paper, we reconstruct and make publicly available a general purpose noise pollution dataset based on published studies conducted by the Hellenic Ministry of Environment and Energy for the city of Thessaloniki, Greece. Then, we train ensemble machine learning models, like XGBoost, on property data for different areas of Thessaloniki to investigate the way noise influences prices through interpretability evaluation techniques. Our study provides a new noise pollution dataset that not only demonstrates the impact noise has on housing prices, but also indicates that the influence of noise on prices significantly varies among different areas of the same city.

1 Introduction

The real estate market plays an important role in people’s lives, from individuals and families, to small businesses and large corporations. The process of purchasing or renting a property, whether for residential or commercial purposes, mainly depends on the economic and financial planning of a family or a company. Additionally, it is strongly related to the macroeconomics and the financial stability of much larger groups of people such as countries. Any sign of inconsistency or fluctuation in the real estate market can provoke apprehension in the state, trigger an economic recession or, ultimately, even lead to financial crises through housing bubble bursts. The potential risks are well known to the concerned parties and more importantly to governments that monitor the market on a regular basis. Banks have also invested greatly in real estate in order to obtain accurate house pricing estimates for mortgages and housing loans. These organizations often need to estimate the value of a given property for auctions or damage control when clients are unable to pay their debts. Besides states and organizations, property owners and investors should have the right to access valuable insights about the value of their properties too. This knowledge can increase the efficiency of managing assets or even help make profitable property investments.

Property estimations are performed by human experts like real estate brokers and engineers. This estimation process considers properties’ features and amenities, as well as external factors such as bus station density or distances to city centers. These are combined with other metrics, like the House Price Index [1], which tracks the changes in property prices, to arrive at a price estimate. During this process, there is no way of quantifying the accuracy of prediction nor the importance of each component that was included in the task. Therefore, the absence of confidence increases the risk of the forthcoming decision, which can end up being financially harmful.

In the contemporary world, the real estate market is represented mostly through different web-based services. In each country, there are numerous websites with vast amounts of properties available for renting or buying. These data have been utilized in the past for different analyses, ranging from creating models capable of predicting house prices based on their features to estimating prices over time in order to understand their seasonality. There has been a lot of research on this topic over the years, with big real estate datasets containing hundreds of properties being used to train machine learning models with the ultimate goal of providing meaningful price estimates. These datasets contain basic property features that are specific to the building itself, such as location, size, floor level and heating type to name a few. Moreover, they can incorporate other features related to the surrounding area of the property, such as road network accessibility and distances from basic points of interest. All these features contribute to the urban profile of a neighborhood, which can directly or indirectly affect prices to a great extent. The importance of these features and their correlation to the price estimates have been validated in previous research [2–5].

Environmental factors have not been taken into consideration in the literature as much as they should have, despite their obvious role when selecting a property. The two most prominent ones are the air quality index and noise pollution. The first indicates how clean the air is, which influences the overall health of the population in a given area [6–8]. The second is related to the actual noise caused by road traffic, crowds, aviation and other factors such as the presence of night clubs or manufacturing establishments. The influence of noise pollution on the health of citizens living in an urban environment is well established. Numerous studies in the research literature underline the negative effects of noise [9–12].

The Environmental Noise Directive (END)^{Footnote 1} is the primary law in the European Union (EU) dealing with noise pollution. One of its main goals is to inform the public about environmental noise and its effects on people’s health. Moreover, it requires EU countries to provide noise maps and noise management plans on a regular basis.

Although noise pollution plays a major role in the nature of a neighborhood, its impact on house prices remains largely underexplored. To some extent, this is to be expected given the practical challenges of gathering environmental data, such as expensive measuring and monitoring tools, specialized software, and on-site orchestration of distributed sensors. In Greece, these studies are conducted by large corporations or state departments that subsequently hold the data for internal use. Of course, there are some crowdsourced initiatives [13] that aim to collect noise data, but for small countries like Greece, these are usually inadequate.

The impact of the real estate market on a country, in addition to the innovations that can emerge through research in the field, highlights the potential profit of such work. Being able to generate valuable environmental features of an urban area and then use them in the housing price prediction problem can help individuals, small and medium-sized businesses, large corporations, banks and government experts make profitable decisions. Aside from profitability, it can shed light on the various factors that influence prices. Knowing if and how the environment affects housing prices can help urban planners design more functional and efficient cities.

In the first part of this paper, we extract environmental data, and more specifically noise pollution data, from published scientific studies. We focus on studies performed by the Hellenic Ministry of Environment and Energy^{Footnote 2} for the urban area of Thessaloniki, Greece. The end results were published by the government as heat maps demonstrating the spatial distribution of noise across the city. However, none of the core noise measurements were made public, making any future use or contribution to the field difficult. We have managed to overcome this limitation by meticulously re-creating the sense of noise as a general-purpose, easy-to-use dataset.

In the second part of this work, we highlight the importance of noise in predicting house prices. To verify this, we use the property database of Openhouse,^{Footnote 3} a real estate platform operating in major cities of Greece and, mainly, in the area of Thessaloniki. Regarding the machine learning models, we choose ensemble methods that have proven to work well in the research literature. The property and the noise data are used to create multiple models with distinct configurations, exploring different aspects of the same problem.

The main contributions of this work are:

1. A new general-purpose sense-of-noise dataset, as well as a new housing price dataset containing noise information for the area of Thessaloniki.^{Footnote 4}
2. An extensive experimental evaluation of the contribution of noise in the property price estimation process via ensemble models such as XGBoost [14] and light gradient boosting [15] models.

2 Related work

This section presents relevant research in the field of housing price prediction from a data perspective. It is important to discuss key relevant work in order to better understand the current state of the area, as well as to position this paper properly within the literature. We begin by outlining the most recent and best-performing solutions proposed for housing price estimates, considering basic property features. Then, we showcase approaches that incorporate various environmental features, with a specific focus on noise pollution. In both cases, we aim to investigate how the various features, especially environmental noise, affect prices.

Baldominos [3] studies the housing price prediction problem in the Salamanca district of Madrid. With a collection of 2266 properties from popular online sites containing the fundamental characteristics, they test the correlation between the features and the price and find that size is the most important one. They use these data to construct various regression models of different specifications, such as support vector machines, multi-layer perceptrons and ensembles of regression trees, all trying to predict prices given the features. The final results showcase the superiority of the tree ensembles over the other models. Imran [16] follows another approach for the capital of Pakistan, Islamabad. Alongside the basic property characteristics, they gather some features related to the surrounding area of a property. For instance, they attempt to include neighborhood-related information through binary values (yes/no) indicating the existence of core amenities and services like hospitals, schools and entertainment. Although their experiments encompass many features, the results show that, besides the total size, the number of bedrooms and bathrooms also strongly influences the price, with support vector machines being the best performing model.

Truong [5] focuses on the Beijing area by using the “Housing Price in Beijing” dataset, which contains more than 300,000 properties. Each property, apart from its standard attributes, has various spatial information like distance from the city center and subway accessibility. The exploratory analysis demonstrates a direct correlation between the location and the property price, since each district has a different price range. Initially, random forest [17], XGBoost and lightweight gradient boosting models were used for training. Then, the authors combine these to build a stacked generalization model [18] by placing random forest and lightweight gradient boosting at the first level and XGBoost at the second one. This architecture outperforms each of the individual models in terms of accuracy, albeit at a much higher computational cost. Similarly, Xue [19] accumulates property data and urban details like bus and metro stations and routes, traffic and road network information for the city of Xi’an, China. The urban data are preprocessed and new meaningful indices are introduced. The property features and the new indices are utilized by ensemble models to highlight the fact that size is, again, the most influential factor in predicting prices. Additionally, they illustrate the importance of the neighborhood of a property, because the next most important group of features is related to the spatial indices. Along the same lines, Kang [20] engineers relevant features from more generic urban characteristics like human mobility patterns and socioeconomic data. They experiment with a gradient boosting ensemble in order to analyze the significance of the features, and conclude that some spatial features can play a more decisive role when it comes to predicting prices. For example, the prices of properties located near university campuses are mainly affected by the distance to the campus rather than their total size.

Environmental conditions can also affect prices. Chiarazzo [21] gathers property and air pollution data for the city of Taranto in Italy, which is marked as a high environmental risk area due to its heavy industry. Using feature selection and an artificial neural network, they test the relevance of each feature through a one-by-one elimination process. Interestingly, they state that sulfur dioxide concentration, one of the five major air pollutants, is the strongest determinant of price, ranking higher than other characteristics such as floor level and distance to the city center. Shanghai is another industrialized city, where Zou [22] evaluates air pollution in connection with property prices to further quantify their relation. A total of 27,608 properties in conjunction with air pollutants are used as training data in a gradient boosting model, which attributes a contribution of 1.6% to air pollution. This percentage should by no means be considered negligible, since a reduction of 1 μg/m³ in nitrogen dioxide increases the price by roughly 278 Yuan per square meter.

Regarding noise pollution, there is much less research available attempting to correlate house prices to noise levels. In general, noise pollution is measured in decibels, where higher values indicate noisier environments. Blanco [23] uses hedonic models to analyze the connection between prices and noise levels in three different areas in the United Kingdom. They suggest that, when evaluating properties with similar amenities, the presence or absence of noise affects people’s choices. In particular, the way noise affects prices differs depending on the area: in some areas there is a positive correlation and in others a negative one. Brandt [24] investigates the same hypothesis in the city of Hamburg, Germany, combining multiple sources such as road, air and rail traffic noise pollution, also with hedonic models. They highlight the non-linear relationship between noise and price by stating that the price decrease is considerably smaller in areas with low levels of noise than in high noise level areas, where the decrease is more pronounced. Contrary to Brandt’s work, Szczepanska [25] studies the noise effect on two locations in the city of Olsztyn, Poland, that are rather dissimilar with respect to noise. They indicate the existence of a linear correlation between prices and noise pollution, which underlines the notion that location can greatly influence the noise-price connection.
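For readers unfamiliar with hedonic models, a commonly used log-linear specification is sketched below. It is a generic textbook form given only for orientation; the cited studies may use different functional forms, and the notation is introduced here rather than taken from them.

```latex
\ln P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \gamma N_i + \varepsilon_i
```

Here $P_i$ is the price of property $i$, $x_{ik}$ are its $K$ structural and neighborhood characteristics, $N_i$ is the noise level in decibels, and $\varepsilon_i$ is an error term. Under this form, $\gamma$ is approximately the relative change in price per additional decibel, so a negative $\gamma$ corresponds to a noise discount. A non-linear relationship such as the one reported for Hamburg would require replacing $\gamma N_i$ with a non-linear function of $N_i$.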

Tsao and Lu [26] collect property data from the Ministry of the Interior of Taiwan for the city of Taoyuan and enhance them with five years of noise pollution data from the international airport of Taoyuan. The authors use hedonic models to investigate the way aviation noise, caused by heavy air traffic at low altitudes, impacts the real estate market of the city. The models indicate that as the number of flights over an area increases, which translates to noisier conditions, the prices of the corresponding properties decrease noticeably. Moreover, they measure the rate of price decline in certain decibel ranges and conclude that for roughly 65 dB of noise due to air traffic the decrease in price can reach 2356 USD, whereas for more polluted areas the decline reaches 3622 USD. Similarly, Morano [27] studies the area of Bari, Italy, in order to link noise pollution to house prices, using a total of 200 properties and noise information from the Strategic Noise Map of Bari, as well as residents’ perceptions of the quality of an area with regard to noise. To measure the effect of noise, they employ a variation of a data-driven technique known as Evolutionary Polynomial Regression, or EPR [28], referred to as EPR-MOGA [29], which utilizes genetic algorithms. The final results outline the negative correlation between prices and noise levels, where highly polluted areas lead to cheaper housing.

The studies mentioned previously span different cities, countries and even cultures. Even though cross-cultural validation [30] is out of the scope of the current paper, we consider it important to mention, since it can fuel future work on this topic.

The related work indicates that the forefront of housing price prediction has been dominated by machine learning approaches, demonstrating their effectiveness in capturing intricate relationships within diverse property features. However, in the realm of incorporating noise pollution as a crucial determinant, prevailing methodologies have largely relied on conventional hedonic regression models. In this study, we endeavor to utilize machine learning models, with a specific emphasis on noise pollution as a pivotal predictor of housing prices. Moreover, we leverage modern explainability techniques, which have demonstrated efficacy in prior research [31], to untangle the complex dynamics between noise levels and their impact on the real estate market. Through these efforts, we aim to provide a comprehensive and innovative perspective on the interplay between environmental factors and property valuation. These two focal points represent the primary distinctions between the current work and its counterparts in the related literature.

3 Noise data reconstruction

As previously stated, noise data are difficult to obtain because they require specialized equipment for precise measurements, as well as urban environmental specialists capable of completing a task of this complexity. These data must include geographical references in the form of a coordinate system, mapping points or blocks on a map to certain noise values in decibels. This process is usually done with Geographic Information System (GIS) software tools that try to model noise pollution [32, 33].

As far as we know, no such data are openly available for the urban area of Thessaloniki, Greece. However, there are official studies of noise pollution for Thessaloniki orchestrated by the Hellenic Ministry of Environment and Energy.^{Footnote 5} The studies were conducted in 2015 for three major municipalities of the urban area of Thessaloniki, namely Thessaloniki, Neapoli and Kalamaria, with specialized equipment capable of measuring ground sound levels caused mainly, but not only, by factors like vehicles (local transportation), crowds and nightlife, while additionally calculating aviation sound produced by airplanes landing at or taking off from the nearby airport. These noise sources are considered to be the primary causes of noise pollution in urban environments [34]. The duration of the studies was set to 46 consecutive days, capturing noise pollution at least once every hour or, in some cases, every 15 minutes.

The final results were illustrated on heatmaps, where discrete colors represent different noise ranges in 5-decibel intervals. For each municipality, the results are segmented into daytime and nighttime noise and, in both cases, the data combine the sound sources by taking into account both traffic and aviation disturbances. Additionally, for Kalamaria there is a separate heatmap representing only the aviation noise.

Even though the data were gathered in 2015, they can still be relevant today for the city of Thessaloniki for two reasons. First, published studies indicate that noise pollution in Thessaloniki has remained the same over the years [35]. Second, noise outliers, such as noise coming from construction sites or extreme weather conditions, were excluded from the official heatmaps, rendering the dataset more accurate and relatively timeless in terms of the actual noise.

3.1 Idea and approach

The aforementioned studies did not make public the core measurement data that were used to create the provided heatmaps. To overcome this problem, we had to reconstruct these data with a small error. It is important to note that the heatmaps use discrete colors mapped to specific small ranges of decibels, as shown in Tables 1(a) and 1(b) (note that the ranges and the colors differ between the two tables). This means that each color represents its entire range, without any variation in tone. The ultimate goal is to be able to create the exact same maps by utilizing the reconstructed data. More specifically, the new dataset will contain the noise, in decibels, of a point given its latitude and longitude coordinates.

Table 1 The original mapping between the noise ranges and the corresponding colors (in RGB) for the area of Thessaloniki/Neapoli (left) and Kalamaria (right). Each noise range within an area is mapped to a different color, while the color mappings between the two areas differ
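The color-to-decibel lookup described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not necessarily the procedure used in this work: the legend colors and ranges in `LEGEND` are placeholders rather than the values of Table 1, the function names and the bounding box are hypothetical, each pixel is assigned the midpoint of its range, and pixel positions are converted to coordinates by plain linear interpolation, which ignores map projection effects.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical legend: placeholder colors and ranges, NOT the values of Table 1.
LEGEND: Dict[RGB, Tuple[float, float]] = {
    (0, 128, 0): (45.0, 50.0),
    (255, 255, 0): (50.0, 55.0),
    (255, 165, 0): (55.0, 60.0),
    (255, 0, 0): (60.0, 65.0),
}


def nearest_range(pixel: RGB) -> Tuple[float, float]:
    """Return the decibel range of the legend color closest to `pixel`."""
    def squared_distance(color: RGB) -> int:
        return sum((a - b) ** 2 for a, b in zip(color, pixel))
    return LEGEND[min(LEGEND, key=squared_distance)]


def pixel_to_latlon(row: int, col: int, height: int, width: int,
                    bbox: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Map a pixel center to (lat, lon); bbox is (south, west, north, east)."""
    south, west, north, east = bbox
    lat = north - (row + 0.5) / height * (north - south)
    lon = west + (col + 0.5) / width * (east - west)
    return lat, lon


def reconstruct(image: List[List[RGB]],
                bbox: Tuple[float, float, float, float]) -> List[dict]:
    """Turn a heatmap (rows of RGB pixels) into lat/lon/noise records."""
    height, width = len(image), len(image[0])
    records = []
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            low, high = nearest_range(pixel)
            lat, lon = pixel_to_latlon(r, c, height, width, bbox)
            # Midpoint of the range: one simple choice among several.
            records.append({"lat": lat, "lon": lon, "noise_db": (low + high) / 2})
    return records


# Tiny 1x2 "heatmap" with slightly off-legend colors (e.g. compression artifacts).
demo_image = [[(250, 5, 3), (2, 130, 1)]]
demo_bbox = (40.60, 22.90, 40.62, 22.94)  # hypothetical bounding box
demo_records = reconstruct(demo_image, demo_bbox)
```

Matching each pixel to the nearest legend color, rather than requiring an exact match, tolerates the small color deviations that scanning or image compression can introduce. Because each color covers a 5-decibel interval, a midpoint assignment is off by at most 2.5 dB within a range.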
