Estimation of High-Spatial Resolution of Ground-Level Ozone, Nitrogen Dioxide, and Carbon Monoxide in South Korea During 2002-2020 Using Machine-Learning Based Ensemble Model
38 Pages Posted: 18 Nov 2022
Abstract
Long-term exposure to ozone (O3), nitrogen dioxide (NO2), and carbon monoxide (CO) is known to cause various diseases and increase mortality. Estimating ground-level O3, NO2, and CO concentrations at a high spatial resolution is therefore crucial for assessing the health effects associated with these air pollutants. However, related studies are limited in South Korea. This study aimed to develop machine learning-based models to predict monthly O3 (average of daily 8-hour maximums), NO2, and CO at a spatial resolution of 1 km × 1 km across South Korea from 2002 to 2020. Approximately 80% of the monitoring stations were used to train three machine learning models (random forest, light gradient boosting, and neural network) with 10-fold cross-validation, and the remaining 20% of the monitoring stations were used to test model performance. We also applied ensemble models to integrate the variation in predictions among the models. The predictors included satellite-based remote sensing data, inverse distance weighted ground-level air pollutant concentrations, land use variables, reanalysis datasets for meteorological variables, and regional socioeconomic variables collected from various databases. For O3, the overall R2 of the ensemble model was 0.841 over the entire study period. Urban areas showed better model performance (R2 = 0.845) than rural areas (R2 = 0.762). For NO2, the highest overall R2 was 0.756, with the best fit in autumn (R2 = 0.768). For CO, the overall R2 was 0.506. This study provides high spatial resolution monthly average O3 and NO2 estimates with excellent performance (R2 > 0.75). Our predictions can be used to analyze spatial patterns of pollutants in relation to population characteristics and to support studies on the health effects of long-term exposure to air pollution using geocode-based health information and local health data.
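Three of the building blocks named in the abstract (holding out whole monitoring stations, inverse distance weighting, and R2 as the performance metric) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the IDW power of 2, the random seed, and the planar coordinates are all assumptions made for illustration, and the abstract does not state how R2 was computed or how the ensemble was combined.

```python
import math
import random


def split_stations(station_ids, test_fraction=0.2, seed=0):
    """Split by station, so held-out stations never contribute training rows.

    Hypothetical helper; the 80/20 ratio follows the abstract, the seed is arbitrary.
    """
    ids = sorted(set(station_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]  # (train stations, test stations)


def idw(target, stations, power=2.0):
    """Inverse distance weighted estimate at target = (x, y).

    stations is a list of (x, y, value). power=2 is an assumed, common default,
    not a value taken from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Splitting by station rather than by individual observation keeps all months from a held-out station out of training, so the test score reflects prediction at unmonitored locations.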
Keywords: Gaseous air pollutant, Exposure assessment, High spatial resolution, Machine learning model, Ensemble model