Makhdoomi Ahmad, Sarkhosh Maryam, Ziaei Somayyeh
Department of Environmental Health Engineering, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
Sci Rep. 2025 Mar 8;15(1):8076. doi: 10.1038/s41598-025-92019-3.
Fine particulate matter (PM2.5) is one of the most important air pollutants, and monitoring its levels is essential for keeping concentrations under control. In this research, an attempt was made to predict PM2.5 concentrations using four machine learning (ML) models: Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting Regressor (XGBR), Random Forest (RF), and Gradient Boosting Regressor (GBR). From 2016 to 2022, the mean and maximum PM2.5 concentrations were 32.84 µg/m³ and 160.25 µg/m³, respectively, indicating occasional episodes of high pollution. PM2.5 concentrations dropped below 30 µg/m³ in 2020 because of reduced human activity during the COVID-19 lockdowns, but rose significantly in 2021 because heavy industries continued operating after the lockdowns ended. The ML models performed very well in predicting PM2.5 concentrations, with around 95% of their predictions falling within a factor of the observed concentration. Among the four ML algorithms, GBR performed best, with the lowest MSE (5.33) and RMSE (2.31) as well as high accuracy measures. This suggests that GBR is the most effective of the four models at reducing large errors, making it more robust in capturing variations in PM2.5 levels. In conclusion, the study proposes a method for obtaining high-accuracy PM2.5 predictions with ML, which is useful for air quality monitoring on a global scale and for improving acute exposure assessment in epidemiological research.
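The error and accuracy measures mentioned in the abstract can be illustrated with a minimal sketch. The paper's own evaluation code is not given. The helpers `mse`, `rmse` and `fac2` below are hypothetical, written only for illustration. The factor of 2 used in `fac2` is an assumption, because the abstract does not state which factor was applied. The observed and predicted values are invented and are not data from the study. Because RMSE is the square root of MSE, the two reported GBR values are consistent with each other: √5.33 ≈ 2.31.

```python
import math
from typing import Sequence


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals (P - O)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, expressed in the data's units (here µg/m³)."""
    return math.sqrt(mse(observed, predicted))


def fac2(observed: Sequence[float], predicted: Sequence[float],
         factor: float = 2.0) -> float:
    """Fraction of predictions P with 1/factor <= P/O <= factor.

    ASSUMPTION: the factor defaults to 2 (the common FAC2 metric), and
    pairs with a non-positive observation are skipped to avoid dividing by zero.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    if not pairs:
        raise ValueError("no positive observations")
    hits = sum(1 for o, p in pairs if 1 / factor <= p / o <= factor)
    return hits / len(pairs)


# Hypothetical PM2.5 values in µg/m³, for illustration only.
obs = [20.0, 35.0, 50.0, 12.0]
pred = [22.0, 33.0, 47.0, 30.0]

print(mse(obs, pred))   # 85.25
print(rmse(obs, pred))  # about 9.23
print(fac2(obs, pred))  # 0.75, since 30/12 = 2.5 falls outside the factor of 2
```

MSE squares each residual, so the single large miss (18 µg/m³, contributing 324 of the 341 total) dominates the score. This is why the model with the lowest MSE and RMSE is described as the best at reducing large errors.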