Department of Preventive Medicine, University of Southern California, Los Angeles, CA, 90032, United States.
Department of Environmental Monitoring, National Agency for Meteorology and Environmental Monitoring, Ulaanbaatar, Mongolia.
J Expo Sci Environ Epidemiol. 2021 Jul;31(4):699-708. doi: 10.1038/s41370-020-0257-8. Epub 2020 Aug 3.
Accurately assessing individual exposure to ambient air pollution is a crucial part of epidemiological studies of the adverse health effects of poor air quality. This is particularly challenging in developing countries with high levels of air pollution, largely because monitoring networks are sparse and lack consistent data.
We evaluated the performance of six different machine learning algorithms in predicting fine particulate matter (PM2.5) concentrations in Ulaanbaatar, Mongolia, using data from 2010 to 2018. Based on the performance metrics, the algorithms produced robust results.
Random forest (RF) and gradient boosting models performed best, with a leave-one-location-out cross-validated R² of 0.82 when using data from the entire study period. When the tuned models were applied to the hold-out test set, R² increased to 0.96 for RF and 0.90 for gradient boosting. We also used RF to predict PM2.5 concentrations for each administrative area (khoroo) of the city; the resulting maps show spatiotemporal variation consistent with the locations of the high-emission area (ger district) and the city center, and with population density.
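Leave-one-location-out cross-validation holds out every observation from one monitoring site at a time, trains on the remaining sites, and scores the pooled out-of-site predictions, so the metric reflects how well a model predicts at places it has never seen. The sketch below is a minimal, dependency-free illustration under stated assumptions: records are hypothetical `(location, predictor, observed_pm25)` tuples, the RF and gradient boosting models from the study are swapped for a one-predictor least-squares line (`fit_line`, hypothetical) purely to keep the example short, and R² is computed once over the pooled predictions rather than averaged per fold. It is not the authors' pipeline.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Hypothetical stand-in model (not RF): ordinary least squares, one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def leave_one_location_out(records, fit=fit_line):
    """Hold out all records from one site at a time, train on the other sites,
    and return pooled (observed, predicted) lists ordered by held-out site."""
    locations = sorted({loc for loc, _, _ in records})
    y_true, y_pred = [], []
    for held_out in locations:
        train = [(x, y) for loc, x, y in records if loc != held_out]
        test = [(x, y) for loc, x, y in records if loc == held_out]
        model = fit([x for x, _ in train], [y for _, y in train])
        for x, y in test:
            y_true.append(y)
            y_pred.append(model(x))
    return y_true, y_pred


if __name__ == "__main__":
    # Hypothetical toy data: three sites, one predictor, exactly linear response.
    records = [("A", 1.0, 3.0), ("A", 2.0, 5.0), ("B", 3.0, 7.0),
               ("B", 4.0, 9.0), ("C", 5.0, 11.0)]
    obs, pred = leave_one_location_out(records)
    print(round(r_squared(obs, pred), 6))
```

Grouping the folds by site, rather than shuffling individual observations, keeps repeated measurements from the same monitor out of both the training and test sets at once; this is what makes the cross-validated score a test of spatial generalization.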
Our results provide evidence for the advantages and feasibility of machine learning approaches in predicting ambient PM2.5 levels in a setting with limited resources and extreme air pollution.