Libardi Arturo de la Cruz, Masselot Pierre, Schneider Rochelle, Nightingale Emily, Milojevic Ai, Vanoli Jacopo, Mistry Malcolm N, Gasparrini Antonio
Environment & Health Modelling (EHM) Lab, Department of Public Health Environments and Society, London School of Hygiene & Tropical Medicine, 15-17 Tavistock Place, WC1H 9SH, London, United Kingdom.
Φ-lab (Phi-lab), European Space Agency (ESA), Frascati, Italy.
Atmos Pollut Res. 2024 Nov;15(11):102284. doi: 10.1016/j.apr.2024.102284. Epub 2024 Aug 9.
In this contribution, we applied a multi-stage machine learning (ML) framework to map daily values of nitrogen dioxide (NO2) and particulate matter (PM10 and PM2.5) at a 1 km resolution over Great Britain for the period 2003-2021. The process combined ground monitoring observations, satellite-derived products, climate reanalyses, chemical transport model datasets, and traffic and land-use data. Each feature was harmonized to 1 km resolution and extracted at monitoring sites. Models used single and ensemble-based algorithms, including random forests (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LGBM), as well as lasso and ridge regression. The successive stages focused on augmenting PM2.5 using co-occurring PM10 values, gap-filling aerosol optical depth and columnar NO2 data obtained from satellite instruments, and finally training an ensemble model and predicting daily values across the whole geographical domain (2003-2021). The ensemble model performed well under a ten-fold monitor-based cross-validation procedure, with an average R² of 0.690 (range 0.611-0.792) for NO2, 0.704 (0.609-0.786) for PM10, and 0.802 (0.746-0.888) for PM2.5. Reconstructed pollution levels decreased markedly within the study period, with a stronger reduction in the last eight years. The pollutants exhibited different spatial patterns: NO2 rose in close proximity to high-traffic areas, whereas PM varied at a larger scale. The resulting 1 km spatially resolved daily datasets allow for linkage with health data across Great Britain over nearly two decades, thus contributing to extensive, extended, and detailed research on the long- and short-term health effects of air pollution.
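The monitor-based cross-validation mentioned above can be illustrated with a minimal Python sketch. It assumes that "monitor-based" means every observation from a given monitoring site falls in the same fold, so each fold is scored on sites the model has not seen. The helper names (`monitor_folds`, `r_squared`, `cross_validate`, `fit_line`), the synthetic data, and the one-feature straight-line model are hypothetical stand-ins for illustration only; they are not the study's code, features, or ensemble learners.

```python
import random
from statistics import mean


def monitor_folds(monitor_ids, k=10, seed=0):
    """Assign each distinct monitor to one of k folds (hypothetical helper)."""
    monitors = sorted(set(monitor_ids))
    rng = random.Random(seed)
    rng.shuffle(monitors)
    # Round-robin over the shuffled monitors keeps fold sizes balanced.
    return {m: i % k for i, m in enumerate(monitors)}


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(train):
    """Placeholder learner: least-squares line on a single feature."""
    xs = [x for _, x, _ in train]
    ys = [y for _, _, y in train]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validate(rows, fit, k=10, seed=0):
    """rows: (monitor_id, feature, target) tuples. Returns one R² per fold."""
    fold_of = monitor_folds([r[0] for r in rows], k, seed)
    scores = []
    for fold in range(k):
        train = [r for r in rows if fold_of[r[0]] != fold]
        test = [r for r in rows if fold_of[r[0]] == fold]
        if not train or not test:
            continue  # skip empty folds (fewer monitors than folds)
        predict = fit(train)
        scores.append(r_squared([y for _, _, y in test],
                                [predict(x) for _, x, _ in test]))
    return scores


# Synthetic example: 30 monitors with 20 daily values each (invented data).
rng = random.Random(42)
rows = [(m, x, 2.0 * x + 5.0 + rng.gauss(0, 3))
        for m in range(30)
        for x in (rng.uniform(0, 40) for _ in range(20))]
scores = cross_validate(rows, fit_line)
print(f"mean R2 = {mean(scores):.3f} "
      f"(range {min(scores):.3f}-{max(scores):.3f})")
```

The sketch reports the fold-wise mean and range of R², mirroring how the performance figures are summarized in the abstract. Splitting by monitor rather than by observation is the key design choice: it keeps records from one site out of both the training and test sets of the same fold, which would otherwise tend to inflate the apparent accuracy of predictions at unmonitored locations.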