Berrisford, Liam Jordan; Barbosa, Hugo; Menezes, Ronaldo
Department of Mathematics, University of Exeter, Exeter, UK.
UKRI Centre for Doctoral Training in Environmental Intelligence, University of Exeter, Exeter, UK.
R Soc Open Sci. 2025 Jul 23;12(7):241288. doi: 10.1098/rsos.241288. eCollection 2025 Jul.
Global ambient air pollution, a transboundary challenge, is typically addressed through interventions that rely on data from spatially sparse and heterogeneously placed monitoring stations. These stations often have temporal data gaps caused by issues such as power outages. In response, we have developed a scalable, data-driven, supervised machine learning framework. The models it produces are designed to impute missing temporal and spatial measurements, thereby generating a comprehensive dataset for air pollutants including NO₂, O₃, PM₁₀, PM₂.₅ and SO₂. In this work, we produce models providing concentration estimates at 261 377 locations across the globe. The dataset has a 0.25° spatial resolution and an hourly temporal resolution, and each estimate is accompanied by a prediction interval. It serves a wide range of stakeholders who rely on outdoor air pollution data for downstream assessments, enabling more detailed studies than sparse station records alone allow. Additionally, we examine the model's performance across geographical locations, which yields recommendations for the strategic placement of future monitoring stations to further improve the model's accuracy.
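The abstract names two concrete properties of the output: estimates on a 0.25° grid, and a prediction interval attached to each estimate. The sketch below illustrates both under stated assumptions; it is not the authors' method. `grid_cell` assumes cells aligned to multiples of 0.25° starting at -90° latitude and -180° longitude, which the abstract does not specify. `conformal_interval` uses split conformal prediction, a generic technique swapped in here for illustration only, since the abstract does not say how the paper's intervals are computed. Both function names are hypothetical.

```python
import math

GRID_DEG = 0.25  # spatial resolution stated in the abstract


def grid_cell(lat: float, lon: float, res: float = GRID_DEG) -> tuple[float, float]:
    """Return the centre of the res x res degree cell containing (lat, lon).

    Assumption (not from the paper): cells are aligned to multiples of `res`
    starting at -90 latitude and -180 longitude.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Clamp the upper edges so lat=90 and lon=180 fall in the last cell.
    i = min(math.floor((lat + 90.0) / res), int(180 / res) - 1)
    j = min(math.floor((lon + 180.0) / res), int(360 / res) - 1)
    return (-90.0 + (i + 0.5) * res, -180.0 + (j + 0.5) * res)


def conformal_interval(prediction: float, calibration_residuals, alpha: float = 0.1):
    """Symmetric split-conformal interval around a point prediction.

    Assumption (not from the paper): residuals (observed - predicted) come from
    held-out calibration data. The half-width q is the ceil((n + 1)(1 - alpha))-th
    smallest absolute residual, which gives at least 1 - alpha marginal coverage
    when calibration and test points are exchangeable.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (prediction - q, prediction + q)
```

For example, a station at approximately Exeter's coordinates, `grid_cell(50.7184, -3.5339)`, maps to the cell centred at `(50.625, -3.625)`. Under this alignment a global 0.25° grid has 720 × 1440 = 1 036 800 cells, so the 261 377 locations reported above cover only a subset of it.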