School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, 430079, China.
School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, 430079, China; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.
Chemosphere. 2020 Oct;256:127051. doi: 10.1016/j.chemosphere.2020.127051. Epub 2020 May 12.
Accurate estimation of surface PM concentration is critical for assessing PM exposure and the associated health impacts. Because ground monitoring stations have limited spatial coverage, most studies use satellite products to estimate surface PM concentration, applying machine learning (ML) techniques to build a relationship between satellite-retrieved aerosol optical depth (AOD) and ground-measured PM concentration. However, uncertainties in ML-based models may lead to considerable biases in PM estimation, which need to be carefully examined. Here we evaluate the accuracy of PM concentrations estimated by two popular ML models (Random Forest and the BP Neural Network), which were trained and tested using hourly satellite-retrieved AOD from HIMAWARI, ground-measured PM from the China National Environmental Monitoring Center, ERA5 meteorological conditions, and other auxiliary variables for the whole of 2017 over China. We propose a new validation method that takes the spatial pattern of the data into account. The results suggest that traditional validation methods may overestimate model performance when estimating PM in areas with sparse in-situ measurements. Moreover, the spatial distribution pattern of the training data largely affects the evaluation of model performance and should be carefully considered. Future studies should include at least a site-specific validation rather than relying only on random-sampling validation.
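The contrast between random-sampling validation and site-specific validation can be illustrated with a minimal sketch. The abstract does not give the details of the proposed validation method, so the code below is not the authors' procedure: it only shows the general idea of holding out whole monitoring stations instead of individual samples. The function names, the record layout, and the synthetic station data are all assumptions made for illustration.

```python
import random

def sample_based_split(records, test_frac=0.2, seed=0):
    """Random hold-out over individual samples (the conventional approach)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

def site_based_split(records, test_frac=0.2, seed=0):
    """Hold out whole stations, so no test site contributes training samples."""
    rng = random.Random(seed)
    sites = sorted({r["site"] for r in records})
    rng.shuffle(sites)
    n_test = max(1, int(len(sites) * test_frac))
    test_sites = set(sites[:n_test])
    train = [r for r in records if r["site"] not in test_sites]
    test = [r for r in records if r["site"] in test_sites]
    return train, test

def shared_sites(train, test):
    """Stations that appear in both the training and the test set."""
    return {r["site"] for r in train} & {r["site"] for r in test}

# Synthetic records (hypothetical): 10 stations with 24 hourly samples each.
records = [
    {"site": f"S{s:02d}", "hour": h, "pm": 30.0 + s + 0.1 * h}
    for s in range(10)
    for h in range(24)
]

train_r, test_r = sample_based_split(records)
train_s, test_s = site_based_split(records)
print(len(shared_sites(train_r, test_r)), "sites shared under sample-based split")
print(len(shared_sites(train_s, test_s)), "sites shared under site-based split")
```

Under the sample-based split, test samples generally come from stations the model has already seen during training, which is one plausible reason a score obtained this way can look better than the performance at locations without monitors. The site-based split removes that overlap by construction.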