Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.
Department of Epidemiology and Biostatistics, School of Public Health, The University of Queensland, Brisbane, Australia.
Sci Total Environ. 2018 Sep 15;636:52-60. doi: 10.1016/j.scitotenv.2018.04.251. Epub 2018 Apr 25.
Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at a daily time scale and at a national level in China.
To estimate daily concentrations of PM2.5 across China during 2005-2016.
Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014-2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (a non-parametric machine learning algorithm) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then used to estimate daily concentrations of PM2.5 across China at a resolution of 0.1° (≈10 km) during 2005-2016.
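The model comparison described above relies on 10-fold cross-validation, in which each observation is predicted by a model that never saw it during training. The following is a minimal sketch of that procedure in plain Python, not the authors' code. The function names and the mean-predicting stand-in learner are hypothetical, and the split here is by individual sample, since the abstract does not say whether folds were formed by record or by station.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives fold sizes that differ by at most one sample.
    return [indices[i::k] for i in range(k)]


def cross_val_predict(fit, X, y, k=10, seed=0):
    """Return one out-of-fold prediction per sample.

    `fit(X_train, y_train)` must return a function mapping one feature
    row to a predicted value (a hypothetical interface for this sketch).
    """
    predictions = [None] * len(y)
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predictions[i] = predict(X[i])
    return predictions


def fit_mean_model(X_train, y_train):
    """Hypothetical stand-in learner: always predicts the training mean.

    A random forests regressor would be plugged in here instead.
    """
    average = mean(y_train)
    return lambda row: average


# Toy data (invented for illustration): one feature row per observation.
X = [[float(i)] for i in range(25)]
y = [10.0 + 2.0 * i for i in range(25)]
out_of_fold = cross_val_predict(fit_mean_model, X, y, k=10)
```

Because every prediction in `out_of_fold` comes from a model trained without that sample, comparing it with `y` gives an estimate of predictive rather than fitted accuracy.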
The daily random forests model showed much higher predictive accuracy than the two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R² = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m³]. At the monthly and annual time scales, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m³ and 6.9 μg/m³, respectively).
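The two accuracy measures reported above, and the effect of averaging daily values to a coarser time scale, can be illustrated with a short sketch. It assumes R² is computed as one minus the ratio of residual to total sum of squares, which is one common definition; the abstract does not state which definition was used. The function names, the record layout and the sample values are all invented for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def r_squared(observed, predicted):
    """Share of variance in the observations explained by the predictions."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared prediction error, in the units of the data."""
    return sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def monthly_means(records):
    """Average daily values per station and calendar month.

    Each record is (station, 'YYYY-MM-DD', observed, predicted).
    Returns two aligned lists: monthly observed and monthly predicted means.
    """
    groups = defaultdict(list)
    for station, date, obs, pred in records:
        groups[(station, date[:7])].append((obs, pred))
    obs_means = [mean([o for o, _ in days]) for days in groups.values()]
    pred_means = [mean([p for _, p in days]) for days in groups.values()]
    return obs_means, pred_means


# Invented daily values in μg/m³ for a single hypothetical station "A".
records = [
    ("A", "2015-01-01", 50.0, 52.0),
    ("A", "2015-01-02", 60.0, 58.0),
    ("A", "2015-02-01", 40.0, 43.0),
    ("A", "2015-02-02", 30.0, 29.0),
]
daily_rmse = rmse([r[2] for r in records], [r[3] for r in records])
monthly_rmse = rmse(*monthly_means(records))
```

In this toy data the daily errors partly cancel within each month, so `monthly_rmse` is smaller than `daily_rmse`. The same mechanism is consistent with the lower RMSE reported at monthly and annual scales.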
By combining a novel application of the modeling framework with the most recent ground-level PM2.5 observations, the machine learning method achieved higher predictive ability than the models reported in previous studies.
The random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.