Harvard University, Department of Environmental Health, School of Public Health, Boston, MA, United States.
Harvard University, Department of Biostatistics, School of Public Health, Boston, MA, United States.
Environ Res. 2019 Aug;175:421-433. doi: 10.1016/j.envres.2019.05.025. Epub 2019 May 28.
Numerous modeling approaches have been developed to estimate concentrations of PM components and derive better exposures for health studies, including geostatistical interpolation approaches, land use regression models, and models based on remote sensing technology. Recently, there have been some efforts to develop models based on machine learning algorithms. Each of these exposure assessment methods has inherent uncertainties, resulting in varying levels of exposure misclassification. To date, only a few studies have attempted to systematically compare exposure estimates from different PM constituent models. Our research addresses this gap by comparing the predictive capabilities of ordinary geostatistical interpolation (Ordinary Kriging, OK), hybrid interpolation (a combination of Empirical Bayesian Kriging and land use regression), and machine learning techniques (forest-based regression) for estimating PM constituents in Eastern Massachusetts in the United States. We compared the estimates of 10 ambient PM components: Al, Cu, Fe, K, Ni, Pb, S, Ti, V, and Zn. The OK model performed poorest for all PM components, with an R² under 0.30. The hybrid model presented a slight improvement, especially for Cu and Fe, for which the R² values increased to 0.62 and 0.59, respectively. These elements had the highest R² values from the hybrid model. The forest model presented the best performance, with R² values higher than 0.7 for most of the particle components, including Cu, Fe, Ni, Pb, Ti, and V. As with the hybrid model, the forest models for Cu and Fe explained the most concentration variance, with R² values of 0.88 and 0.92, respectively. The forest models for K, S, and Zn performed poorest, with R² values of 0.54, 0.37, and 0.44, respectively. The results presented here can help the environmental health community estimate the spatial distribution of PM constituents more accurately.
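The R values reported above read as coefficients of determination (R²), given the reference to explained concentration variance. As a minimal sketch of how such a score is commonly computed from observed and predicted concentrations, the following may help. The function name `r_squared` and all numbers are illustrative assumptions, not data or code from the study, and the abstract does not state whether its scores come from cross-validation or from a held-out set.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up concentrations (arbitrary units), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
close_fit = [1.1, 1.9, 3.2, 3.8, 5.1]   # predictions that track the observations
mean_only = [3.0] * 5                   # always predicts the observed mean

print(f"close fit: R^2 = {r_squared(observed, close_fit):.3f}")
print(f"mean only: R^2 = {r_squared(observed, mean_only):.3f}")
```

Under this definition, a model that always predicts the observed mean scores 0 and a perfect model scores 1, so a value such as 0.92 would mean the predictions account for about 92% of the variance in the measured concentrations.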