

A comparison of statistical and machine learning methods for creating national daily maps of ambient PM2.5 concentration.

Author information

Berrocal Veronica J, Guan Yawen, Muyskens Amanda, Wang Haoyu, Reich Brian J, Mulholland James A, Chang Howard H

Affiliations

University of California - Irvine, Department of Statistics, Irvine, California, USA.

University of Nebraska, Department of Statistics, Lincoln, Nebraska, USA.

Publication information

Atmos Environ (1994). 2020 Feb 1;222. doi: 10.1016/j.atmosenv.2019.117130. Epub 2019 Nov 14.

Abstract

A typical challenge in air pollution epidemiology is to perform detailed exposure assessment for individuals for whom health data are available. To address this problem, substantial research effort has gone, over the last few years, into developing statistical methods and machine learning techniques that generate estimates of air pollution at fine spatial and temporal scales (usually daily) with complete coverage. However, it is not clear how much the predicted exposures yielded by the various methods differ, or which method generates more reliable estimates. In this paper, we aim to address this gap by evaluating a variety of exposure modeling approaches and comparing their predictive performance. Using PM2.5 in 2011 over the continental U.S. as a case study, we generate national maps of ambient PM2.5 concentration using: (i) ordinary least squares and inverse distance weighting; (ii) kriging; (iii) statistical downscaling models, that is, spatial statistical models that use the information contained in air quality model outputs; (iv) land use regression, that is, linear regression modeling approaches that leverage the information in Geographical Information System (GIS) covariates; and (v) machine learning methods, such as neural networks, random forests, and support vector regression. We examine the predictive performance of the various methods via cross-validation using Root Mean Squared Error, Mean Absolute Deviation, Pearson correlation, and Mean Spatial Pearson Correlation. Additionally, we evaluate whether factors such as season, urbanicity, and level of PM2.5 concentration (low, medium, or high) affect the performance of the different methods. Overall, statistical methods that explicitly model the spatial correlation, e.g., universal kriging and the downscaler model, outperform all the other exposure assessment approaches regardless of season, urbanicity, and PM2.5 concentration level.
We posit that spatial statistical models achieve better predictive performance than machine learning methods because they explicitly account for spatial dependence, thus borrowing information from neighboring observations. In light of our findings, we suggest that future exposure assessment methods for regional PM2.5 either incorporate information from neighboring sites when deriving predictions at unsampled locations or otherwise account for spatial dependence.
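The cross-validation metrics named in the abstract can be illustrated with the simplest of the compared predictors, inverse distance weighting (IDW). The sketch below is a minimal, hypothetical illustration, not the authors' implementation. Several details are assumptions: an IDW power of 2, a leave-one-out cross-validation scheme (the paper's actual fold design may differ), reading "Mean Absolute Deviation" as the mean absolute prediction error, and defining Mean Spatial Pearson Correlation as the Pearson correlation across sites computed for each day and then averaged over days. All function names are illustrative.

```python
import math


def idw_predict(target, sites, values, power=2.0):
    """Inverse distance weighting: a weighted average of observed values with
    weights 1 / distance**power (power=2 is an assumed default)."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(sites, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a monitor: return its value
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def loo_idw(sites, values, power=2.0):
    """Leave-one-out CV (assumed scheme): predict each site from all others."""
    preds = []
    for i in range(len(sites)):
        others = sites[:i] + sites[i + 1:]
        vals = values[:i] + values[i + 1:]
        preds.append(idw_predict(sites[i], others, vals, power))
    return preds


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mad(obs, pred):
    """Assumed reading of 'Mean Absolute Deviation': mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson(a, b):
    """Pearson correlation (undefined if either input has zero variance)."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mean_spatial_pearson(obs_by_day, pred_by_day):
    """Assumed definition: correlation across sites for each day, averaged
    over days. Each argument is a list of per-day lists of site values."""
    rs = [pearson(o, p) for o, p in zip(obs_by_day, pred_by_day)]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    # Hypothetical toy data: three collinear monitors on a single day.
    sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    obs = [10.0, 20.0, 30.0]
    preds = loo_idw(sites, obs)
    print(preds, rmse(obs, preds), mad(obs, preds), pearson(obs, preds))
```

On this toy example, leave-one-out IDW predicts 22, 20, and 18, so the end sites are pulled toward their neighbors and the correlation with the observations is negative. IDW does borrow from neighbors, but through fixed distance weights rather than a fitted spatial covariance model such as the one kriging estimates.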



Cited by
Prediction and model evaluation for space-time data.
J Appl Stat. 2023 Sep 3;51(10):2007-2024. doi: 10.1080/02664763.2023.2252208. eCollection 2024.

