A comparison of statistical and machine learning methods for creating national daily maps of ambient PM2.5 concentration.

Authors

Berrocal Veronica J, Guan Yawen, Muyskens Amanda, Wang Haoyu, Reich Brian J, Mulholland James A, Chang Howard H

Affiliations

University of California - Irvine, Department of Statistics, Irvine, California, USA.

University of Nebraska, Department of Statistics, Lincoln, Nebraska, USA.

Publication information

Atmos Environ (1994). 2020 Feb 1;222. doi: 10.1016/j.atmosenv.2019.117130. Epub 2019 Nov 14.

DOI:10.1016/j.atmosenv.2019.117130
PMID:32863727
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7451200/
Abstract

A typical challenge in air pollution epidemiology is to perform detailed exposure assessment for individuals for whom health data are available. To address this problem, in the last few years, substantial research efforts have been placed in developing statistical methods or machine learning techniques to generate estimates of air pollution at fine spatial and temporal scales (daily, usually) with complete coverage. However, it is not clear how much the predicted exposures yielded by the various methods differ, and which method generates more reliable estimates. In this paper, we aim to address this gap by evaluating a variety of exposure modeling approaches and comparing their predictive performance. Using PM2.5 in year 2011 over the continental U.S. as a case study, we generate national maps of ambient PM2.5 concentration using: (i) ordinary least squares and inverse distance weighting; (ii) kriging; (iii) statistical downscaling models, that is, spatial statistical models that use the information contained in air quality model outputs; (iv) land use regression, that is, linear regression modeling approaches that leverage the information in Geographical Information System (GIS) covariates; and (v) machine learning methods, such as neural networks, random forests and support vector regression. We examine the various methods' predictive performance via cross-validation using Root Mean Squared Error, Mean Absolute Deviation, Pearson correlation, and Mean Spatial Pearson Correlation. Additionally, we evaluate whether factors such as season, urbanicity, and levels of PM2.5 concentration (low, medium or high) affect the performance of the different methods. Overall, statistical methods that explicitly model the spatial correlation, e.g. universal kriging and the downscaler model, outperform all the other exposure assessment approaches regardless of season, urbanicity and PM2.5 concentration level. We posit that the better predictive performance of spatial statistical models over machine learning methods is due to the fact that they explicitly account for spatial dependence, thus borrowing information from neighboring observations. In light of our findings, we suggest that future exposure assessment methods for regional PM2.5 incorporate information from neighboring sites when deriving predictions at unsampled locations or attempt to account for spatial dependence.
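To make the evaluation idea in the abstract concrete, the following is a minimal sketch of one of the simpler methods it names (inverse distance weighting) scored by cross-validation with three of the listed metrics: Root Mean Squared Error, Mean Absolute Deviation, and Pearson correlation. It is not the authors' code. The leave-one-out scheme, the distance power of 2, the function names, and the synthetic monitor data are all assumptions made for illustration; the abstract does not specify the cross-validation design.

```python
import numpy as np

def idw_predict(train_xy, train_z, query_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting: each prediction is a weighted mean of the
    observed values, with weights proportional to 1 / distance**power."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    # eps guards against division by zero when a query sits on a monitor.
    w = 1.0 / np.maximum(d, eps) ** power
    return (w @ train_z) / w.sum(axis=1)

def loo_cv_idw(xy, z, power=2.0):
    """Leave-one-out cross-validation (an assumed design): predict each
    site from all the other sites."""
    n = len(z)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = idw_predict(xy[keep], z[keep], xy[i:i + 1], power)[0]
    return preds

def rmse(obs, pred):
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def mad(obs, pred):
    """Mean Absolute Deviation between observations and predictions."""
    return float(np.mean(np.abs(obs - pred)))

def pearson(obs, pred):
    """Pearson correlation between observations and predictions."""
    return float(np.corrcoef(obs, pred)[0, 1])

# Hypothetical data: 200 synthetic monitor locations on a 10 x 10 domain with
# a smooth concentration surface plus noise (not real PM2.5 measurements).
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(200, 2))
z = 8 + 2 * np.sin(xy[:, 0] / 2) + np.cos(xy[:, 1] / 3) + rng.normal(0, 0.3, 200)

preds = loo_cv_idw(xy, z)
print(f"RMSE={rmse(z, preds):.3f}  MAD={mad(z, preds):.3f}  r={pearson(z, preds):.3f}")
```

Because every prediction is a weighted average of neighboring observations, this sketch also illustrates, in its simplest form, what the abstract means by borrowing information from neighboring sites. Kriging and the downscaler model do this with weights derived from an estimated spatial covariance rather than a fixed distance power.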

