Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city.

Affiliations

Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.

School of Public Health, Mongolian National University of Medical Sciences, Zorig Street, Ulaanbaatar, 14210, Mongolia.

Publication information

Environ Pollut. 2019 Feb;245:746-753. doi: 10.1016/j.envpol.2018.11.034. Epub 2018 Nov 16.

Abstract

BACKGROUND

Indoor and outdoor fine particulate matter (PM2.5) are both leading risk factors for death and disease, but making indoor measurements is often infeasible for large study populations.

METHODS

We developed models to predict indoor PM2.5 concentrations for pregnant women who were part of a randomized controlled trial of portable air cleaners in Ulaanbaatar, Mongolia. We used multiple linear regression (MLR) and random forest regression (RFR) to model indoor PM2.5 concentrations with 447 independent 7-day PM2.5 measurements and 87 potential predictor variables obtained from outdoor monitoring data, questionnaires, home assessments, and geographic data sets. We also developed blended models that combined the MLR and RFR approaches. All models were evaluated using 10-fold cross-validation.
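As a rough illustration of the evaluation step described above, the following sketch computes a 10-fold cross-validated R² from pooled out-of-fold predictions. It is not the authors' code: it uses only the Python standard library, a single hypothetical predictor (outdoor concentration), a simple least-squares line in place of the full MLR and RFR models, and synthetic data generated on the spot. The function names and the data-generating coefficients are assumptions made for the example.

```python
import random
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_r2(x, y, k=10, seed=0):
    """R^2 computed from pooled out-of-fold predictions."""
    preds = [0.0] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        # Fit on the other k-1 folds only, then predict the held-out fold.
        slope, intercept = fit_simple_ols(
            [x[i] for i in train], [y[i] for i in train]
        )
        for i in fold:
            preds[i] = intercept + slope * x[i]
    my = mean(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic stand-in data (NOT the study data): 447 observations to
# mirror the sample size, with made-up coefficients and noise.
rng = random.Random(42)
outdoor = [rng.uniform(10, 200) for _ in range(447)]
indoor = [5 + 0.3 * o + rng.gauss(0, 15) for o in outdoor]

cv_r2 = cross_validated_r2(outdoor, indoor, k=10)
print(f"10-fold cross-validated R^2: {cv_r2:.3f}")
```

Every prediction here comes from a model that never saw the observation being predicted, which is what makes the resulting R² an out-of-sample estimate.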

RESULTS

The predictors in the MLR model were season, outdoor PM2.5 concentration, the number of air cleaners deployed, and the density of gers (traditional felt-lined yurts) surrounding the apartments. MLR and RFR had similar performance in cross-validation (R² = 50.2% and 48.9%, respectively). The blended MLR model that included RFR predictions had the best performance (cross-validation R² = 81.5%). Intervention status alone explained only 6.0% of the variation in indoor PM2.5 concentrations.

CONCLUSIONS

We predicted a moderate amount of variation in indoor PM2.5 concentrations using easily obtained predictor variables, and the models explained substantially more variation than intervention status alone. While RFR shows promise for modelling indoor concentrations, our results highlight the importance of out-of-sample validation when evaluating model performance. We also demonstrate the improved performance of blended MLR/RFR models in predicting indoor air pollution.
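The point about out-of-sample validation can be illustrated with a small sketch. It does not reproduce the study's models: a 1-nearest-neighbour regressor stands in for any highly flexible learner (it is not a random forest), the data are synthetic, and all names and coefficients are assumptions made for the example. The flexible model reproduces its own training data exactly, so the in-sample R² says little about how it predicts new homes.

```python
import random
from statistics import mean


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    my = mean(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - my) ** 2 for t in y_true)
    return 1.0 - sse / sst


def nearest_neighbour_predict(x_train, y_train, x_new):
    """Predict with the outcome of the closest training point (1-NN)."""
    j = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    return y_train[j]


# Synthetic stand-in data (NOT the study data): a weak linear signal
# plus substantial noise.
rng = random.Random(0)
x = [rng.uniform(0, 100) for _ in range(400)]
y = [0.3 * xi + rng.gauss(0, 15) for xi in x]

# Simple hold-out split: first 300 points for training, last 100 for testing.
x_train, y_train = x[:300], y[:300]
x_test, y_test = x[300:], y[300:]

# In-sample: each training point is its own nearest neighbour.
in_sample = r2(
    y_train,
    [nearest_neighbour_predict(x_train, y_train, xi) for xi in x_train],
)
# Out-of-sample: predictions for points the model has never seen.
out_of_sample = r2(
    y_test,
    [nearest_neighbour_predict(x_train, y_train, xi) for xi in x_test],
)

print(f"in-sample R^2:     {in_sample:.3f}")
print(f"out-of-sample R^2: {out_of_sample:.3f}")
```

The gap between the two numbers is the reason a flexible model such as RFR should be judged on held-out or cross-validated predictions rather than on its fit to the training data.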
