Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.
School of Public Health, Mongolian National University of Medical Sciences, Zorig Street, Ulaanbaatar, 14210, Mongolia.
Environ Pollut. 2019 Feb;245:746-753. doi: 10.1016/j.envpol.2018.11.034. Epub 2018 Nov 16.
Indoor and outdoor fine particulate matter (PM2.5) are both leading risk factors for death and disease, but making indoor measurements is often infeasible for large study populations.
We developed models to predict indoor PM2.5 concentrations for pregnant women who were part of a randomized controlled trial of portable air cleaners in Ulaanbaatar, Mongolia. We used multiple linear regression (MLR) and random forest regression (RFR) to model indoor PM2.5 concentrations from 447 independent 7-day PM2.5 measurements and 87 potential predictor variables obtained from outdoor monitoring data, questionnaires, home assessments, and geographic data sets. We also developed blended models that combined the MLR and RFR approaches. All models were evaluated using 10-fold cross-validation.
The predictors in the MLR model were season, outdoor PM2.5 concentration, the number of air cleaners deployed, and the density of gers (traditional felt-lined yurts) surrounding the apartments. MLR and RFR had similar performance in cross-validation (R² = 50.2% and R² = 48.9%, respectively). The blended MLR model that included RFR predictions had the best performance (cross-validation R² = 81.5%). Intervention status alone explained only 6.0% of the variation in indoor PM2.5 concentrations.
We predicted a moderate amount of variation in indoor PM2.5 concentrations using easily obtained predictor variables, and the models explained substantially more variation than intervention status alone. While RFR shows promise for modelling indoor concentrations, our results highlight the importance of out-of-sample validation when evaluating model performance. We also demonstrate the improved performance of blended MLR/RFR models in predicting indoor air pollution.
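The contrast between in-sample fit and out-of-sample validation can be illustrated with a minimal sketch. It is not the authors' model: it fits a one-predictor least-squares line to synthetic data (the coefficients, noise level, and the names `fit_line`, `r_squared`, and `cross_validated_r2` are all assumptions made for illustration) and compares the in-sample R² with the R² from 10-fold cross-validation, where every prediction comes from a model that never saw that observation.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """Share of variance in `observed` explained by `predicted`."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k=10, seed=0):
    """R² from pooled held-out predictions across k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    predictions = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Predict only the observations left out of this fit.
        for i in held_out:
            predictions[i] = intercept + slope * xs[i]
    return r_squared(ys, predictions)


# Synthetic data (assumed values, NOT the study's measurements):
# 447 outdoor concentrations and noisy indoor concentrations.
rng = random.Random(42)
outdoor = [rng.uniform(10, 200) for _ in range(447)]
indoor = [5 + 0.4 * x + rng.gauss(0, 20) for x in outdoor]

intercept, slope = fit_line(outdoor, indoor)
in_sample_r2 = r_squared(indoor, [intercept + slope * x for x in outdoor])
cv_r2 = cross_validated_r2(outdoor, indoor, k=10)

print(f"in-sample R2: {in_sample_r2:.3f}")
print(f"10-fold cross-validated R2: {cv_r2:.3f}")
```

For a simple linear model the two values are usually close. The gap tends to widen for flexible learners such as random forests, which is why a cross-validated R² is the more informative figure when comparing model types.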