Wu Yinghan, Xu Jia, Liu Ziqi, Han Bin, Yang Wen, Bai Zhipeng
State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China.
Department of Environmental & Occupational Health Sciences, School of Public Health, University of Washington, Seattle, WA 98105, USA.
Toxics. 2024 Mar 1;12(3):197. doi: 10.3390/toxics12030197.
Various geostatistical models have been used in epidemiological research to evaluate exposure to ambient air pollutants at a fine spatial scale. Few studies, however, have investigated how different exposure models perform on population-weighted exposure estimates, or how much potential misclassification results from choosing one modeling approach over another. This study developed spatial models for NO₂ and PM₂.₅ and conducted an exposure assessment in Beijing, China. It explored three spatial modeling approaches (variable dimension reduction, machine learning, and conventional linear regression) and compared their performance using cross-validation (CV) and population-weighted exposure estimates. Specifically, partial least squares (PLS) regression, random forest (RF), and supervised linear regression (SLR) models were developed within an ordinary kriging (OK) framework for NO₂ and PM₂.₅ in Beijing. Model performance was evaluated with the mean squared error-based R² (MSE-R²) and the root mean squared error (RMSE) from leave-one-site-out cross-validation (LOOCV). The models were then used to predict ambient exposure levels across the urban area and to estimate the misclassification of population-weighted exposure estimates, by quartile, between models. The PLS-OK models for NO₂ and PM₂.₅, with LOOCV R² values of 0.82 and 0.81, respectively, outperformed the other models. Population-weighted NO₂ exposure estimates from the PLS-OK and RF-OK models exhibited the lowest misclassification in quartiles. For PM₂.₅, the estimated potential misclassification was comparable across the three models. These results indicate that exposure misclassification arising from the choice of modeling approach should be carefully considered, and the resulting bias should be evaluated in epidemiological studies.
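The evaluation steps summarized above (LOOCV scoring with MSE-R² and RMSE, population weighting, and quartile misclassification between two models' exposure surfaces) can be sketched in Python as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' code. The PLS, RF, and SLR models and the ordinary kriging step are not implemented; the hypothetical `fit_line` (a one-predictor least-squares line) merely stands in for any model-fitting routine inside the leave-one-site-out loop. The MSE-based R² is assumed to be 1 − SSE/SST truncated at zero, and quartile misclassification is assumed to be the population share of grid cells whose rank-based exposure quartile differs between two models; the study's exact definitions may differ. All function names are hypothetical.

```python
"""Sketch of model evaluation and exposure comparison (hypothetical helpers)."""
from math import sqrt
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Site = Tuple[float, float]  # (predictor value, observed concentration)


def mse_r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed MSE-based R^2: 1 - SSE/SST, truncated at 0."""
    obs_mean = mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return max(0.0, 1.0 - sse / sst)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of predictions against observations."""
    return sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def fit_line(train: Sequence[Site]) -> Callable[[float], float]:
    """Hypothetical stand-in model: ordinary least-squares line y = a + b*x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def loocv_predictions(
    sites: Sequence[Site], fit: Callable[[Sequence[Site]], Callable[[float], float]]
) -> List[float]:
    """Leave-one-site-out CV: refit without each site, then predict that site."""
    preds = []
    for i, (x_held, _) in enumerate(sites):
        model = fit(list(sites[:i]) + list(sites[i + 1:]))
        preds.append(model(x_held))
    return preds


def population_weighted(conc: Sequence[float], pop: Sequence[float]) -> float:
    """Population-weighted mean exposure: sum(pop_i * C_i) / sum(pop_i)."""
    return sum(c * p for c, p in zip(conc, pop)) / sum(pop)


def quartiles(values: Sequence[float]) -> List[int]:
    """Rank-based quartile (1-4) of each grid cell; ties broken by position."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    q = [0] * n
    for rank, i in enumerate(order):
        q[i] = rank * 4 // n + 1
    return q


def quartile_misclassification(
    conc_a: Sequence[float], conc_b: Sequence[float], pop: Sequence[float]
) -> float:
    """Assumed metric: population share placed in different quartiles by models A and B."""
    qa, qb = quartiles(conc_a), quartiles(conc_b)
    moved = sum(p for a, b, p in zip(qa, qb, pop) if a != b)
    return moved / sum(pop)


if __name__ == "__main__":
    # Invented toy data: (predictor, observed concentration) at five sites.
    sites = [(0.0, 1.0), (1.0, 3.2), (2.0, 4.9), (3.0, 7.1), (4.0, 8.8)]
    obs = [y for _, y in sites]
    preds = loocv_predictions(sites, fit_line)
    print(f"LOOCV MSE-R2 = {mse_r2(obs, preds):.2f}, RMSE = {rmse(obs, preds):.2f}")
```

Because `loocv_predictions` accepts any `fit` callable, a different model (for example, a regression-plus-kriging routine) could be swapped in without changing the cross-validation or scoring code.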