Model selection and health effect estimation in environmental epidemiology.

Author information

Dominici Francesca, Wang Chi, Crainiceanu Ciprian, Parmigiani Giovanni

Affiliation

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.

Publication information

Epidemiology. 2008 Jul;19(4):558-60. doi: 10.1097/EDE.0b013e31817307dc.

DOI:10.1097/EDE.0b013e31817307dc

PMID:18552590

Abstract

In air pollution epidemiology, improvements in statistical analysis tools can help improve signal-to-noise ratios and untangle large correlations between exposures and confounders. For this reason, we welcome a novel model-selection approach that helps to identify the time windows of exposure to pollutants that produce adverse health effects. However, there are concerns about approaches that select a model based on a given data set and then estimate health effects in the same data. This can create problems when (1) the sample size is small in relation to the magnitude of the health effects; and (2) candidate predictors are highly correlated and likely to have similar effects. Bayesian Model Averaging has been advocated as a way to estimate health effects that accounts for model uncertainty. However, implementations in which posterior model probabilities are approximated using BIC, as well as other default choices, may not reflect the ability of each model to provide an estimate of the health effect that is properly adjusted for confounding. Air pollution studies need to focus on estimating health effects while accounting for the uncertainty in the adjustment for confounding factors. This is especially true when model choice and estimation are performed on the same data. The development of appropriate statistical tools remains an open area of investigation.
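
For readers unfamiliar with the BIC approximation mentioned in the abstract, the following is a minimal sketch of how posterior model probabilities are commonly approximated from BIC values and then used to average an effect estimate across models. It assumes equal prior probabilities for all candidate models. The function names and the numbers are hypothetical illustrations and are not taken from the article.

```python
import numpy as np


def bic_model_weights(bic):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior model probabilities, so that
    P(M_k | data) is proportional to exp(-BIC_k / 2).
    """
    bic = np.asarray(bic, dtype=float)
    # Subtracting the minimum avoids underflow and leaves the weights unchanged.
    delta = bic - bic.min()
    unnormalized = np.exp(-0.5 * delta)
    return unnormalized / unnormalized.sum()


def model_averaged_estimate(estimates, variances, weights):
    """Combine per-model effect estimates into a model-averaged mean and variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = np.sum(weights * estimates)
    # Total variance = within-model variance + between-model spread of estimates.
    var = np.sum(weights * (variances + (estimates - mean) ** 2))
    return mean, var


# Hypothetical values for three candidate models (illustration only).
bic = [1002.3, 1000.0, 1004.6]   # BIC of each fitted model
beta = [0.8, 1.1, 0.5]           # estimated health effect under each model
beta_var = [0.04, 0.05, 0.03]    # squared standard error under each model

weights = bic_model_weights(bic)
bma_mean, bma_var = model_averaged_estimate(beta, beta_var, weights)
print("weights:", np.round(weights, 3))
print("model-averaged effect:", round(bma_mean, 3), "variance:", round(bma_var, 3))
```

In this sketch the weights depend only on each model's overall fit as summarized by BIC. That is the concern the abstract raises: a model can fit well and receive a high weight without adjusting adequately for confounding of the exposure effect.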
