Wang Duolao, Zhang Wenyang, Bakhai Ameet
Department of Epidemiology and Population Health, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK.
Stat Med. 2004 Nov 30;23(22):3451-67. doi: 10.1002/sim.1930.
Logistic regression is the standard method for assessing predictors of disease. In logistic regression analyses, a stepwise strategy is often adopted to choose a subset of variables, and inference about the predictors is then based on the single chosen model, which contains only the variables retained by the procedure. This approach therefore ignores both the variables that the procedure did not select and the uncertainty introduced by the variable selection itself. This limitation may be addressed by adopting a Bayesian model averaging approach, which selects a number of models from among all possible models and uses the posterior probabilities of these models to weight all inferences and predictions. This study compares Bayesian model averaging with stepwise procedures for selecting predictor variables in logistic regression, using simulated data sets and data from the Framingham Heart Study. The results show that in most cases Bayesian model averaging selects the correct model and outperforms stepwise approaches at predicting an event of interest.
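To make the averaging step concrete, here is a minimal sketch, not the authors' implementation, which the abstract does not describe. It assumes equal prior probabilities for all models and the BIC approximation to each model's marginal likelihood, so the posterior probability of model M_k is approximately exp(-BIC_k/2) / sum_l exp(-BIC_l/2). Unlike Occam's-window approaches that retain only a subset of well-supported models, it enumerates every subset of the candidate predictors. The helper names (`solve`, `fit_logistic`, `bma_bic`) and the simulated data are hypothetical and purely illustrative; they do not reproduce the paper's simulation design. Only the Python standard library is used.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximised log-likelihood.
    """
    p, n = len(X[0]), len(y)
    beta = [0.0] * p
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        grad = [sum((y[i] - mu[i]) * X[i][j] for i in range(n)) for j in range(p)]
        info = [[sum(mu[i] * (1 - mu[i]) * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    # log(1 + exp(e)) computed stably as max(e, 0) + log1p(exp(-|e|))
    loglik = sum(yi * e - (max(e, 0.0) + math.log1p(math.exp(-abs(e))))
                 for yi, e in zip(y, eta))
    return beta, loglik


def bma_bic(x_cols, y):
    """Fit every subset model; weight each by exp(-BIC/2) (equal model priors)."""
    n, k = len(y), len(x_cols)
    models = []
    for size in range(k + 1):
        for subset in itertools.combinations(range(k), size):
            X = [[1.0] + [x_cols[j][i] for j in subset] for i in range(n)]
            _, loglik = fit_logistic(X, y)
            bic = -2.0 * loglik + (len(subset) + 1) * math.log(n)
            models.append((subset, bic))
    best = min(b for _, b in models)
    raw = [math.exp(-0.5 * (b - best)) for _, b in models]  # shift avoids underflow
    weights = [w / sum(raw) for w in raw]
    # Posterior inclusion probability: total weight of models containing predictor j
    inclusion = [sum(w for (s, _), w in zip(models, weights) if j in s)
                 for j in range(k)]
    return models, weights, inclusion


random.seed(1)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]  # noise: not used to generate y
y = [1 if random.random() < 1 / (1 + math.exp(-(-0.5 + 1.5 * a + 1.0 * b))) else 0
     for a, b in zip(x1, x2)]

models, weights, inclusion = bma_bic([x1, x2, x3], y)
for name, pip in zip(["x1", "x2", "x3"], inclusion):
    print(f"P({name} in model | data) = {pip:.3f}")
```

A stepwise procedure would report a single model and condition all inference on it; the sketch instead keeps the whole weight vector, so a BMA prediction for a new subject would be the weighted average of the predicted probabilities from each model, with weights as above.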