Atlanta Veterans Affairs Medical Center, Decatur, Georgia, United States of America.
Infectious Disease Clinical Research Program, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America.
PLoS One. 2014 Jan 29;9(1):e87352. doi: 10.1371/journal.pone.0087352. eCollection 2014.
Variable selection is an important step in building a multivariate regression model, and several methods and statistical packages are available for it. This study explores a comprehensive approach to variable selection in complex multivariate regression analyses within HIV cohorts, drawing on both epidemiological and biostatistical procedures.
Three different methods for variable selection were illustrated in a study comparing survival time between subjects in the Department of Defense's Natural History Study and the Atlanta Veterans Affairs Medical Center's HIV Atlanta VA Cohort Study. The first two methods were stepwise selection procedures, based either on significance tests (Score test) or on information theory (Akaike Information Criterion), while the third method employed a Bayesian argument (Bayesian Model Averaging).
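As a rough illustration of the information-criterion approach, the sketch below runs a greedy forward selection that adds, at each step, the covariate giving the largest drop in AIC, and stops when no addition lowers it. It is not the study's procedure: it swaps the survival model for an ordinary least-squares model with Gaussian errors so that it runs with NumPy alone, and the covariate names (`age`, `cd4`, `viral_load`, `noise1`, `noise2`), the simulated data, and the helper names `gaussian_aic` and `forward_aic` are all assumptions made for the example.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k


def forward_aic(y, covariates):
    """Greedy forward selection: keep adding the covariate that lowers AIC most."""
    intercept = np.ones((len(y), 1))

    def design(names):
        cols = [intercept] + [covariates[c].reshape(-1, 1) for c in names]
        return np.hstack(cols)

    selected = []
    remaining = list(covariates)
    best = gaussian_aic(y, design(selected))  # intercept-only model
    while remaining:
        scores = {c: gaussian_aic(y, design(selected + [c])) for c in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining covariate improves the criterion
        best = scores[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, best


# Simulated data, for illustration only: the outcome depends on cd4 and age.
rng = np.random.default_rng(0)
n = 500
names = ["age", "cd4", "viral_load", "noise1", "noise2"]
covs = {name: rng.normal(size=n) for name in names}
y = 2.0 * covs["cd4"] - 1.5 * covs["age"] + rng.normal(size=n)

selected, aic = forward_aic(y, covs)
print(selected, round(aic, 1))
```

The same loop carries over to a survival setting by replacing `gaussian_aic` with the AIC computed from a fitted survival model's log-likelihood.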
All three methods resulted in a similar parsimonious survival model. Three of the covariates used in the previously published multivariate model were not included in the final model suggested by the three approaches. Compared with the previously published model, the parsimonious model showed evidence of less variance in the main survival estimates.
The variable selection approaches considered in this study made it possible to build a model based on significance tests, on an information criterion, and on averaging models using their posterior probabilities. A parsimonious model that balanced these three approaches was found to provide a better fit than the previously reported model.
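To make "averaging models using their posterior probabilities" concrete, the sketch below turns a set of candidate models into approximate posterior model probabilities and per-covariate inclusion probabilities. It assumes equal prior probabilities across models and the common BIC approximation, in which a model's weight is proportional to exp(-BIC/2); the abstract does not state which approximation the study used. The candidate models, their BIC values, and the helper names `bma_weights` and `inclusion_probabilities` are hypothetical.

```python
import numpy as np


def bma_weights(bic):
    """Approximate posterior model probabilities from BIC values (equal priors)."""
    bic = np.asarray(bic, dtype=float)
    delta = bic - bic.min()  # shift by the minimum for numerical stability
    w = np.exp(-0.5 * delta)
    return w / w.sum()


def inclusion_probabilities(models, weights):
    """Sum the weights of every model that contains each covariate."""
    probs = {}
    for covariates, w in zip(models, weights):
        for c in covariates:
            probs[c] = probs.get(c, 0.0) + float(w)
    return probs


# Hypothetical candidate models and BIC values, for illustration only.
models = [("age", "cd4"), ("age", "cd4", "race"), ("cd4",)]
bic = [1000.0, 1004.0, 1010.0]

weights = bma_weights(bic)
pip = inclusion_probabilities(models, weights)
print(weights)
print(pip)
```

A covariate whose inclusion probability is low across the weighted models is a natural candidate to drop from a parsimonious model.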