Imputation and variable selection in linear regression models with missing covariates.

Author information

Yang Xiaowei, Belin Thomas R, Boscardin W John

Affiliation

Department of Biostatistics, University of California, 11075 Santa Monica Boulevard, Suite 200, Los Angeles, California 90095-1772, USA.

Publication information

Biometrics. 2005 Jun;61(2):498-506. doi: 10.1111/j.1541-0420.2005.00317.x.

Abstract

Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.
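
As a rough illustration of the "impute, then select" idea described in the abstract, the following Python sketch imputes missing covariates several times, runs a small stochastic search variable selection (SSVS) Gibbs sampler on each completed data set, and averages the posterior inclusion frequencies. It is not the authors' implementation. The function names (`impute_once`, `ssvs`, `impute_then_select`), the hyperparameters (`tau`, `c`, `p`), the flat prior on the error variance, and the simulated data are all assumptions made for this example. The imputation step is also simplified: it holds the mean and covariance fixed at complete-case estimates rather than drawing them, so it understates imputation uncertainty.

```python
import numpy as np

def impute_once(Z, rng):
    """Fill NaNs with one draw from the conditional normal given observed entries.
    Simplification (assumption): mean and covariance are complete-case estimates."""
    complete = Z[~np.isnan(Z).any(axis=1)]
    mu, S = complete.mean(axis=0), np.cov(complete, rowvar=False)
    out = Z.copy()
    for i in np.flatnonzero(np.isnan(Z).any(axis=1)):
        m = np.isnan(Z[i])          # missing positions in row i
        o = ~m                      # observed positions in row i
        B = S[np.ix_(m, o)] @ np.linalg.inv(S[np.ix_(o, o)])
        cond_mu = mu[m] + B @ (Z[i, o] - mu[o])
        cond_S = S[np.ix_(m, m)] - B @ S[np.ix_(o, m)]
        out[i, m] = rng.multivariate_normal(cond_mu, (cond_S + cond_S.T) / 2)
    return out

def ssvs(X, y, rng, n_iter=600, burn=200, tau=0.05, c=50.0, p=0.5):
    """Gibbs sampler with a two-normal (spike and slab) prior on each coefficient.
    Returns the fraction of retained iterations in which each variable is included."""
    n, k = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    gamma, sigma2 = np.ones(k, dtype=int), 1.0
    included = np.zeros(k)
    for it in range(n_iter):
        # beta | gamma, sigma2: normal with precision X'X/sigma2 + D^{-1}
        prior_var = np.where(gamma == 1, (c * tau) ** 2, tau ** 2)
        A = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / prior_var))
        beta = rng.multivariate_normal(A @ Xty / sigma2, (A + A.T) / 2)
        # sigma2 | beta under p(sigma2) proportional to 1/sigma2
        resid = y - X @ beta
        sigma2 = (resid @ resid) / rng.chisquare(n)
        # gamma_j | beta_j: compare slab and spike densities at beta_j
        slab = p * np.exp(-0.5 * beta ** 2 / (c * tau) ** 2) / (c * tau)
        spike = (1 - p) * np.exp(-0.5 * beta ** 2 / tau ** 2) / tau
        gamma = (rng.random(k) < slab / (slab + spike)).astype(int)
        if it >= burn:
            included += gamma
    return included / (n_iter - burn)

def impute_then_select(X, y, M=5, seed=0):
    """Impute M times, run SSVS on each completed data set, average inclusion rates."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([X, y])     # y is assumed fully observed
    rates = []
    for _ in range(M):
        Zi = impute_once(Z, rng)
        Xi = (Zi[:, :-1] - Zi[:, :-1].mean(axis=0)) / Zi[:, :-1].std(axis=0)
        yi = Zi[:, -1] - Zi[:, -1].mean()
        rates.append(ssvs(Xi, yi, rng))
    return np.mean(rates, axis=0)

# Simulated data (hypothetical): only covariates 0 and 2 affect the outcome.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))
y = 1.0 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(size=n)
X_miss = X.copy()
X_miss[rng.random(n) < 0.2, 0] = np.nan   # about 20% missing completely at random
X_miss[rng.random(n) < 0.2, 1] = np.nan
probs = impute_then_select(X_miss, y, M=5, seed=0)
print(np.round(probs, 2))
```

Averaging inclusion frequencies across imputations sidesteps the problem of each imputed data set yielding a different selected model. A "simultaneously impute and select" variant would instead redraw the missing covariates inside the same Gibbs loop that updates the coefficients and inclusion indicators; that variant is not shown here.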
