Imputation and variable selection in linear regression models with missing covariates.

Authors

Yang Xiaowei, Belin Thomas R, Boscardin W John

Affiliation

Department of Biostatistics, University of California, 11075 Santa Monica Boulevard, Suite 200, Los Angeles, California 90095-1772, USA.

Publication

Biometrics. 2005 Jun;61(2):498-506. doi: 10.1111/j.1541-0420.2005.00317.x.

DOI: 10.1111/j.1541-0420.2005.00317.x
PMID: 16011697
Abstract

Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS), involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.
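The abstract describes SIAS only in words. The following is a minimal Python sketch of the general idea, not the authors' implementation. It uses a single Gibbs chain that alternates between (a) updating a multivariate normal model for the covariates, (b) drawing the missing covariate values given the response and the current coefficients, and (c)-(e) spike-and-slab stochastic search variable selection (SSVS) updates. Everything specific here is an assumption made for illustration: the function names `sias_ssvs` and `_draw_from_precision`, the independent spike N(0, tau^2) and slab N(0, (c*tau)^2) priors, the inverse-Wishart, flat, and inverse-gamma priors, the always-included intercept, and all default values (`tau`, `c`, `p_incl`, `nu`, `lam`).

```python
import numpy as np
from scipy.special import expit
from scipy.stats import invwishart, norm


def _draw_from_precision(rng, mean, prec):
    """Draw from N(mean, prec^{-1}) using the Cholesky factor of the precision."""
    L = np.linalg.cholesky(prec)
    return mean + np.linalg.solve(L.T, rng.standard_normal(len(mean)))


def sias_ssvs(y, X, n_iter=2000, tau=0.05, c=10.0, p_incl=0.5,
              nu=1.0, lam=1.0, seed=0):
    """Hypothetical sketch: one Gibbs chain that imputes NaN entries of X and runs SSVS.

    Returns the estimated posterior inclusion probability of each covariate.
    All priors and default hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    X = np.where(miss, np.nanmean(X, axis=0), X)  # new array; start at column means
    mu = X.mean(axis=0)
    beta = np.zeros(p + 1)                   # beta[0] is an intercept that is never selected
    gamma = np.ones(p, dtype=bool)
    sigma2 = y.var()
    burn, incl = n_iter // 2, np.zeros(p)

    for it in range(n_iter):
        if miss.any():
            # (a) Covariate model x_i ~ N(mu, Sigma): assumed IW(p + 1, I) prior
            #     on Sigma and flat prior on mu.
            R = X - mu
            Sigma = np.atleast_2d(invwishart(df=n + p + 1, scale=np.eye(p) + R.T @ R)
                                  .rvs(random_state=rng))
            Sinv = np.linalg.inv(Sigma)
            mu = _draw_from_precision(rng, X.mean(axis=0), n * Sinv)
            # (b) Imputation: x_i | y_i is Gaussian with precision Q and mean
            #     Q^{-1} h_i; the missing block is then conditioned on x_obs.
            b = beta[1:]
            Q = Sinv + np.outer(b, b) / sigma2
            H = Sinv @ mu + np.outer(y - beta[0], b) / sigma2   # rows are h_i
            M = np.linalg.solve(Q, H.T).T                         # rows are E[x_i | y_i]
            for i in np.flatnonzero(miss.any(axis=1)):
                m, o = miss[i], ~miss[i]
                Qmm = Q[np.ix_(m, m)]
                cond_mean = M[i, m] - np.linalg.solve(Qmm, Q[np.ix_(m, o)] @ (X[i, o] - M[i, o]))
                X[i, m] = _draw_from_precision(rng, cond_mean, Qmm)
        # (c) SSVS update of beta: each slope has a spike N(0, tau^2) prior when
        #     gamma_j = 0 and a slab N(0, (c * tau)^2) prior when gamma_j = 1.
        Z = np.column_stack([np.ones(n), X])
        prior_var = np.concatenate([[1e6], np.where(gamma, (c * tau) ** 2, tau ** 2)])
        A = Z.T @ Z / sigma2 + np.diag(1.0 / prior_var)
        beta = _draw_from_precision(rng, np.linalg.solve(A, Z.T @ y / sigma2), A)
        # (d) sigma^2 | beta, y ~ Inverse-Gamma((n + nu) / 2, (RSS + nu * lam) / 2).
        rss = np.sum((y - Z @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma((n + nu) / 2, 2.0 / (rss + nu * lam))
        # (e) gamma_j | beta_j: Bernoulli with slab-versus-spike log odds.
        log_odds = (np.log(p_incl) + norm.logpdf(beta[1:], 0, c * tau)
                    - np.log(1 - p_incl) - norm.logpdf(beta[1:], 0, tau))
        gamma = rng.random(p) < expit(log_odds)
        if it >= burn:
            incl += gamma
    return incl / (n_iter - burn)
```

In this sketch each imputation conditions on the current coefficients, which in turn depend on the current inclusion indicators, so imputation and selection happen within one chain. For an ITS-style analysis one would instead first create several completed copies of X with a separate multiple-imputation model, call `sias_ssvs` on each copy (with no NaN entries, steps (a) and (b) are skipped and only SSVS runs), and then average the inclusion probabilities across copies.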


Similar articles

1. Imputation and variable selection in linear regression models with missing covariates.
   Biometrics. 2005 Jun;61(2):498-506. doi: 10.1111/j.1541-0420.2005.00317.x.
2. Sequential BART for imputation of missing covariates.
   Biostatistics. 2016 Jul;17(3):589-602. doi: 10.1093/biostatistics/kxw009. Epub 2016 Mar 15.
3. Nonlinear multiple imputation for continuous covariate within semiparametric Cox model: application to HIV data in Senegal.
   Stat Med. 2013 Nov 20;32(26):4651-65. doi: 10.1002/sim.5854. Epub 2013 May 28.
4. A two-step semiparametric method to accommodate sampling weights in multiple imputation.
   Biometrics. 2016 Mar;72(1):242-52. doi: 10.1111/biom.12413. Epub 2015 Sep 22.
5. How should variable selection be performed with multiply imputed data?
   Stat Med. 2008 Jul 30;27(17):3227-46. doi: 10.1002/sim.3177.
6. Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.
   Res Rep Health Eff Inst. 2015 Jun(183 Pt 1-2):5-50.
7. Dealing with missing covariates in epidemiologic studies: a comparison between multiple imputation and a full Bayesian approach.
   Stat Med. 2016 Jul 30;35(17):2955-74. doi: 10.1002/sim.6944. Epub 2016 Apr 4.
8. A Bayesian Latent Variable Selection Model for Nonignorable Missingness.
   Multivariate Behav Res. 2022 Mar-May;57(2-3):478-512. doi: 10.1080/00273171.2021.1874259. Epub 2021 Feb 2.
9. Bayesian analysis for generalized linear models with nonignorably missing covariates.
   Biometrics. 2005 Sep;61(3):767-80. doi: 10.1111/j.1541-0420.2005.00338.x.
10. Variable selection for multiply-imputed data with application to dioxin exposure study.
   Stat Med. 2013 Sep 20;32(21):3646-59. doi: 10.1002/sim.5783. Epub 2013 Mar 25.

Cited by

1. Bayesian semiparametric inference in longitudinal metabolomics data.
   Sci Rep. 2024 Dec 28;14(1):31336. doi: 10.1038/s41598-024-82718-8.
2. Multi-omics regulatory network inference in the presence of missing data.
   Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad309.
3. Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods.
   J Comput Graph Stat. 2022;31(4):1063-1075. doi: 10.1080/10618600.2022.2035739. Epub 2022 Mar 28.
4. On the Relation between Prediction and Imputation Accuracy under Missing Covariates.
   Entropy (Basel). 2022 Mar 9;24(3):386. doi: 10.3390/e24030386.
5. How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion.
   Psychol Methods. 2023 Apr;28(2):452-471. doi: 10.1037/met0000478. Epub 2022 Feb 3.
6. Negative impact of maternal antenatal depressive symptoms on neonate's behavioral characteristics.
   Eur Child Adolesc Psychiatry. 2020 Apr;29(4):515-526. doi: 10.1007/s00787-019-01367-9. Epub 2019 Jul 11.
7. Cost-effectiveness of habit-based advice for weight control versus usual care in general practice in the Ten Top Tips (10TT) trial: economic evaluation based on a randomised controlled trial.
   BMJ Open. 2018 Aug 13;8(8):e017511. doi: 10.1136/bmjopen-2017-017511.
8. Variable Selection in the Presence of Missing Data: Imputation-based Methods.
   Wiley Interdiscip Rev Comput Stat. 2017 Sep-Oct;9(5). doi: 10.1002/wics.1402. Epub 2017 May 24.
9. Long-Term Cause-Specific Mortality After Surgery for Women With Breast Cancer: A 20-Year Follow-Up Study From Surveillance, Epidemiology, and End Results Cancer Registries.
   Breast Cancer (Auckl). 2017 Jun 1;11:1178223417711429. doi: 10.1177/1178223417711429. eCollection 2017.
10. Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research.
   Biostatistics. 2017 Jul 1;18(3):553-568. doi: 10.1093/biostatistics/kxx003.