
Sequential BART for imputation of missing covariates.

Author information

Xu Dandan, Daniels Michael J, Winterstein Almut G

Affiliations

Department of Statistics, University of Florida, Gainesville, FL 32601, USA.

Departments of Integrative Biology, and Statistics & Data Sciences, The University of Texas at Austin, Austin, TX 78712, USA

Publication information

Biostatistics. 2016 Jul;17(3):589-602. doi: 10.1093/biostatistics/kxw009. Epub 2016 Mar 15.

Abstract

To conduct comparative effectiveness research using electronic health records (EHR), many covariates are typically needed to adjust for selection and confounding biases. Unfortunately, it is typical to have missingness in these covariates. Just using cases with complete covariates will result in considerable efficiency losses and likely bias. Here, we consider the covariates missing at random with missing data mechanism either depending on the response or not. Standard methods for multiple imputation can either fail to capture nonlinear relationships or suffer from the incompatibility and uncongeniality issues. We explore a flexible Bayesian nonparametric approach to impute the missing covariates, which involves factoring the joint distribution of the covariates with missingness into a set of sequential conditionals and applying Bayesian additive regression trees to model each of these univariate conditionals. Using data augmentation, the posterior for each conditional can be sampled simultaneously. We provide details on the computational algorithm and make comparisons to other methods, including parametric sequential imputation and two versions of multiple imputation by chained equations. We illustrate the proposed approach on EHR data from an affiliated tertiary care institution to examine factors related to hyperglycemia.
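The sequential factorization described in the abstract can be written out as a short sketch. The notation here is an assumption and is not taken from the paper: x_1, ..., x_p stand for the covariates with missingness in some chosen order, and z stands for whatever fully observed variables the conditionals condition on. The second line shows the standard sum-of-trees form of Bayesian additive regression trees for a continuous covariate, with m trees T_{jk} and leaf parameters M_{jk}; a binary covariate would need a different link, which is omitted here.

```latex
% Joint distribution of the covariates with missingness,
% factored into a sequence of univariate conditionals (assumed notation).
\[
  p(x_1, \dots, x_p \mid z)
  = p(x_1 \mid z) \prod_{j=2}^{p} p(x_j \mid x_1, \dots, x_{j-1}, z)
\]

% Each univariate conditional for a continuous x_j modeled with a
% BART sum-of-trees regression (standard BART form, shown as an illustration).
\[
  x_j = \sum_{k=1}^{m} g\bigl(x_1, \dots, x_{j-1}, z;\, T_{jk}, M_{jk}\bigr) + \varepsilon_j,
  \qquad \varepsilon_j \sim N(0, \sigma_j^2)
\]
```

Because each factor is a univariate regression on the variables that precede it, the conditionals always multiply to a valid joint distribution, which is how a sequential specification avoids the incompatibility issue the abstract attributes to chained-equation methods.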

Similar articles

2
Multiple imputation with missing data indicators.
Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.
8
A Bayesian Latent Variable Selection Model for Nonignorable Missingness.
Multivariate Behav Res. 2022 Mar-May;57(2-3):478-512. doi: 10.1080/00273171.2021.1874259. Epub 2021 Feb 2.
9
Multiple imputation in the presence of high-dimensional data.
Stat Methods Med Res. 2016 Oct;25(5):2021-2035. doi: 10.1177/0962280213511027. Epub 2013 Nov 25.
10
Multiple imputation for missing data via sequential regression trees.
Am J Epidemiol. 2010 Nov 1;172(9):1070-6. doi: 10.1093/aje/kwq260. Epub 2010 Sep 14.

Cited by

4
Missing data imputation using classification and regression trees.
PeerJ Comput Sci. 2024 Jun 28;10:e2119. doi: 10.7717/peerj-cs.2119. eCollection 2024.
