Sequential BART for imputation of missing covariates.

Author information

Xu Dandan, Daniels Michael J, Winterstein Almut G

Affiliations

Department of Statistics, University of Florida, Gainesville, FL 32601, USA.

Departments of Integrative Biology, and Statistics & Data Sciences, The University of Texas at Austin, Austin, TX 78712, USA

Publication information

Biostatistics. 2016 Jul;17(3):589-602. doi: 10.1093/biostatistics/kxw009. Epub 2016 Mar 15.

DOI:10.1093/biostatistics/kxw009
PMID:26980459
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4915613/
Abstract

To conduct comparative effectiveness research using electronic health records (EHR), many covariates are typically needed to adjust for selection and confounding biases. Unfortunately, these covariates typically have missing values. Using only cases with complete covariates results in considerable efficiency losses and likely bias. Here, we consider covariates that are missing at random, with a missing data mechanism that may or may not depend on the response. Standard methods for multiple imputation can either fail to capture nonlinear relationships or suffer from incompatibility and uncongeniality issues. We explore a flexible Bayesian nonparametric approach to impute the missing covariates, which involves factoring the joint distribution of the covariates with missingness into a set of sequential conditionals and applying Bayesian additive regression trees to model each of these univariate conditionals. Using data augmentation, the posterior for each conditional can be sampled simultaneously. We provide details on the computational algorithm and make comparisons to other methods, including parametric sequential imputation and two versions of multiple imputation by chained equations. We illustrate the proposed approach on EHR data from an affiliated tertiary care institution to examine factors related to hyperglycemia.
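The abstract describes a three-part loop:

- Write the joint distribution as a product of univariate conditionals, p(x_1) p(x_2 | x_1) ... p(x_p | x_1, ..., x_{p-1}) p(y | x_1, ..., x_p).
- Fit a model to each conditional.
- Alternate between updating those models and re-imputing the missing values (data augmentation).

The sketch below illustrates that loop under explicit assumptions, and is not the paper's algorithm:

- Each conditional is a Gaussian linear regression fit by least squares, a deliberate stand-in for BART.
- Parameters are plug-in estimates rather than posterior draws, so uncertainty is understated.
- Each missing cell is updated with a random-walk Metropolis step under the full joint model.
- Only covariates are missing; the response y is the last, fully observed column.
- The columns are assumed to be already ordered, e.g. fewest missing values first.

All function names (`sequential_impute`, `fit_normal`, `log_joint`, etc.) are hypothetical.

```python
import math
import random


def ols(X, y):
    """Least-squares coefficients via the normal equations (fine for tiny p)."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on (X^T X) beta = X^T y.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(p)]


def fit_normal(X, y):
    """Stand-in for BART: Gaussian linear conditional (coefficients, variance)."""
    beta = ols(X, y)
    resid = [t - sum(bj * v for bj, v in zip(beta, r)) for r, t in zip(X, y)]
    return beta, max(sum(e * e for e in resid) / len(y), 1e-8)


def log_normal(x, mean, sigma2):
    return -0.5 * (math.log(2 * math.pi * sigma2) + (x - mean) ** 2 / sigma2)


def design(row, k):
    """Predictors for the k-th conditional: intercept plus variables 0..k-1."""
    return [1.0] + row[:k]


def log_joint(row, params):
    """Sum of log p(v_k | v_0..v_{k-1}) over all columns, including y last."""
    total = 0.0
    for k, (beta, s2) in enumerate(params):
        mean = sum(bj * v for bj, v in zip(beta, design(row, k)))
        total += log_normal(row[k], mean, s2)
    return total


def sequential_impute(data, n_iter=200, step=0.5, seed=0):
    """data: rows [x_1, ..., x_p, y]; None marks a missing covariate value."""
    rng = random.Random(seed)
    ncol = len(data[0])
    missing = [(i, k) for i, r in enumerate(data) for k in range(ncol) if r[k] is None]
    rows = [list(r) for r in data]
    # Initialize each missing value with its column's observed mean.
    for k in range(ncol):
        obs = [r[k] for r in data if r[k] is not None]
        for r in rows:
            if r[k] is None:
                r[k] = sum(obs) / len(obs)
    for _ in range(n_iter):
        # (1) Refit every sequential conditional on the current completed data.
        params = [fit_normal([design(r, k) for r in rows], [r[k] for r in rows])
                  for k in range(ncol)]
        # (2) Metropolis update of each missing cell; the p(y | x) term lets
        #     the response inform the imputation.
        for i, k in missing:
            row = rows[i]
            old, cur = row[k], log_joint(row, params)
            row[k] = old + rng.gauss(0.0, step)
            if math.log(1.0 - rng.random()) >= log_joint(row, params) - cur:
                row[k] = old  # reject the proposal and keep the previous value
    return rows
```

Each call returns one completed data set. Running it with several seeds, or saving rows at widely spaced iterations, would give the multiple completed data sets that standard multiple-imputation pooling expects.

Keeping the p(y | x) factor in `log_joint` matters when missingness may depend on the response, as the abstract considers. A more faithful version would replace `fit_normal` with a BART fit and draw its parameters from the posterior.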

Similar articles

1. Sequential BART for imputation of missing covariates.
   Biostatistics. 2016 Jul;17(3):589-602. doi: 10.1093/biostatistics/kxw009. Epub 2016 Mar 15.
2. Multiple imputation with missing data indicators.
   Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.
3. Bayesian causal inference for observational studies with missingness in covariates and outcomes.
   Biometrics. 2023 Dec;79(4):3624-3636. doi: 10.1111/biom.13918. Epub 2023 Aug 8.
4. Imputation and variable selection in linear regression models with missing covariates.
   Biometrics. 2005 Jun;61(2):498-506. doi: 10.1111/j.1541-0420.2005.00317.x.
5. Dealing with missing covariates in epidemiologic studies: a comparison between multiple imputation and a full Bayesian approach.
   Stat Med. 2016 Jul 30;35(17):2955-74. doi: 10.1002/sim.6944. Epub 2016 Apr 4.
6. Hierarchical imputation of systematically and sporadically missing data: An approximate Bayesian approach using chained equations.
   Biom J. 2018 Mar;60(2):333-351. doi: 10.1002/bimj.201600220. Epub 2017 Oct 9.
7. An Efficient and Effective Model to Handle Missing Data in Classification.
   Biomed Res Int. 2020 Nov 25;2020:8810143. doi: 10.1155/2020/8810143. eCollection 2020.
8. A Bayesian Latent Variable Selection Model for Nonignorable Missingness.
   Multivariate Behav Res. 2022 Mar-May;57(2-3):478-512. doi: 10.1080/00273171.2021.1874259. Epub 2021 Feb 2.
9. Multiple imputation in the presence of high-dimensional data.
   Stat Methods Med Res. 2016 Oct;25(5):2021-2035. doi: 10.1177/0962280213511027. Epub 2013 Nov 25.
10. Multiple imputation for missing data via sequential regression trees.
   Am J Epidemiol. 2010 Nov 1;172(9):1070-6. doi: 10.1093/aje/kwq260. Epub 2010 Sep 14.

Cited by

1. Conceptual framework as a guide to choose appropriate imputation method for missing values in a clinical structured dataset.
   BMC Med Res Methodol. 2025 Feb 20;25(1):43. doi: 10.1186/s12874-025-02496-3.
2. Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.
   Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.
3. Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.
   BMC Med Res Methodol. 2024 Aug 28;24(1):188. doi: 10.1186/s12874-024-02310-6.
4. Missing data imputation using classification and regression trees.
   PeerJ Comput Sci. 2024 Jun 28;10:e2119. doi: 10.7717/peerj-cs.2119. eCollection 2024.
5. Assessing treatment effect heterogeneity in the presence of missing effect modifier data in cluster-randomized trials.
   Stat Methods Med Res. 2024 May;33(5):909-927. doi: 10.1177/09622802241242323. Epub 2024 Apr 3.
6. Multiple Imputation with Massive Data: An Application to the Panel Study of Income Dynamics.
   J Surv Stat Methodol. 2021 Oct 19;11(1):260-283. doi: 10.1093/jssam/smab038. eCollection 2023 Feb.
7. Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series.
   Int J Environ Res Public Health. 2022 Dec 1;19(23):16080. doi: 10.3390/ijerph192316080.
8. Intimate Partner Violence and HIV Prevention Among Sexual Minority Men: Protocol for a Prospective Mixed Methods Cohort Study.
   JMIR Res Protoc. 2022 Nov 15;11(11):e41453. doi: 10.2196/41453.
9. ROBUST INFERENCE WHEN COMBINING INVERSE-PROBABILITY WEIGHTING AND MULTIPLE IMPUTATION TO ADDRESS MISSING DATA WITH APPLICATION TO AN ELECTRONIC HEALTH RECORDS-BASED STUDY OF BARIATRIC SURGERY.
   Ann Appl Stat. 2021 Mar;15(1):126-147. doi: 10.1214/20-aoas1386.
10. Association of Cardiovascular Risk Trajectory With Cognitive Decline and Incident Dementia.
   Neurology. 2022 May 17;98(20):e2013-e2022. doi: 10.1212/WNL.0000000000200255. Epub 2022 Apr 20.