Hierarchical imputation of systematically and sporadically missing data: An approximate Bayesian approach using chained equations.

Author information

Jolani Shahab

Affiliation

Department of Methodology and Statistics, CAPHRI, Maastricht University, 6229 HA Maastricht, The Netherlands.

Publication information

Biom J. 2018 Mar;60(2):333-351. doi: 10.1002/bimj.201600220. Epub 2017 Oct 9.

DOI: 10.1002/bimj.201600220

PMID: 28990686

Abstract

In health and medical sciences, multiple imputation (MI) is becoming a popular way to obtain valid inferences in the presence of missing data. However, MI of clustered data, such as multicenter studies and individual participant data meta-analyses, requires advanced imputation routines that preserve the hierarchical structure of the data. In clustered data, a specific challenge is the presence of systematically missing data, where a variable is completely missing in some clusters, and sporadically missing data, where it is partly missing in some clusters. Unfortunately, little is known about how to perform MI when both types of missing data occur simultaneously. We develop a new class of hierarchical imputation approaches based on the chained equations methodology that simultaneously imputes systematically and sporadically missing data while allowing for arbitrary patterns of missingness among them. We use a random effect imputation model and adopt a simplification of fully Bayesian techniques, such as the Gibbs sampler, to directly obtain draws of parameters within each step of the chained equations. We justify through theoretical arguments and extensive simulation studies that the proposed imputation methodology has good statistical properties in terms of bias and coverage rates of parameter estimates. An illustration is given in a case study with eight individual participant datasets.
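The abstract distinguishes two missingness patterns and describes drawing the parameters of a random effect imputation model directly rather than running a full Gibbs sampler. The Python sketch below illustrates both ideas under strong simplifying assumptions: a single continuous variable, a random-intercept model without covariates, moment-based estimates of the variance components, and scaled inverse chi-square draws to approximate posterior uncertainty. The helpers `classify_missingness` and `impute_random_intercept` are hypothetical names introduced here; this is not the author's algorithm, whose exact imputation model and parameter draws are given in the paper.

```python
import numpy as np


def classify_missingness(y, cluster):
    """Label each cluster's missingness pattern for one variable.

    'systematic': every value in the cluster is missing
    'sporadic':   some, but not all, values are missing
    'complete':   no values are missing
    """
    labels = {}
    for c in np.unique(cluster):
        miss = np.isnan(y[cluster == c])
        if miss.all():
            labels[c] = "systematic"
        elif miss.any():
            labels[c] = "sporadic"
        else:
            labels[c] = "complete"
    return labels


def impute_random_intercept(y, cluster, rng):
    """One approximate-Bayesian imputation of y under y_ij = mu + u_j + e_ij.

    Hypothetical, simplified illustration: no covariates, moment-based
    variance components, scaled inverse chi-square draws for the variances.
    Assumes >= 2 clusters with observed values and some cluster with >= 2.
    """
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    obs = ~np.isnan(y)
    ids = np.unique(cluster)

    # Observed cluster sizes, cluster means, and pooled within-cluster SS.
    n, means, ss = {}, {}, 0.0
    for c in ids:
        yc = y[obs & (cluster == c)]
        n[c] = yc.size
        if n[c] > 0:
            means[c] = yc.mean()
            ss += ((yc - means[c]) ** 2).sum()
    seen = [c for c in ids if n[c] > 0]
    J = len(seen)
    df_w = sum(n[c] for c in seen) - J
    m = np.array([means[c] for c in seen])

    # Point estimates of the within- and between-cluster variances.
    sigma2_hat = ss / df_w
    tau2_hat = max(m.var(ddof=1) - np.mean([sigma2_hat / n[c] for c in seen]), 1e-8)

    # Approximate posterior draws in place of a full Gibbs sampler.
    sigma2 = sigma2_hat * df_w / rng.chisquare(df_w)
    tau2 = tau2_hat * (J - 1) / rng.chisquare(J - 1)
    w = np.array([1.0 / (tau2 + sigma2 / n[c]) for c in seen])
    mu = rng.normal((w * m).sum() / w.sum(), np.sqrt(1.0 / w.sum()))

    # Random intercepts: conditional posterior draw for clusters with data,
    # prior draw N(0, tau2) for systematically missing clusters.
    u = {}
    for c in ids:
        if n[c] > 0:
            prec = 1.0 / tau2 + n[c] / sigma2
            u[c] = rng.normal((n[c] / sigma2) * (means[c] - mu) / prec, np.sqrt(1.0 / prec))
        else:
            u[c] = rng.normal(0.0, np.sqrt(tau2))

    # Draw each missing value from its cluster-specific predictive distribution.
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        y_imp[i] = rng.normal(mu + u[cluster[i]], np.sqrt(sigma2))
    return y_imp


# Toy data: 4 clusters of 5; cluster 3 is systematically missing,
# clusters 0 and 1 are each sporadically missing one value.
rng = np.random.default_rng(1)
cluster = np.repeat(np.arange(4), 5)
y = rng.normal(10.0, 1.0, size=20) + np.repeat([0.0, 2.0, -1.0, 1.0], 5)
y[cluster == 3] = np.nan
y[[0, 6]] = np.nan
labels = classify_missingness(y, cluster)
y_imp = impute_random_intercept(y, cluster, rng)
```

The key design point the sketch tries to show is that a systematically missing cluster has no data to inform its random intercept, so its imputations inherit between-cluster uncertainty through a draw from N(0, tau2). In a chained-equations loop, a step of this kind would be repeated for each incomplete variable, conditioning on the current imputations of the others; the paper's model also includes covariates, which this sketch omits.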

Similar articles

Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

Stat Methods Med Res. 2018 Jun;27(6):1634-1649. doi: 10.1177/0962280216666564. Epub 2016 Sep 19.

Multiple imputation for handling systematically missing confounders in meta-analysis of individual participant data.

Stat Med. 2013 Dec 10;32(28):4890-905. doi: 10.1002/sim.5894. Epub 2013 Jul 16.

Review and evaluation of imputation methods for multivariate longitudinal data with mixed-type incomplete variables.

Stat Med. 2022 Dec 30;41(30):5844-5876. doi: 10.1002/sim.9592. Epub 2022 Oct 11.

A comparison of existing methods for multiple imputation in individual participant data meta-analysis.

Stat Med. 2017 Sep 30;36(22):3507-3532. doi: 10.1002/sim.7388. Epub 2017 Jul 10.

Dealing with missing covariates in epidemiologic studies: a comparison between multiple imputation and a full Bayesian approach.

Stat Med. 2016 Jul 30;35(17):2955-74. doi: 10.1002/sim.6944. Epub 2016 Apr 4.

Sequential BART for imputation of missing covariates.

Biostatistics. 2016 Jul;17(3):589-602. doi: 10.1093/biostatistics/kxw009. Epub 2016 Mar 15.

SuperMICE: An Ensemble Machine Learning Approach to Multiple Imputation by Chained Equations.

Am J Epidemiol. 2022 Feb 19;191(3):516-525. doi: 10.1093/aje/kwab271.

Multiple imputation with missing data indicators.

Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.

Performance of Multiple Imputation Using Modern Machine Learning Methods in Electronic Health Records Data.

Epidemiology. 2023 Mar 1;34(2):206-215. doi: 10.1097/EDE.0000000000001578. Epub 2022 Dec 9.

Cited by

D3MI: an efficient and powerful federated imputation method for bias reduction in the analysis of distributed incomplete data by accounting for within-site correlation and between-site heterogeneity.

medRxiv. 2025 May 8:2025.05.08.25327224. doi: 10.1101/2025.05.08.25327224.

Developing a multivariable prediction model to support personalized selection among five major empirically-supported treatments for adult depression. Study protocol of a systematic review and individual participant data network meta-analysis.

PLoS One. 2025 Apr 23;20(4):e0322124. doi: 10.1371/journal.pone.0322124. eCollection 2025.

Conceptual framework as a guide to choose appropriate imputation method for missing values in a clinical structured dataset.

BMC Med Res Methodol. 2025 Feb 20;25(1):43. doi: 10.1186/s12874-025-02496-3.

Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.

BMC Med Res Methodol. 2024 Aug 28;24(1):188. doi: 10.1186/s12874-024-02310-6.

Methods for comparative effectiveness based on time to confirmed disability progression with irregular observations in multiple sclerosis.

Stat Methods Med Res. 2023 Jul;32(7):1284-1299. doi: 10.1177/09622802231172032. Epub 2023 Jun 11.

Systematically missing data in causally interpretable meta-analysis.

Biostatistics. 2024 Apr 15;25(2):289-305. doi: 10.1093/biostatistics/kxad006.

Practical strategies for operationalizing optimal allocation in stratified cluster-based outcome-dependent sampling designs.

Stat Med. 2023 Mar 30;42(7):917-935. doi: 10.1002/sim.9650. Epub 2023 Jan 17.

Review and evaluation of imputation methods for multivariate longitudinal data with mixed-type incomplete variables.

Stat Med. 2022 Dec 30;41(30):5844-5876. doi: 10.1002/sim.9592. Epub 2022 Oct 11.

A systematic review of how missing data are handled and reported in multi-database pharmacoepidemiologic studies.

Pharmacoepidemiol Drug Saf. 2021 Jul;30(7):819-826. doi: 10.1002/pds.5245. Epub 2021 May 7.

Racial Differences in Population Attributable Risk for Epithelial Ovarian Cancer in the OCWAA Consortium.

J Natl Cancer Inst. 2021 Jun 1;113(6):710-718. doi: 10.1093/jnci/djaa188.
