Multiple imputation with missing data indicators.

Affiliations

Department of Biostatistics, University of Michigan, USA.

Survey Methodology Program, Institute for Social Research, USA.

Publication information

Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.

Abstract

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation (SRMI), also called chained equations multiple imputation. In this approach, we impute missing values using regression models for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random (MAR), and it is not well justified under missingness not at random (MNAR) without additional modification. In this paper, we describe how the SRMI procedure can be generalized to handle MNAR in the setting where, conditional on fully observed variables, missingness may depend on other variables that are also missing but not on the missing variable itself. We provide algebraic justification for several generalizations of standard SRMI using Taylor series and other approximations of the target imputation distribution under MNAR. The resulting regression model approximations include missingness indicators, interactions, or other functions of the MNAR missingness model and the observed data. In a simulation study, we demonstrate that the proposed SRMI modifications reduce bias in the final analysis compared with standard SRMI, with an approximation strategy that includes an offset in the imputation model performing best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific pathogenic genetic variant.
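
To make the missingness-indicator idea concrete, here is a minimal Python sketch using only NumPy. It is not the authors' implementation, and every name in it (`impute_linear`, `srmi`, `expit`, the simulated variables `z`, `x1`, `x2`) is hypothetical. It assumes normal linear imputation models and a toy mechanism with a fully observed `z`. Given `z`, missingness of `x1` depends only on `x2`, and missingness of `x2` depends only on `x1`. The two missingness indicators are also taken to be independent given the data. Under these assumptions the indicator R2 carries information about `x1`. The MAR conditional p(x1 | x2, z) is therefore not the right imputation distribution for a missing `x1`. By contrast, p(x1 | x2, z, R2) is the same whether or not `x1` is observed, and a linear model with R2 as a predictor approximates it. The sketch compares standard chained-equations imputation with a variant that adds the other incomplete variables' indicators as main-effect predictors. The interaction and offset approximations described in the paper are not reproduced.

```python
import numpy as np


def impute_linear(y, X, miss, rng):
    """Draw imputations for y[miss] from a normal linear regression of y on X,
    fitted on rows where y is observed. (sigma^2, beta) are drawn from their
    flat-prior posterior so imputations reflect parameter uncertainty."""
    Xo, yo = X[~miss], y[~miss]
    n_obs, p = Xo.shape
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n_obs - p)        # sigma^2 | data
    chol = np.linalg.cholesky(sigma2 * np.linalg.inv(Xo.T @ Xo))
    beta = beta_hat + chol @ rng.standard_normal(p)          # beta | sigma^2, data
    out = y.copy()
    out[miss] = X[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return out


def srmi(data, miss, use_indicators, n_iter=10, seed=0):
    """One chained-equations imputation of an n x k array (hypothetical helper).
    With use_indicators=True, each model also conditions on the missingness
    indicators of the *other* incomplete variables."""
    rng = np.random.default_rng(seed)
    data = data.copy()
    n, k = data.shape
    targets = [j for j in range(k) if miss[:, j].any()]
    for j in targets:  # initialise missing cells with observed column means
        data[miss[:, j], j] = data[~miss[:, j], j].mean()
    for _ in range(n_iter):
        for j in targets:
            cols = [np.ones(n)] + [data[:, c] for c in range(k) if c != j]
            if use_indicators:
                # R_j itself is left out: it is constant (0) on the fitting rows
                cols += [miss[:, c].astype(float) for c in targets if c != j]
            data[:, j] = impute_linear(data[:, j], np.column_stack(cols), miss[:, j], rng)
    return data


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


# Toy MNAR data (assumed mechanism): each variable's missingness depends on the other.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                                  # fully observed
x1 = 0.5 * z + rng.normal(size=n)
x2 = 0.5 * z + 0.5 * x1 + rng.normal(size=n)
full = np.column_stack([z, x1, x2])
miss = np.zeros(full.shape, dtype=bool)
miss[:, 1] = rng.random(n) < expit(-0.5 + 1.5 * x2)   # R1 depends on x2, not x1
miss[:, 2] = rng.random(n) < expit(-0.5 + 1.5 * x1)   # R2 depends on x1, not x2
observed = np.where(miss, np.nan, full)

print(f"full-data mean(x1):     {x1.mean():.3f}")
print(f"complete-case mean(x1): {observed[~miss[:, 1], 1].mean():.3f}")
for flag in (False, True):
    m = 5  # number of imputations; point estimate = average over imputations
    est = np.mean([srmi(observed, miss, flag, seed=s)[:, 1].mean() for s in range(m)])
    print(f"SRMI, indicators={flag}: mean(x1) = {est:.3f}")
```

Averaging the per-imputation means is the Rubin's-rules point estimate. Variance pooling is omitted for brevity. The paper's simulations report reduced bias for its modifications. Whether this toy version shows a comparable gain depends on the simulated parameters, so treat the printed numbers as illustrative only.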
