Doubly Robust Nonparametric Multiple Imputation for Ignorable Missing Data.

Authors

Long Qi, Hsu Chiu-Hsieh, Li Yisheng

Affiliation

Emory University.

Publication

Stat Sin. 2012;22:149-172.

PMID: 22347786

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3280694/

Abstract

Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.
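The abstract describes the imputation only at a high level. The Python sketch below shows one plausible reading of it, under assumptions that go beyond the abstract: a linear regression working model for the outcome, a logistic working model for the probability of being observed, the two fitted scores standardized and combined into a weighted Euclidean distance with weights `w1` and `1 - w1`, the `K` nearest observed subjects used as the imputing set, a bootstrap refit of both working models for each of the `M` imputations, and Rubin's rules for combining the results. The function names `fit_ols`, `fit_logistic`, and `nn_impute_mean` and all of their parameters are hypothetical. This is not the authors' code, and their exact steps may differ.

```python
import numpy as np


def fit_ols(X, y):
    """Working outcome model (assumed linear): least-squares fit of Y on X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def fit_logistic(X, r, n_iter=25):
    """Working missingness model (assumed logistic) for P(R = 1 | X), via Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        grad = Xd.T @ (r - p)
        # Tiny ridge term guards against a singular Hessian.
        beta = beta + np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return beta


def nn_impute_mean(X, y, r, w1=0.5, K=5, M=5, rng=None):
    """Hypothetical sketch: estimate E[Y] by nearest-neighbor MI on two working-model scores.

    X: (n, p) fully observed covariates; y: outcome (NaN where missing);
    r: 1 if y is observed, 0 otherwise. Returns (estimate, Rubin total variance).
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    obs, mis = np.flatnonzero(r == 1), np.flatnonzero(r == 0)
    estimates, within = [], []
    for _ in range(M):
        # Assumption: a bootstrap resample perturbs the working-model fits so
        # that the M imputations reflect uncertainty in the estimated scores.
        b = rng.integers(0, n, size=n)
        ob = r[b] == 1
        beta_out = fit_ols(X[b][ob], y[b][ob])
        beta_ps = fit_logistic(X[b], r[b])
        # Two one-dimensional scores (dimension reduction) for every subject.
        s1 = Xd @ beta_out  # predicted outcome
        s2 = Xd @ beta_ps   # logit of the probability of being observed
        s1 = (s1 - s1.mean()) / s1.std()
        s2 = (s2 - s2.mean()) / s2.std()
        y_imp = np.array(y, dtype=float)
        for i in mis:
            # Assumed weighted distance to each observed subject; weights sum to 1.
            d = np.sqrt(w1 * (s1[obs] - s1[i]) ** 2 + (1 - w1) * (s2[obs] - s2[i]) ** 2)
            imputing_set = obs[np.argsort(d)[:K]]
            # Draw one donor at random from the K nearest observed subjects.
            y_imp[i] = y[rng.choice(imputing_set)]
        estimates.append(y_imp.mean())
        within.append(y_imp.var(ddof=1) / n)
    # Rubin's rules: average the M estimates and combine the variance components.
    q_bar = float(np.mean(estimates))
    total_var = float(np.mean(within) + (1 + 1 / M) * np.var(estimates, ddof=1))
    return q_bar, total_var
```

In this sketch, setting `w1=1.0` makes the imputing sets depend only on the outcome working model, and `w1=0.0` only on the missingness working model. This loosely mirrors the weight choice that the abstract's sensitivity analysis is meant to guide. Because donors are drawn from observed values rather than predicted from a model, no inverse probability weights appear anywhere in the estimator.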
