MISL: Multiple imputation by super learning.

Affiliation

Department of Health Sciences, Northeastern University, Boston, MA, USA.

Publication information

Stat Methods Med Res. 2022 Oct;31(10):1904-1915. doi: 10.1177/09622802221104238. Epub 2022 Jun 5.

DOI:10.1177/09622802221104238

PMID:35658622

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9709711/

Abstract

Multiple imputation techniques are commonly used when data are missing; however, there are many options one can consider. Multivariate imputation by chained equations is a popular method for generating imputations but relies on specifying models when imputing missing values. In this work, we introduce multiple imputation by super learning, an update to the multivariate imputation by chained equations method that generates imputations with ensemble learning. Ensemble methodologies have recently gained attention for use in inference and prediction because they optimally combine a variety of user-specified parametric and non-parametric models and perform well when estimating complex functions, including those with interaction terms. Through two simulations, we compare inferences made using the multiple imputation by super learning approach to those made with other commonly used multiple imputation methods and demonstrate that multiple imputation by super learning is a superior option when considering characteristics such as bias, confidence interval coverage rate, and confidence interval width.
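To make the idea concrete, here is a minimal Python sketch of a chained-equations loop in which each incomplete variable is imputed from a cross-validated convex combination of two candidate learners. It is a toy stand-in for a super learner, not the authors' MISL algorithm or any package's API. All function names (`fit_mean`, `fit_linear`, `ensemble_weight`, `impute_once`, `multiply_impute`) are hypothetical, the dataset is assumed to have exactly two numeric columns, and adding a randomly drawn observed residual to each prediction is a simplifying assumption about how imputation uncertainty is injected.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Candidate learner 1: ignore the predictor, always predict the mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate learner 2: simple least-squares line on one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def ensemble_weight(xs, ys, folds=5):
    """Weight in [0, 1] on the linear learner that minimises the
    cross-validated squared error of the two-learner convex combination."""
    n = len(xs)
    p_mean, p_lin = [0.0] * n, [0.0] * n
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        f_mean, f_lin = fit_mean(tx, ty), fit_linear(tx, ty)
        for i in range(k, n, folds):  # held-out rows of fold k
            p_mean[i], p_lin[i] = f_mean(xs[i]), f_lin(xs[i])
    num = sum((y - a) * (b - a) for y, a, b in zip(ys, p_mean, p_lin))
    den = sum((b - a) ** 2 for a, b in zip(p_mean, p_lin))
    return min(1.0, max(0.0, num / den)) if den > 0 else 0.5

def impute_once(cols, n_iter=5, seed=0):
    """Return one completed copy of a two-column dataset (None = missing)."""
    rng = random.Random(seed)
    names = list(cols)
    missing = {c: [v is None for v in cols[c]] for c in names}
    filled = {}
    for c in names:  # start from mean-filled columns
        m = mean(v for v in cols[c] if v is not None)
        filled[c] = [m if v is None else v for v in cols[c]]
    for _ in range(n_iter):
        for target in names:  # visit each incomplete variable in turn
            if not any(missing[target]):
                continue
            other = next(c for c in names if c != target)
            obs = [i for i, miss in enumerate(missing[target]) if not miss]
            xs = [filled[other][i] for i in obs]
            ys = [filled[target][i] for i in obs]
            w = ensemble_weight(xs, ys)
            f_mean, f_lin = fit_mean(xs, ys), fit_linear(xs, ys)
            preds = [(1 - w) * f_mean(x) + w * f_lin(x) for x in filled[other]]
            resid = [filled[target][i] - preds[i] for i in obs]
            for i, miss in enumerate(missing[target]):
                if miss:  # prediction plus a randomly drawn observed residual
                    filled[target][i] = preds[i] + rng.choice(resid)
    return filled

def multiply_impute(cols, m=5):
    """Generate m completed datasets, one per seed."""
    return [impute_once(cols, seed=s) for s in range(m)]

# Synthetic demo data (an assumption for illustration): y depends linearly on x.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [2 * xi + rng.gauss(0, 0.5) for xi in x]
cols = {"x": [None if i % 10 == 0 else v for i, v in enumerate(x)],
        "y": [None if i % 7 == 3 else v for i, v in enumerate(y)]}
completed = multiply_impute(cols, m=3)
print(len(completed), "completed datasets")
```

Each completed dataset would then be analysed separately and the results pooled. A real super learner would draw on a richer library of candidate models and more than one predictor per variable.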


