[How to deal with missing data? Multiple imputation by chained equations: recommendations and explanations for clinical practice].

Author information

Legendre Bruno, Cerasuolo Damiano, Dejardin Olivier, Boyer Annabel

Affiliations

Centre hospitalier universitaire de Caen, service de néphrologie, dialyse et transplantation, avenue de la Délivrande, 14000 Caen, France

Inserm U1086 ANTICIPE, Caen, France

Publication information

Nephrol Ther. 2023 Jun 19;19(3):171-179. doi: 10.1684/ndt.2023.24.

DOI: 10.1684/ndt.2023.24

PMID: 37272826

Abstract

Missing data are a constant problem in medical research and have several consequences: a systematic loss of statistical power, which may or may not be accompanied by reduced representativeness of the analyzed sample. There are three types of missing data: 1) missing completely at random (MCAR); 2) missing at random (MAR); 3) missing not at random (MNAR). Multiple imputation by chained equations handles missing data correctly under the MCAR and MAR assumptions. For each missing value, it simulates m values that are plausible given the other variables. A random component is included in this simulation to express the uncertainty. Several data sets are thus created and analyzed individually, in an identical way. The estimates from each data set are then combined to obtain an overall estimate. Multiple imputation increases power, corrects for some biases, and has the advantage of being applicable to many types of variables. Complete case analysis should no longer be the norm. The objective of this guide is to help the reader conduct an analysis with multiply imputed data. We cover the following points: the different types of missing data, the historical approaches to handling them, and, in detail, the multiple imputation method using chained equations. We provide a code example for the mice package of R®.
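
The impute, analyze, and combine cycle summarized in the abstract can be illustrated with a minimal sketch. It is not the article's R code and does not use the mice package. It is a standard-library Python toy with a single incomplete variable, so the chain of equations reduces to one conditional regression model. The function names (fit_line, impute_once, pool_rubin, demo), the simulated data, and the use of a bootstrap to vary the regression coefficients between imputations are all assumptions made for illustration. The combination step follows Rubin's rules, the usual way to pool multiply imputed estimates.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def impute_once(xs, ys, rng):
    """Return a completed copy of ys, where None marks a missing value."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    # Assumption: bootstrap the observed cases so that the fitted
    # coefficients differ from one imputed data set to the next.
    boot = [rng.choice(observed) for _ in observed]
    a, b, sd = fit_line([x for x, _ in boot], [y for _, y in boot])
    # Random noise is added to each prediction to express uncertainty.
    return [y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)]


def pool_rubin(estimates, variances):
    """Combine m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)          # mean within-imputation variance
    between = statistics.variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def demo(m=20, n=200, seed=1):
    """Simulate MAR data, impute m times, estimate the mean of y, and pool."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2 + 1.5 * x + rng.gauss(0, 1) for x in xs]
    # MAR mechanism: y is missing more often when the observed x is positive.
    ys_missing = [None if (x > 0 and rng.random() < 0.5) else y
                  for x, y in zip(xs, ys)]
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(xs, ys_missing, rng)
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / n)
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    pooled_mean, total_variance = demo()
    print(f"pooled mean of y: {pooled_mean:.3f}, "
          f"standard error: {total_variance ** 0.5:.3f}")
```

In this toy, a complete case analysis would tend to underestimate the mean of y, because the cases with larger x (and therefore larger y) are the ones most often dropped. With several incomplete variables, chained equations would instead cycle through one conditional model per variable; the pooling step would stay the same.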
