Multiple imputation for discrete data: Evaluation of the joint latent normal model.

Authors

Quartagno Matteo, Carpenter James R

Affiliations

Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK.

MRC Clinical Trials Unit at UCL, 90 High Holborn, London, UK.

Publication information

Biom J. 2019 Jul;61(4):1003-1019. doi: 10.1002/bimj.201800222. Epub 2019 Mar 14.

Abstract

Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM-MI) and full conditional specification multiple imputation (FCS-MI). While JM-MI is arguably a preferable approach, because it involves specification of an explicit imputation model, FCS-MI is pragmatically appealing, because of its flexibility in handling different types of variables. JM-MI has developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to handle categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application to data from the German Breast Cancer Study Group, comparing the results with FCS-MI. We divide our investigation into four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM-MI works very well, and sometimes outperforms FCS-MI. We conclude that the latent normal model, implemented in the R package jomo, can be used with confidence by researchers, both for single-level and multilevel multiple imputation.
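
The latent normal idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the jomo implementation and not the authors' code: it assumes a single binary variable Y defined as 1 when an unobserved normal variable Z exceeds 0, a single continuous covariate X, and model parameters that are already known. In practice a joint-model imputation would draw the parameters from their posterior (for example by MCMC) rather than fix them. All function and parameter names here (`draw_latent_given_x`, `impute_binary`, `multiply_impute`, `mu_x`, `sd_x`, `mu_z`, `rho`) are hypothetical.

```python
import math
import random


def draw_latent_given_x(x, mu_x, sd_x, mu_z, rho, rng):
    """Draw the latent Z given X = x under a bivariate normal model.

    Assumption: Var(Z) is fixed at 1, a common identifiability constraint
    when only the sign of Z is observed.
    """
    cond_mean = mu_z + rho * (x - mu_x) / sd_x
    cond_sd = math.sqrt(1.0 - rho ** 2)
    return rng.gauss(cond_mean, cond_sd)


def impute_binary(x, mu_x, sd_x, mu_z, rho, rng):
    """Impute a missing binary value as the indicator that the latent draw is positive."""
    z = draw_latent_given_x(x, mu_x, sd_x, mu_z, rho, rng)
    return 1 if z > 0 else 0


def multiply_impute(xs, ys, params, m, seed=0):
    """Return m completed copies of ys, filling each None with a fresh random draw.

    Assumption: params holds known values; parameter uncertainty is ignored here,
    so this is only the imputation step, not a full multiple imputation procedure.
    """
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        filled = [
            y if y is not None else impute_binary(x, rng=rng, **params)
            for x, y in zip(xs, ys)
        ]
        completed.append(filled)
    return completed


if __name__ == "__main__":
    params = {"mu_x": 0.0, "sd_x": 1.0, "mu_z": 0.0, "rho": 0.8}
    xs = [-2.0, -0.5, 0.3, 2.0]
    ys = [0, None, None, 1]  # None marks a missing binary value
    for dataset in multiply_impute(xs, ys, params, m=5, seed=1):
        print(dataset)
```

Each completed dataset would then be analysed separately and the results combined, which is the usual multiple imputation workflow. Categorical and ordinal variables need more than one latent variable or threshold and are not covered by this sketch.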

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2ed/6618333/bdc52d12486a/BIMJ-61-1003-g001.jpg
