Joint Imputation of General Data.

Author information

Robbins Michael W

Affiliation

Senior Statistician with the RAND Corporation, Pittsburgh, PA 15213, USA.

Publication information

J Surv Stat Methodol. 2023 Sep 12;12(1):183-210. doi: 10.1093/jssam/smad034. eCollection 2024 Feb.

Abstract

High-dimensional complex survey data of general structures (e.g., containing continuous, binary, categorical, and ordinal variables), such as the US Department of Defense's Health-Related Behaviors Survey (HRBS), often confound procedures designed to impute any missing survey data. Imputation by fully conditional specification (FCS) is often considered the state of the art for such datasets due to its generality and flexibility. However, FCS procedures contain a theoretical flaw that is exposed by HRBS data: HRBS imputations created with FCS are shown to diverge across iterations of Markov chain Monte Carlo. Imputation by joint modeling lacks this flaw; however, current joint modeling procedures are neither general nor flexible enough to handle HRBS data. As such, we introduce an algorithm that efficiently and flexibly applies multiple imputation by joint modeling in data of general structures. This procedure draws imputations from a latent joint multivariate normal model that underpins the generally structured data and models the latent data via a sequence of conditional linear models, the predictors of which can be specified by the user. We perform rigorous evaluations of HRBS imputations created with the new algorithm and show that they are convergent and of high quality. Lastly, simulations verify that the proposed method performs well compared to existing algorithms including FCS.
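The abstract's central idea, a joint multivariate normal expressed as a sequence of conditional linear models, can be illustrated with a small sketch. This is not the paper's algorithm; it only shows the standard fact that any multivariate normal with covariance matrix Sigma factors into regressions of each variable on the ones before it. The function names (`cholesky`, `conditional_models`, `draw_latent`, `to_binary`) are hypothetical, and turning a latent normal value into a binary one by thresholding is an assumed, probit-style convention rather than something the abstract specifies.

```python
import math
import random


def cholesky(sigma):
    """Return lower-triangular L with sigma = L L^T (sigma positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def conditional_models(sigma):
    """For each variable j, return (coefficients on variables 0..j-1,
    residual variance) of its conditional linear model."""
    L = cholesky(sigma)
    models = []
    for j in range(len(sigma)):
        # Solve b @ L[:j, :j] = L[j, :j] by back substitution.
        b = [0.0] * j
        for k in reversed(range(j)):
            s = sum(b[m] * L[m][k] for m in range(k + 1, j))
            b[k] = (L[j][k] - s) / L[k][k]
        models.append((b, L[j][j] ** 2))
    return models


def draw_latent(models, rng):
    """Draw one latent vector by sampling each variable given the earlier ones."""
    z = []
    for coefs, var in models:
        mean = sum(c * v for c, v in zip(coefs, z))
        z.append(rng.gauss(mean, math.sqrt(var)))
    return z


def to_binary(latent_value, threshold=0.0):
    """Assumed convention: a binary variable is 1 when its latent value
    exceeds the threshold."""
    return 1 if latent_value > threshold else 0


# Illustrative covariance matrix (made up for this sketch).
sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.3],
         [0.2, 0.3, 1.0]]
models = conditional_models(sigma)
z = draw_latent(models, random.Random(0))
y = to_binary(z[2])  # treat the third variable as binary
```

Drawing the variables one at a time from these conditional models reproduces the joint normal distribution exactly, so the sequential form loses nothing relative to the joint form. In a real imputation setting the coefficients would be estimated from data rather than derived from a known covariance matrix.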
