Robbins Michael W
Senior Statistician with the RAND Corporation, Pittsburgh, PA 15213, USA.
J Surv Stat Methodol. 2023 Sep 12;12(1):183-210. doi: 10.1093/jssam/smad034. eCollection 2024 Feb.
High-dimensional complex survey data of general structures (e.g., containing continuous, binary, categorical, and ordinal variables), such as the US Department of Defense's Health-Related Behaviors Survey (HRBS), often confound procedures designed to impute missing survey data. Imputation by fully conditional specification (FCS) is often considered the state of the art for such datasets due to its generality and flexibility. However, FCS procedures contain a theoretical flaw that is exposed by HRBS data: HRBS imputations created with FCS are shown to diverge across iterations of Markov chain Monte Carlo. Imputation by joint modeling lacks this flaw; however, current joint modeling procedures are neither general nor flexible enough to handle HRBS data. As such, we introduce an algorithm that efficiently and flexibly applies multiple imputation by joint modeling in data of general structures. This procedure draws imputations from a latent joint multivariate normal model that underpins the generally structured data and models the latent data via a sequence of conditional linear models, the predictors of which can be specified by the user. We perform rigorous evaluations of HRBS imputations created with the new algorithm and show that they are convergent and of high quality. Lastly, simulations verify that the proposed method performs well compared to existing algorithms including FCS.
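The idea of a latent multivariate normal model expressed as a sequence of conditional linear models can be illustrated with a small forward simulation. The sketch below is not the paper's algorithm: it only generates mixed-type records from latent normals and does not condition on observed values as an imputation procedure would. The coefficients, cutpoints, and the names `MODELS`, `draw_latent`, and `to_observed` are all assumptions made for illustration.

```python
import random

# Hypothetical conditional linear models for three latent normal variables.
# Each entry: (intercept, coefficients on earlier latent variables, residual sd).
MODELS = [
    (0.0, [], 1.0),            # z1 ~ N(0, 1)
    (0.5, [0.8], 0.6),         # z2 | z1 ~ N(0.5 + 0.8*z1, 0.6^2)
    (-0.2, [0.3, -0.4], 0.9),  # z3 | z1, z2 ~ N(-0.2 + 0.3*z1 - 0.4*z2, 0.9^2)
]


def draw_latent(models, rng):
    """Draw one latent vector, sampling each variable given the earlier ones."""
    z = []
    for intercept, coefs, sd in models:
        mean = intercept + sum(c * v for c, v in zip(coefs, z))
        z.append(rng.gauss(mean, sd))
    return z


def to_observed(z, cutpoints=(-0.5, 0.5)):
    """Map latent normals to a continuous, a binary, and an ordinal value."""
    continuous = z[0]                           # used as-is
    binary = 1 if z[1] > 0 else 0               # threshold at zero (assumed)
    ordinal = sum(z[2] > c for c in cutpoints)  # level 0, 1, or 2
    return continuous, binary, ordinal


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [to_observed(draw_latent(MODELS, rng)) for _ in range(5)]
```

Because each latent variable is a linear function of the earlier ones plus independent normal noise, the latent vector is jointly multivariate normal. Choosing which earlier variables receive nonzero coefficients loosely corresponds to the user-specified predictors mentioned in the abstract.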