Zhang Panpan, Xie Sharon X
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, U.S.A.
Stat Biosci. 2025 Jun 12. doi: 10.1007/s12561-025-09493-6.
In this paper, we compare the performance of available-case analysis (ACA) and several multiple imputation (MI) approaches for handling missing data in longitudinal analysis, in terms of estimation bias and relative efficiency. ACA produces biased estimates when the missingness of covariates depends on observed responses, but it is preferred when missing values occur only in the longitudinal responses. Multilevel MI methods are not always the solution for longitudinal data analysis. Single-level MI methods, such as fully conditional specification (FCS), provide unbiased estimates under a variety of missing data scenarios and yield efficiency gains in certain scenarios. The missing data mechanism is generally assumed to be missing at random (MAR). We carry out a systematic simulation study with synthetic data in which missing values occur in the longitudinal outcomes and/or covariates under different missing data generation procedures. The analysis model is a linear mixed-effects model. For each missing data scenario, we recommend either ACA or a specific MI method, based on theoretical justifications and extensive simulations. In addition, a longitudinal neurodegenerative disease dataset is used as a real case study.
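The contrast between the two ACA claims (biased when covariate missingness depends on observed responses, acceptable when only responses are missing) can be illustrated with a deliberately simplified toy simulation. This is a minimal sketch under stated assumptions, not the authors' simulation design: it uses a cross-sectional linear regression of Y on X instead of a linear mixed-effects model, and the helper names (`simulate`, `ols_slope`), the true slope of 2, the thresholds, and the 0.9/0.1 missingness probabilities are all hypothetical choices made for illustration.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=100_000, beta=2.0, seed=2025):
    """Toy cross-sectional illustration (NOT a mixed-effects model).

    Hypothetical data-generating model: Y = 1 + beta * X + e, X, e ~ N(0, 1).
    Returns the full-data slope and two available-case slopes.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Scenario A (MAR): covariate X is missing with a probability that
    # depends on the observed response Y (0.9 if Y > 1, else 0.1).
    keep_a = [rng.random() >= (0.9 if yi > 1.0 else 0.1) for yi in y]
    # Scenario B (MAR): response Y is missing with a probability that
    # depends on the observed covariate X (0.9 if X > 0, else 0.1).
    keep_b = [rng.random() >= (0.9 if xi > 0.0 else 0.1) for xi in x]

    def available_case_slope(keep):
        # Drop every row with a missing value, then fit OLS on the rest.
        xs = [xi for xi, k in zip(x, keep) if k]
        ys = [yi for yi, k in zip(y, keep) if k]
        return ols_slope(xs, ys)

    return {
        "full": ols_slope(x, y),
        "aca_missing_x": available_case_slope(keep_a),
        "aca_missing_y": available_case_slope(keep_b),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name:>15}: slope = {slope:.3f}")
```

Under these assumptions, dropping rows with a missing X (selection driven by Y) should noticeably attenuate the slope, whereas dropping rows with a missing Y (selection driven by the observed X) should leave it close to the true value of 2; exact numbers depend on the seed and sample size.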