Am J Epidemiol. 2023 Nov 10;192(12):2075-2084. doi: 10.1093/aje/kwad139.
Incomplete longitudinal data are common in life-course epidemiology and may induce bias leading to incorrect inference. Multiple imputation (MI) is increasingly preferred for handling missing data, but few studies explore MI-method performance and feasibility in real-data settings. We compared 3 MI methods using real data under 9 missing-data scenarios, representing combinations of 3 proportions of missingness (10%, 20%, and 30%) and 3 missingness mechanisms (missing completely at random, missing at random, and missing not at random). Using data from Health and Retirement Study (HRS) participants, we introduced record-level missingness to a sample of participants with complete data on depressive symptoms (1998-2008), mortality (2008-2018), and relevant covariates. We then imputed missing data using 3 MI methods (normal linear regression, predictive mean matching, and variable-tailored specification) and fitted Cox proportional hazards models to estimate the effects of 4 operationalizations of longitudinal depressive symptoms on mortality. For each method, we compared bias in hazard ratios, root mean square error, and computation time. Bias was similar across MI methods, and results were consistent across operationalizations of the longitudinal exposure variable. However, our results suggest that predictive mean matching may be an appealing strategy for imputing life-course exposure data, given its consistently low root mean square error, competitive computation times, and few implementation challenges.
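Predictive mean matching is named above but not described. The Python sketch below illustrates the general technique for a single continuous variable. It is not the authors' code, and the abstract does not say which software they used. The sketch works in four steps. First, it fits a linear regression on the observed cases. Second, it draws the coefficients from their approximate posterior. Third, it compares the predicted mean of each missing case with the predicted means of the observed cases. Fourth, it copies the observed value of one donor, chosen at random from the k closest observed cases. The following elements are assumptions made for illustration: the function name `pmm_impute`, the default `k=5`, the simulated data, and the roughly 20% missing-completely-at-random mechanism.

```python
import numpy as np


def pmm_impute(y, X, k=5, rng=None):
    """Return one PMM-imputed copy of y (hypothetical helper, for illustration).

    y: 1-D array with np.nan marking missing values.
    X: fully observed predictors, shape (n,) or (n, p); an intercept is added.
    k: number of nearest donors (k=5 is an assumed default).
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    miss = np.isnan(y)
    Xo, yo = X[~miss], y[~miss]

    # Least-squares fit on the observed cases.
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat
    df = Xo.shape[0] - Xo.shape[1]

    # Draw sigma^2 and beta from their approximate posterior
    # (linear regression with a noninformative prior), so that repeated
    # calls reflect uncertainty in the imputation model.
    sigma2 = resid @ resid / rng.chisquare(df)
    cov = sigma2 * np.linalg.inv(Xo.T @ Xo)
    beta_draw = rng.multivariate_normal(beta_hat, cov)

    # Predicted means: beta_hat for observed cases, beta_draw for missing ones.
    pred_obs = Xo @ beta_hat
    pred_mis = X[miss] @ beta_draw

    y_imp = y.copy()
    for i, p in zip(np.flatnonzero(miss), pred_mis):
        # Indices of the k observed cases whose predicted means are closest.
        donors = np.argsort(np.abs(pred_obs - p))[:k]
        # Copy the *observed* value of one randomly chosen donor.
        y_imp[i] = yo[rng.choice(donors)]
    return y_imp


# Usage on simulated data (assumed, not HRS data): ~20% MCAR missingness in y.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
y_obs = y.copy()
y_obs[rng.random(n) < 0.2] = np.nan

# Create M = 5 completed datasets, one per imputation.
imputations = [pmm_impute(y_obs, x[:, None], k=5, rng=rng) for _ in range(5)]
```

Every imputed value is copied from an observed case. As a result, PMM never produces values outside the observed range, which can help with bounded or discrete scores. In a full MI analysis, each completed dataset would be analyzed separately, for example with a Cox model. The resulting estimates would then be pooled with Rubin's rules.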