Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, 3rd Floor, Boston, MA, 02119, USA.
Center for Biostatistics in AIDS Research in the Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
BMC Med Res Methodol. 2022 Nov 19;22(1):297. doi: 10.1186/s12874-022-01782-8.
The occurrence and timing of mycobacterial culture conversion are used as a proxy for tuberculosis treatment response. When researchers serially sample sputum during tuberculosis studies, contamination or missed visits lead to missing data points. Traditionally, this is managed by ignoring missing data or by simple carry-forward techniques. Statistically advanced multiple imputation methods can potentially decrease bias while retaining sample size and statistical power.
We analyzed data from 261 participants who provided weekly sputa for the first 12 weeks of tuberculosis treatment. We compared methods for handling missing data points in a longitudinal study with a time-to-event outcome. Our primary outcome was time to culture conversion, defined as two consecutive weeks with no Mycobacterium tuberculosis growth. Methods used to address missing data included: 1) available case analysis, 2) last observation carried forward, and 3) multiple imputation by fully conditional specification. For each method, we calculated the proportion of participants who culture converted and used survival analysis to estimate Kaplan-Meier curves, hazard ratios, and restricted mean survival times. We compared methods based on point estimates, confidence intervals, and the conclusions drawn for specific research questions.
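To make the outcome definition concrete, the following is a minimal sketch of how time to culture conversion could be derived from one participant's weekly results under available case analysis and under last observation carried forward. It is an illustration, not the study's code. The function names, the encoding of results (`True` for growth, `False` for no growth, `None` for a missing result), the choice to report the first of the two consecutive no-growth weeks, and the rule that a missing week breaks a run in the available case analysis are all assumptions made for this example.

```python
from typing import Optional, Sequence

# Assumed encoding: True = growth, False = no growth, None = missing result.
Result = Optional[bool]


def locf(results: Sequence[Result]) -> list[Result]:
    """Fill each missing week with the most recent observed result.

    Weeks before the first observed result stay missing.
    """
    filled: list[Result] = []
    last: Result = None
    for r in results:
        if r is not None:
            last = r
        filled.append(last)
    return filled


def conversion_week(results: Sequence[Result]) -> Optional[int]:
    """Return the 1-based week starting the first run of two consecutive
    no-growth weeks, or None if no such run exists.

    A missing week (None) is not counted as no growth, so it breaks a run.
    """
    for i in range(len(results) - 1):
        if results[i] is False and results[i + 1] is False:
            return i + 1
    return None


# Hypothetical 12-week record for a single participant.
weekly = [True, True, False, None, False, False,
          None, None, False, False, False, False]

available_case_week = conversion_week(weekly)   # uses observed results only
locf_week = conversion_week(locf(weekly))       # missing weeks filled first

print(available_case_week, locf_week)
```

In this hypothetical record the missing result at week 4 delays conversion in the available case analysis, whereas carrying the week 3 result forward yields an earlier conversion time. This shows how the choice of method can change both whether and when a participant is counted as converted.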
The three missing data methods led to differences in the number of participants achieving conversion: 78 (32.8%) participants converted with available case analysis, 154 (64.7%) converted with last observation carried forward, and 184 (77.1%) converted with multiple imputation. Multiple imputation resulted in smaller point estimates and narrower confidence intervals than the simple approaches. The adjusted hazard ratio for smear-negative participants was 3.4 (95% CI 2.3, 5.1) using multiple imputation compared to 5.2 (95% CI 3.1, 8.7) using last observation carried forward and 5.0 (95% CI 2.4, 10.6) using available case analysis.
We showed that accounting for missing sputum data through multiple imputation, a statistically valid approach under certain conditions, can lead to different conclusions than naïve methods. The approach to handling missing data must be considered carefully and pre-specified prior to analysis. We used data from a TB study to demonstrate these concepts; however, the methods we described are broadly applicable to longitudinal missing data. We provide statistical guidance and code to help researchers appropriately handle missing data in longitudinal studies.