Paul Sanjoy K, Ling Joanna, Samanta Mayukh, Montvida Olga
Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia.
Royal Melbourne Institute of Technology, Melbourne, Australia.
J Healthc Inform Res. 2022 Sep 10;6(4):385-400. doi: 10.1007/s41666-022-00119-w. eCollection 2022 Dec.
Evaluating appropriate methods for imputing missing outcome data from electronic medical records (EMRs) is crucial, yet such evaluations are lacking for observational studies. Using US EMR data on people with type 2 diabetes treated over 12 and 24 months with dipeptidyl peptidase 4 inhibitors (DPP-4i, n = 38,483) or glucagon-like peptide 1 receptor agonists (GLP-1RA, n = 8,977), predictors of missingness of the disease biomarker HbA1c were explored. The robustness of multiple imputation (MI) by chained equations, two-fold MI (MI-2F) and MI with Markov chain Monte Carlo was compared with complete case analyses for drawing inferences. Compared with younger people (age quartile Q1), those in age quartiles Q3 and Q4 had 25-32% lower odds of missing HbA1c at 6-month follow-up (range of OR CI: 0.55-0.88) and 26-39% lower odds at 12-month follow-up (range of OR CI: 0.50-0.80). People with baseline HbA1c ≥ 7.5% had 12% (OR CI: 0.83, 0.93) and 14% (OR CI: 0.77, 0.97) lower odds of missing data at 6-month follow-up in the DPP-4i and GLP-1RA groups, respectively. All imputation methods yielded HbA1c distributions during follow-up similar to those observed in complete case analyses. Clinical inferences based on absolute change in HbA1c and on the proportion of people reducing HbA1c to a clinically acceptable level (≤ 7%) were also similar between imputed data and complete case analyses. In artificial within-sample analyses used to evaluate consistency, the MI-2F method gave a marginally smaller mean difference between observed and imputed data, with a relatively smaller standard error of the difference, than the other methods. The established MI techniques can be reliably employed to impute missing outcome data in large EMR-based relational databases, enabling efficient study design and robust clinical inferences in pharmaco-epidemiological studies.
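The artificial within-sample check described in the abstract (hiding some observed HbA1c values, imputing them, and comparing imputed with observed values) can be illustrated with a minimal sketch. This is not the authors' code, and several parts are assumptions: the helper names `impute_once`, `rubin_pool` and `masked_check` are hypothetical; `impute_once` is a simplified stochastic linear-regression imputation standing in for chained equations, MI-2F or MCMC (it ignores uncertainty in the fitted coefficients); the standard error of the difference is assumed to be the standard deviation of per-record differences divided by the square root of the number of masked records; and the data are synthetic. Pooling across imputations follows Rubin's rules.

```python
import math
import random
import statistics


def impute_once(x, y, rng):
    """One stochastic linear-regression imputation of the None entries of y given x.

    Hypothetical, simplified stand-in: coefficient uncertainty is ignored,
    so this is NOT chained equations, MI-2F or MI with MCMC.
    """
    xs = [a for a, b in zip(x, y) if b is not None]
    ys = [b for b in y if b is not None]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_sd = statistics.stdev([b - (intercept + slope * a) for a, b in zip(xs, ys)])
    # Observed values are kept; missing ones get prediction + random residual.
    return [b if b is not None else intercept + slope * a + rng.gauss(0.0, resid_sd)
            for a, b in zip(x, y)]


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance across M imputations."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return q_bar, within + (1 + 1 / m) * between


def masked_check(x, y, mask_frac=0.1, n_imputations=5, seed=0):
    """Hide a share of observed y, impute M times, compare with the hidden values.

    Returns (mean difference imputed - observed, assumed SE of that difference,
    (pooled mean of y, Rubin total variance)).
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    masked = rng.sample(observed, max(2, int(mask_frac * len(observed))))
    y_masked = list(y)
    for i in masked:
        y_masked[i] = None
    # Each draw fills both the artificially masked and the originally missing values.
    draws = [impute_once(x, y_masked, rng) for _ in range(n_imputations)]
    # Average the M imputed values per masked record, then compare with the truth.
    diffs = [statistics.fmean([d[i] for d in draws]) - y[i] for i in masked]
    mean_diff = statistics.fmean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # Pool the mean of y over the completed data sets with Rubin's rules.
    ests = [statistics.fmean(d) for d in draws]
    variances = [statistics.variance(d) / len(d) for d in draws]
    return mean_diff, se_diff, rubin_pool(ests, variances)


if __name__ == "__main__":
    gen = random.Random(42)
    x = [gen.gauss(0.0, 1.0) for _ in range(500)]
    # Synthetic HbA1c-like outcome (%) with about 20% missing; values are made up.
    y = [7.0 + 0.5 * a + gen.gauss(0.0, 0.3) for a in x]
    y = [None if gen.random() < 0.2 else v for v in y]
    print(masked_check(x, y))
```

Because the masking and pooling logic only calls `impute_once`, a real MI engine could be substituted for it without changing the rest of the comparison; running the check once per candidate method would give the per-method mean differences and standard errors that the abstract compares.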
The online version contains supplementary material available at 10.1007/s41666-022-00119-w.