Varghese Ben T, Girardo Marlene E, Gupta Ruchi, Fischer Karen M, Duellman Madison, Mielke Michelle M, Egan Aoife M, Olson Janet E, Vella Adrian, Bailey Kent R, Dugani Sagar B
Division of Hospital Internal Medicine, Mayo Clinic, Rochester, Minnesota, USA.
Internal Medicine Residency Program, Ascension Saint Francis Hospital, Evanston, Illinois, USA.
Metab Syndr Relat Disord. 2025 May;23(4):186-192. doi: 10.1089/met.2024.0133. Epub 2025 Apr 7.
Identifying participants with type 2 diabetes (T2D) using only electronic health record (EHR) data or only self-reported data has limited accuracy. Therefore, the objective of this study was to develop an algorithm that combines EHR and self-reported data to identify participants with and without T2D. We included participants enrolled in the Mayo Clinic Biobank. At enrollment, participants completed a baseline questionnaire on health conditions, including T2D, and provided access to their EHR data. T2D status was based on self-report and on EHR data (International Classification of Diseases codes, hemoglobin A1c [HbA1c], plasma glucose, and glucose-regulating medications) from 5 years before to 2 months after enrollment. Participants who self-reported T2D but lacked corroborating EHR data were categorized separately ("only self-reported T2D"). After identifying participants with T2D, we identified participants without T2D based on normal HbA1c and plasma glucose values. Participants who self-reported the absence of T2D but lacked corroborating EHR data were categorized separately ("only self-reported no T2D"). Using manual chart review as the gold standard, we calculated the positive predictive value (PPV) and negative predictive value (NPV) of the algorithm for identifying T2D. Of 57,000 participants, the algorithm classified participants as having T2D (n = 6,238), no T2D (n = 38,883), "only self-reported T2D" (n = 757), and "only self-reported no T2D" (n = 9,759). The algorithm had a high PPV (96.0% [91.5%-98.5%]), NPV (100% [98.0%-100%]), and accuracy (99.5% [98.3%-99.8%]). Median age (range) differed across categories, from 52 (18-98) years (only self-reported T2D) to 67 (19-99) years (T2D) (P < 0.0001), and the proportion of women ranged from 45.3% (T2D) to 69.6% (only self-reported no T2D) (P < 0.0001). Most participants were White (84.0%-92.7%) and non-Hispanic (97.6%-98.6%).
In this study, we developed an algorithm that accurately identifies participants with and without T2D and may be generalizable to other cohorts with linked EHR data.
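The four-way classification and the validation metrics described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The Participant fields, the function names classify and predictive_values, and the ADA-style cut-offs are all hypothetical: HbA1c 6.5% or higher and fasting glucose 126 mg/dL or higher count as T2D evidence, and values below 5.7% and 100 mg/dL count as normal. The abstract does not give the study's exact rules. The sketch also assumes that EHR evidence alone is enough to classify a participant as having T2D.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# ASSUMED thresholds: standard ADA cut-offs, used for illustration only.
# The abstract does not state the study's exact rules or cut-offs.
HBA1C_T2D_MIN = 6.5         # HbA1c (%) at or above this counts as EHR evidence of T2D
GLUCOSE_T2D_MIN = 126.0     # fasting plasma glucose (mg/dL) at or above this counts as evidence
HBA1C_NORMAL_MAX = 5.7      # HbA1c (%) below this is treated as normal
GLUCOSE_NORMAL_MAX = 100.0  # fasting plasma glucose (mg/dL) below this is treated as normal


@dataclass
class Participant:
    """Hypothetical per-participant summary of the look-back window
    (5 years before to 2 months after enrollment)."""
    self_reported_t2d: bool
    has_t2d_icd_code: bool = False
    on_glucose_lowering_med: bool = False
    max_hba1c: Optional[float] = None            # highest HbA1c in the window, if measured
    max_fasting_glucose: Optional[float] = None  # highest fasting glucose, if measured


def classify(p: Participant) -> str:
    """Assign one of the four categories named in the abstract (hypothetical rules)."""
    ehr_supports_t2d = (
        p.has_t2d_icd_code
        or p.on_glucose_lowering_med
        or (p.max_hba1c is not None and p.max_hba1c >= HBA1C_T2D_MIN)
        or (p.max_fasting_glucose is not None
            and p.max_fasting_glucose >= GLUCOSE_T2D_MIN)
    )
    # Step 1: T2D. ASSUMPTION: EHR evidence alone suffices, whatever the self-report.
    if ehr_supports_t2d:
        return "T2D"
    if p.self_reported_t2d:
        return "only self-reported T2D"
    # Step 2: "no T2D" requires at least one lab value, and the highest value of
    # each measured lab must be normal (so every value in the window is normal).
    labs = [(p.max_hba1c, HBA1C_NORMAL_MAX),
            (p.max_fasting_glucose, GLUCOSE_NORMAL_MAX)]
    measured = [(value, cutoff) for value, cutoff in labs if value is not None]
    if measured and all(value < cutoff for value, cutoff in measured):
        return "no T2D"
    return "only self-reported no T2D"


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """PPV, NPV, and accuracy of the algorithm against chart review (gold standard)."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return ppv, npv, accuracy


if __name__ == "__main__":
    print(classify(Participant(self_reported_t2d=True, on_glucose_lowering_med=True)))
    print(classify(Participant(self_reported_t2d=False, max_hba1c=5.3)))
    # Made-up counts for illustration, not study data:
    print(predictive_values(tp=9, fp=1, tn=10, fn=0))
```

The metrics follow the usual confusion-matrix definitions: PPV = TP/(TP+FP), NPV = TN/(TN+FN), and accuracy = (TP+TN)/total. The counts in the usage lines are invented and do not come from the study.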