Restrepo Nicole A, Farber-Eger Eric, Crawford Dana C
Case Western Reserve University, Department of Epidemiology and Biostatistics, Cleveland, Ohio.
Vanderbilt University Medical Center, Vanderbilt Institute for Clinical and Translational Research, Nashville, Tennessee.
AMIA Jt Summits Transl Sci Proc. 2016 Jul 20;2016:221-30. eCollection 2016.
A hurdle to EMR-based studies is the characterization and extraction of complex phenotypes that are not readily defined by a single diagnostic or procedural code. Here we developed an algorithm that uses data mining techniques to identify a diabetic retinopathy (DR) cohort of African Americans with type 2 diabetes from the Vanderbilt University de-identified EMR system. The algorithm combines diagnostic codes, Current Procedural Terminology (CPT) billing codes, medications, and text matching to identify DR when gold-standard digital photography results were unavailable. DR cases were identified with a positive predictive value of 75.3% and an accuracy of 84.8%. Controls were classified with a negative predictive value of 1.0%, as far as could be assessed. Few studies of DR have been performed in African Americans, who are at elevated risk of DR. Identification of EMR-based African American cohorts may help stimulate new biomedical studies that could elucidate differences in risk for the development of DR and other complex diseases.
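For readers less familiar with the reported metrics, the sketch below shows how positive predictive value, negative predictive value, and accuracy are conventionally computed from the counts of a manual chart review. It is a minimal illustration and not the authors' code: the `ReviewCounts` class, the function names, and the counts are all hypothetical and were chosen only to make the arithmetic easy to follow. They are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCounts:
    """Chart-review outcomes for algorithm-assigned labels (hypothetical structure)."""
    true_pos: int   # flagged as DR case, confirmed on review
    false_pos: int  # flagged as DR case, not confirmed on review
    true_neg: int   # flagged as control, confirmed free of DR
    false_neg: int  # flagged as control, found to have DR


def ppv(c: ReviewCounts) -> float:
    """Positive predictive value: share of flagged cases that are true cases."""
    return c.true_pos / (c.true_pos + c.false_pos)


def npv(c: ReviewCounts) -> float:
    """Negative predictive value: share of flagged controls that are true controls."""
    return c.true_neg / (c.true_neg + c.false_neg)


def accuracy(c: ReviewCounts) -> float:
    """Share of all reviewed records that the algorithm labeled correctly."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return (c.true_pos + c.true_neg) / total


# Made-up counts for illustration only; these are NOT the study's results.
counts = ReviewCounts(true_pos=30, false_pos=10, true_neg=50, false_neg=10)

print(f"PPV:      {ppv(counts):.3f}")
print(f"NPV:      {npv(counts):.3f}")
print(f"Accuracy: {accuracy(counts):.3f}")
```

Note that the predictive values depend on how many true cases are present among the reviewed records, so they describe performance in the reviewed sample and do not carry over unchanged to a population with a different DR prevalence.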