Stein Joshua D, An Hong Su, Andrews Chris A, Pershing Suzann, Mungle Tushar, Bicket Amanda K, Rosenthal Julie M, Zhang Amy D, Lee Wen-Shin, Ludwig Cassie, Mekonnen Bethlehem, Hernandez-Boussard Tina
Department of Ophthalmology and Visual Sciences, University of Michigan, Ann Arbor, Michigan.
Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, Michigan.
Ophthalmol Sci. 2025 Jan 24;5(4):100717. doi: 10.1016/j.xops.2025.100717. eCollection 2025 Jul-Aug.
For studies using real-world data, accurately identifying patients with phenotypes of interest is challenging. To identify cohorts of interest, most studies rely exclusively on International Classification of Diseases (ICD) billing codes, which can be limiting. We developed a method to accurately identify the presence or absence of 3 common ocular diseases (diabetic retinopathy [DR], age-related macular degeneration [AMD], and glaucoma) using electronic health record (EHR) data.
Database study.
Three thousand nine hundred fourteen eyes from 1957 patients at 2 Sight OUtcomes Research CollaborativE (SOURCE) Ophthalmology Data Repository sites.
We developed enhanced phenotype identification (EPI) algorithms that search EHR fields, including eye examination findings, orders, charges, medication prescriptions, and surgery data, for evidence that a patient has glaucoma, DR, or AMD. We trained our EPI models using gold standard assessments of the EHR by ophthalmologists for the presence or absence of these conditions, compared the performance of our EPI models with that of models developed using ICD codes alone, and validated the performance of the models using data from another SOURCE site.
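As a rough illustration of the contrast described above, the sketch below encodes each eye as a set of yes/no indicators, one per EHR evidence source, and compares that with an ICD-only encoding. The `EyeRecord` structure, the `EVIDENCE_SOURCES` names, and both feature functions are hypothetical; the abstract does not specify the actual feature encoding or model type.

```python
from dataclasses import dataclass, field

# Hypothetical evidence categories, loosely following the EHR fields named in the text.
EVIDENCE_SOURCES = ("icd_code", "exam_finding", "order", "charge", "medication", "surgery")


@dataclass
class EyeRecord:
    """Evidence found in the EHR for one eye (illustrative structure only)."""
    eye_id: str
    evidence: set = field(default_factory=set)


def epi_features(record):
    """One 0/1 indicator per evidence source (assumed encoding)."""
    return [1 if src in record.evidence else 0 for src in EVIDENCE_SOURCES]


def icd_only_features(record):
    """Baseline encoding: the billing-code indicator alone."""
    return [1 if "icd_code" in record.evidence else 0]


# An eye with exam and medication evidence but no billing code.
eye = EyeRecord("patient-001-OD", {"exam_finding", "medication"})
print(epi_features(eye))       # [0, 1, 0, 0, 1, 0]
print(icd_only_features(eye))  # [0]
```

In this toy case the ICD-only encoding carries no signal for the eye, while the broader encoding still exposes two sources of evidence for a downstream classifier.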
Area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), and model calibration.
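For readers less familiar with the two discrimination measures, the following minimal sketch computes them from scratch on made-up labels and scores. The function names and the toy data are assumptions for illustration, and the AUPRC is approximated by average precision, which may differ from the estimator used in the study.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy data: 3 diseased eyes among 8 (not from the study).
y = [0, 0, 1, 0, 1, 0, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.8, 0.2, 0.05, 0.9]
print(round(roc_auc(y, p), 3))            # 0.933
print(round(average_precision(y, p), 3))  # 0.917
```

Unlike AUC, average precision depends on disease prevalence, which is one reason the two measures can diverge for less common conditions.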
The AUCs of our EPI models were better than those of ICD-only models for glaucoma (0.97 vs. 0.90), DR (0.997 vs. 0.98), and AMD (0.99 vs. 0.95). The AUPRCs of our EPI models were also much better than those of ICD-only models for glaucoma (0.79 vs. 0.32), DR (0.96 vs. 0.84), and AMD (0.74 vs. 0.55). When testing on patients from a second SOURCE site, the AUC and AUPRC for glaucoma (0.93, 0.74), DR (0.98, 0.77), and AMD (0.96, 0.64) were slightly worse than at the primary site but still quite high. However, for all 3 conditions, model calibration was worse at the second site.
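Discrimination and calibration can move independently, which is why a model may rank eyes well at a new site yet assign poorly scaled probabilities. The sketch below illustrates this with two common calibration summaries, the Brier score and calibration-in-the-large; these are assumed measures chosen for illustration, since the abstract does not state which calibration metric was used, and the data are invented.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome; lower is better."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_in_the_large(y_true, probs):
    """Mean predicted risk minus observed prevalence; 0 is ideal."""
    return sum(probs) / len(probs) - sum(y_true) / len(y_true)


def rank_order(probs):
    """Indices of the eyes sorted from lowest to highest predicted risk."""
    return sorted(range(len(probs)), key=lambda i: probs[i])


y = [0, 0, 0, 1, 0, 1]
site_a = [0.05, 0.10, 0.20, 0.70, 0.30, 0.90]
# Hypothetical second site: same ordering of eyes, but every risk is inflated.
site_b = [p ** 0.5 for p in site_a]

# Identical ranking means identical AUC, yet calibration differs.
assert rank_order(site_a) == rank_order(site_b)
print(brier_score(y, site_a), brier_score(y, site_b))
print(calibration_in_the_large(y, site_a), calibration_in_the_large(y, site_b))
```

Because the square-root transform preserves the ordering of predictions, any rank-based measure is unchanged, while both calibration summaries deteriorate for the inflated probabilities.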
Leveraging machine learning, we developed EPI models to accurately identify most patients with glaucoma, DR, and AMD in real-world datasets. The EPI models significantly outperform ICD-only models in identifying patients confirmed to have these conditions. These findings underscore the potential of using comprehensive EHR data combined with advanced machine learning techniques to improve the accuracy of patient phenotype identification, leading to better patient management and clinical outcomes.
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.