Fort Daniel, Wilcox Adam B, Weng Chunhua
Department of Biomedical Informatics, Columbia University, New York City, NY.
Intermountain Healthcare, Salt Lake City, UT.
AMIA Annu Symp Proc. 2014 Nov 14;2014:1738-47. eCollection 2014.
Electronic health records (EHRs) are a valuable data source for phenotyping. However, EHR-based phenotyping suffers from inherent data quality issues such as missing data. As patient self-reported health data become increasingly available, it is useful to know how the two data sources compare for phenotyping. To address this question, we compared self-reported diabetes status for 2,249 patients treated at Columbia University Medical Center with the output of the well-known eMERGE EHR phenotyping algorithm for Type 2 diabetes mellitus (DM2). The eMERGE algorithm achieved high specificity (0.97) but low sensitivity (0.32) in this patient cohort. About 87% of the patients with self-reported diabetes had at least one ICD-9 code, one medication, or one lab result supporting a DM2 diagnosis, implying the remaining 13% may have missing or incorrect self-reports. We discuss the tradeoffs in both data sources and in combining them for phenotyping.
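The sensitivity and specificity figures reported above can be illustrated with a minimal sketch. It assumes that self-reported diabetes status serves as the reference label and the phenotyping algorithm's output as the prediction; the abstract does not spell out this setup in detail. The function name `sensitivity_specificity` and the toy lists are hypothetical and are not study data.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    reference: Iterable[bool], predicted: Iterable[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` against `reference`.

    Assumption: `reference` holds self-reported diabetes status and
    `predicted` holds the phenotyping algorithm's classification.
    """
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1  # self-reported diabetic, flagged by the algorithm
        elif ref:
            fn += 1  # self-reported diabetic, missed by the algorithm
        elif pred:
            fp += 1  # self-reported non-diabetic, flagged by the algorithm
        else:
            tn += 1  # self-reported non-diabetic, not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data for illustration only (not from the study).
self_report = [True, True, True, False, False, False, False, False]
algorithm = [True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(self_report, algorithm)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Under this framing, low sensitivity means the algorithm misses many patients who report having diabetes, while high specificity means it rarely flags patients who report not having it.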