Chakrabarti Shreya, Sen Anando, Huser Vojtech, Hruby Gregory W, Rusanov Alexander, Albers David J, Weng Chunhua
Department of Biomedical Informatics, Columbia University, New York, NY 10032.
National Institutes of Health, National Library of Medicine, Bethesda, MD 20892.
J Healthc Inform Res. 2017 Jun;1(1):1-18. doi: 10.1007/s41666-017-0005-6. Epub 2017 Jun 8.
Cohort identification for clinical studies tends to be laborious, time-consuming, and expensive. Developing automated or semi-automated methods for cohort identification is one of the "holy grails" of biomedical informatics. We propose a high-throughput, similarity-based cohort identification algorithm that applies numerical abstractions to Electronic Health Record (EHR) data. We implement this algorithm using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), so that sites using this standardized EHR data representation can adopt the algorithm with minimal local implementation effort. We validate its performance on a retrospective cohort identification task for six clinical trials conducted at the Columbia University Medical Center. Our algorithm achieves an average Area Under the Curve (AUC) of 0.966 and an average Precision at 5 of 0.983. This interoperable method promises efficient cohort identification in EHR databases. We discuss suitable applications of our method and its limitations, and propose future work.
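For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Precision at k and AUC can be computed from a ranked list of candidate patients. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `precision_at_k` and `roc_auc` are hypothetical, labels are assumed to be 1 for a true cohort member and 0 otherwise, and AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true cohort members among the k highest-scoring patients.

    Assumption: labels are 1 (cohort member) or 0 (non-member).
    """
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of patients")
    # Rank patients by similarity score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen cohort member is scored
    above a randomly chosen non-member; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Under this reading, a Precision at 5 of 1.0 for a trial would mean that all five top-ranked patients were true cohort members; the reported 0.983 is the average of that quantity over the six trials.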