De Freitas Jessica K, Johnson Kipp W, Golden Eddye, Nadkarni Girish N, Dudley Joel T, Bottinger Erwin P, Glicksberg Benjamin S, Miotto Riccardo
Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA.
Patterns (N Y). 2021 Sep 2;2(9):100337. doi: 10.1016/j.patter.2021.100337. eCollection 2021 Sep 10.
Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning, and assess its effectiveness against standard rule-based algorithms from the Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and of patients' clinical histories. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. In a head-to-head comparison of Phe2vec and PheKB cohorts using chart review, Phe2vec performed on par or better in nine out of ten diseases. Unlike other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
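The pipeline described in the abstract (concept embeddings, a phenotype built from a seed concept and its nearest neighbors, and patients linked by proximity to that phenotype) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy three-dimensional vectors, the concept names, the use of cosine similarity, the plain averaging of concept vectors into phenotype and patient vectors, and the values k=2 and threshold=0.8 are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a collection of vectors (assumed aggregation, not from the paper)."""
    vectors = list(vectors)
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_phenotype(seed, embeddings, k=2):
    """Return the seed concept followed by its k nearest neighbors in embedding space."""
    neighbors = sorted(
        (c for c in embeddings if c != seed),
        key=lambda c: cosine(embeddings[seed], embeddings[c]),
        reverse=True,
    )[:k]
    return [seed] + neighbors

def link_patients(patients, phenotype, embeddings, threshold=0.8):
    """Return IDs of patients whose averaged embedding is close to the phenotype's."""
    pheno_vec = mean_vector(embeddings[c] for c in phenotype)
    linked = []
    for pid, history in patients.items():
        patient_vec = mean_vector(embeddings[c] for c in history if c in embeddings)
        if cosine(patient_vec, pheno_vec) >= threshold:
            linked.append(pid)
    return linked

# Hypothetical pre-computed concept embeddings (toy values, not real data).
EMBEDDINGS = {
    "dx:type_2_diabetes": [0.9, 0.1, 0.0],
    "rx:metformin": [0.8, 0.2, 0.1],
    "lab:hba1c": [0.85, 0.15, 0.05],
    "dx:asthma": [0.0, 0.9, 0.3],
    "rx:albuterol": [0.1, 0.8, 0.4],
}

# Hypothetical patient histories expressed as lists of concepts.
PATIENTS = {
    "p1": ["dx:type_2_diabetes", "rx:metformin"],
    "p2": ["dx:asthma", "rx:albuterol"],
}

phenotype = build_phenotype("dx:type_2_diabetes", EMBEDDINGS, k=2)
cohort = link_patients(PATIENTS, phenotype, EMBEDDINGS)
```

In this toy setup the diabetes seed pulls in the two diabetes-related concepts as its phenotype, and only the patient whose history sits near that phenotype is selected. In practice, the neighborhood size and the distance threshold would need to be tuned and validated, for example against chart review as the abstract describes.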