Lee Junghwan, Liu Cong, Kim Jae Hyun, Butler Alex, Shang Ning, Pang Chao, Natarajan Karthik, Ryan Patrick, Ta Casey, Weng Chunhua
Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York 10032, USA.
JAMIA Open. 2021 Jun 16;4(2):ooab028. doi: 10.1093/jamiaopen/ooab028. eCollection 2021 Apr.
OBJECTIVE: Feature engineering is a major bottleneck in phenotyping. Properly learned medical concept embeddings (MCEs) capture the semantics of medical concepts and are thus useful for retrieving relevant medical features in phenotyping tasks. We compared the effectiveness of MCEs learned from knowledge graphs and from electronic health records (EHR) data in retrieving relevant medical features for phenotyping tasks.
MATERIALS AND METHODS: We implemented 5 embedding methods (node2vec, singular value decomposition (SVD), LINE, skip-gram, and GloVe) with 2 data sources: (1) knowledge graphs obtained from the Observational Medical Outcomes Partnership (OMOP) common data model; and (2) patient-level data obtained from the OMOP-compatible EHR of Columbia University Irving Medical Center (CUIMC). We used phenotypes, with their relevant concepts, developed and validated by the Electronic Medical Records and Genomics (eMERGE) network to evaluate the performance of the learned MCEs in retrieving phenotype-relevant concepts based on a single seed concept and on multiple seed concepts.
RESULTS: Among all MCEs, those learned by using node2vec with knowledge graphs showed the best performance. Among MCEs based on knowledge graphs, those learned with node2vec outperformed the others; among MCEs based on EHR data, those learned with GloVe outperformed the others.
CONCLUSION: MCEs enable scalable feature engineering, thereby facilitating phenotyping. Based on current phenotyping practices, MCEs learned from knowledge graphs constructed from hierarchical relationships among medical concepts outperformed MCEs learned from EHR data.
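The retrieval task described in the methods (ranking candidate concepts by their embedding similarity to one or more seed concepts) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy two-dimensional vectors, the concept names, the choice to average multiple seed vectors into one query, and the simple fraction-of-hits score are not taken from the study, and the score is not necessarily the metric the authors used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(embeddings, seeds, k):
    """Return the k non-seed concepts most similar to the seed concept(s).

    Assumption: multiple seeds are combined by averaging their vectors.
    """
    dim = len(next(iter(embeddings.values())))
    query = [sum(embeddings[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    scored = [
        (cosine(query, vec), name)
        for name, vec in embeddings.items()
        if name not in seeds
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]

def hit_fraction(retrieved, relevant):
    """Fraction of retrieved concepts that appear in the relevant set."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

# Hypothetical toy embeddings; real MCEs would have many more dimensions.
toy_embeddings = {
    "type_2_diabetes": [1.0, 0.1],
    "metformin": [0.9, 0.2],
    "hba1c": [0.8, 0.3],
    "ankle_fracture": [0.1, 1.0],
    "cast_application": [0.2, 0.9],
}
# Hypothetical set of phenotype-relevant concepts.
relevant_concepts = {"metformin", "hba1c"}

single_seed = retrieve(toy_embeddings, ["type_2_diabetes"], k=2)
multi_seed = retrieve(toy_embeddings, ["type_2_diabetes", "metformin"], k=1)
score = hit_fraction(single_seed, relevant_concepts)
```

With these toy vectors, the single-seed query ranks `metformin` and `hba1c` ahead of the fracture-related concepts, so both retrieved concepts count as hits. Averaging the seed vectors is only one way to handle multiple seeds; other aggregation strategies would fit the same interface.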