Comparative effectiveness of medical concept embedding for feature engineering in phenotyping.

Authors

Lee Junghwan, Liu Cong, Kim Jae Hyun, Butler Alex, Shang Ning, Pang Chao, Natarajan Karthik, Ryan Patrick, Ta Casey, Weng Chunhua

Affiliation

Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York 10032, USA.

Publication details

JAMIA Open. 2021 Jun 16;4(2):ooab028. doi: 10.1093/jamiaopen/ooab028. eCollection 2021 Apr.

DOI:10.1093/jamiaopen/ooab028

PMID:34142015

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8206403/

Abstract

OBJECTIVE

Feature engineering is a major bottleneck in phenotyping. Properly learned medical concept embeddings (MCEs) capture the semantics of medical concepts and are therefore useful for retrieving relevant medical features in phenotyping tasks. We compared the effectiveness of MCEs learned from knowledge graphs with that of MCEs learned from electronic health record (EHR) data in retrieving relevant medical features for phenotyping tasks.

MATERIALS AND METHODS

We implemented 5 embedding methods (node2vec, singular value decomposition [SVD], LINE, skip-gram, and GloVe) with 2 data sources: (1) knowledge graphs obtained from the Observational Medical Outcomes Partnership (OMOP) common data model; and (2) patient-level data obtained from the OMOP-compatible EHR at Columbia University Irving Medical Center (CUIMC). We used phenotypes and their relevant concepts, developed and validated by the Electronic Medical Records and Genomics (eMERGE) network, to evaluate how well the learned MCEs retrieve phenotype-relevant concepts. Retrieval was evaluated from both a single seed concept and multiple seed concepts.
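The seed-based retrieval described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the concept names, the helper names (`cosine`, `retrieve`, `hit_fraction`), the choice to average multiple seed vectors into one query, and the simple hit fraction among the top k results are all assumptions made for illustration and may differ from the procedure and metric used in the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(embeddings, seeds, k):
    """Return the k non-seed concepts closest to the mean seed vector.

    Averaging the seed vectors is an assumption of this sketch.
    """
    dim = len(next(iter(embeddings.values())))
    query = [sum(embeddings[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    candidates = [c for c in embeddings if c not in seeds]
    ranked = sorted(candidates, key=lambda c: cosine(query, embeddings[c]), reverse=True)
    return ranked[:k]


def hit_fraction(retrieved, relevant):
    """Fraction of retrieved concepts that are in the phenotype's concept set."""
    if not retrieved:
        return 0.0
    return sum(c in relevant for c in retrieved) / len(retrieved)


# Made-up 3-dimensional embeddings; real MCEs would come from a trained model.
toy_embeddings = {
    "type 2 diabetes": [0.9, 0.1, 0.0],
    "hyperglycemia": [0.8, 0.2, 0.1],
    "diabetic neuropathy": [0.7, 0.3, 0.0],
    "ankle fracture": [0.0, 0.1, 0.9],
    "asthma": [0.1, 0.9, 0.1],
}
# Hypothetical set of phenotype-relevant concepts used as the reference standard.
relevant_concepts = {"hyperglycemia", "diabetic neuropathy"}

single_seed_hits = retrieve(toy_embeddings, ["type 2 diabetes"], k=2)
multi_seed_hits = retrieve(toy_embeddings, ["type 2 diabetes", "hyperglycemia"], k=1)
single_seed_score = hit_fraction(single_seed_hits, relevant_concepts)
```

With these toy vectors, the single-seed query ranks the two diabetes-related concepts ahead of the unrelated ones. In practice the embeddings would be read from a model trained on a knowledge graph or on EHR data, and the reference set would be a validated phenotype concept list.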

RESULTS

Among all MCEs, those learned by node2vec from knowledge graphs performed best. Within each data source, node2vec performed best among MCEs learned from knowledge graphs, and GloVe performed best among MCEs learned from EHR data.

CONCLUSION

MCEs enable scalable feature engineering and thereby facilitate phenotyping. Based on current phenotyping practices, MCEs learned from knowledge graphs built on hierarchical relationships among medical concepts outperformed MCEs learned from EHR data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/039c/8206403/9c226ea6d7a3/ooab028f1.jpg
