Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts.

Affiliation

National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.

Publication information

J Am Med Inform Assoc. 2020 Oct 1;27(10):1538-1546. doi: 10.1093/jamia/ocaa136.

DOI:10.1093/jamia/ocaa136

PMID:33029614

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7566472/

Abstract

OBJECTIVE

The study sought to explore the use of deep learning techniques to measure the semantic relatedness between Unified Medical Language System (UMLS) concepts.

MATERIALS AND METHODS

Concept sentence embeddings were generated for UMLS concepts by applying the word embedding models BioWordVec and various flavors of BERT to concept sentences formed by concatenating UMLS terms. Graph embeddings were generated by the graph convolutional networks and 4 knowledge graph embedding models, using graphs built from UMLS hierarchical relations. Semantic relatedness was measured by the cosine between the concepts' embedding vectors. Performance was compared with 2 traditional path-based (shortest path and Leacock-Chodorow) measurements and the publicly available concept embeddings, cui2vec, generated from large biomedical corpora. The concept sentence embeddings were also evaluated on a word sense disambiguation (WSD) task. Reference standards used included the semantic relatedness and semantic similarity datasets from the University of Minnesota, concept pairs generated from the Standardized MedDRA Queries and the MeSH (Medical Subject Headings) WSD corpus.
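The concept sentence idea can be illustrated with a minimal sketch. It assumes the sentence vector is the plain average of word vectors, which the abstract does not state. The toy 3-dimensional vectors and the helper names (`TOY_VECTORS`, `concept_sentence`, `embed_sentence`, `cosine`) are hypothetical and stand in for a real model such as BioWordVec.

```python
import math

# Hypothetical toy word vectors, for illustration only.
# Real word embedding models use far more dimensions.
TOY_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.7, 0.3, 0.1],
    "myocardial": [0.8, 0.2, 0.0],
    "infarction": [0.8, 0.1, 0.1],
    "bone": [0.1, 0.1, 0.8],
    "fracture": [0.0, 0.2, 0.9],
}


def concept_sentence(terms):
    """Concatenate all terms of one concept into a single 'sentence'."""
    return " ".join(terms)


def embed_sentence(sentence, vectors):
    """Average the vectors of the known words (an assumed pooling choice)."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in sentence")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two toy concepts, each described by its list of terms.
mi = embed_sentence(
    concept_sentence(["myocardial infarction", "heart attack"]), TOY_VECTORS
)
fx = embed_sentence(concept_sentence(["bone fracture"]), TOY_VECTORS)
relatedness = cosine(mi, fx)
```

With these toy values the two concepts share little direction in the vector space, so their cosine is low; a concept compared with itself gives a cosine of 1.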

RESULTS

Sentence embeddings generated by BioWordVec outperformed all other methods used individually in semantic relatedness measurements. Graph convolutional network graph embedding uniformly outperformed path-based measurements and was better than some word embeddings for the Standardized MedDRA Queries dataset. When used together, combined word and graph embedding achieved the best performance in all datasets. For WSD, the enhanced versions of BERT outperformed BioWordVec.
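For context, the two path-based baselines can be sketched on a toy hierarchy. This assumes the commonly cited definitions: path similarity as 1/p and Leacock-Chodorow as -log(p / 2D), where p is the number of nodes on the shortest path and D is the maximum depth of the hierarchy. The abstract does not give the exact formulation the authors used, and the hierarchy and function names below are hypothetical, not actual UMLS content.

```python
import math
from collections import deque

# Hypothetical toy hierarchy: each node maps to its list of parents.
PARENTS = {
    "disease": [],
    "heart disease": ["disease"],
    "myocardial infarction": ["heart disease"],
    "angina": ["heart disease"],
    "bone disease": ["disease"],
    "fracture": ["bone disease"],
}


def build_undirected(parents):
    """Turn the child-to-parents map into an undirected adjacency map."""
    graph = {node: set() for node in parents}
    for child, node_parents in parents.items():
        for parent in node_parents:
            graph[child].add(parent)
            graph[parent].add(child)
    return graph


def shortest_path_nodes(graph, start, goal):
    """Count the nodes on the shortest path via BFS (1 if start == goal)."""
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        node, count = queue.popleft()
        if node == goal:
            return count
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, count + 1))
    raise ValueError("no path between the two concepts")


def max_depth(parents):
    """Maximum depth of the hierarchy, counted in nodes (a root has depth 1)."""
    def depth(node):
        node_parents = parents[node]
        if not node_parents:
            return 1
        return 1 + max(depth(p) for p in node_parents)
    return max(depth(node) for node in parents)


def path_similarity(graph, a, b):
    """Inverse of the shortest path length in nodes."""
    return 1.0 / shortest_path_nodes(graph, a, b)


def lch_similarity(graph, parents, a, b):
    """Leacock-Chodorow: -log(p / (2 * D))."""
    p = shortest_path_nodes(graph, a, b)
    return -math.log(p / (2.0 * max_depth(parents)))


GRAPH = build_undirected(PARENTS)
near = lch_similarity(GRAPH, PARENTS, "myocardial infarction", "angina")
far = lch_similarity(GRAPH, PARENTS, "myocardial infarction", "fracture")
```

Both measures depend only on position in the hierarchy, so sibling concepts score higher than concepts in different branches, regardless of how their terms are worded.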

CONCLUSIONS

Word and graph embedding techniques can be used to harness terms and relations in the UMLS to measure semantic relatedness between concepts. Concept sentence embedding outperforms path-based measurements and cui2vec, and can be further enhanced by combining with graph embedding.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a7/7566472/fb3e68eb0264/ocaa136f1.jpg
