Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts.

Author affiliation

National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.

Publication information

J Am Med Inform Assoc. 2020 Oct 1;27(10):1538-1546. doi: 10.1093/jamia/ocaa136.

Abstract

OBJECTIVE

The study sought to explore the use of deep learning techniques to measure the semantic relatedness between Unified Medical Language System (UMLS) concepts.

MATERIALS AND METHODS

Concept sentence embeddings were generated for UMLS concepts by applying the word embedding models BioWordVec and various flavors of BERT to concept sentences formed by concatenating UMLS terms. Graph embeddings were generated by graph convolutional networks and 4 knowledge graph embedding models, using graphs built from UMLS hierarchical relations. Semantic relatedness was measured by the cosine similarity between the concepts' embedding vectors. Performance was compared with 2 traditional path-based measures (shortest path and Leacock-Chodorow) and with cui2vec, a set of publicly available concept embeddings generated from large biomedical corpora. The concept sentence embeddings were also evaluated on a word sense disambiguation (WSD) task. Reference standards included the semantic relatedness and semantic similarity datasets from the University of Minnesota, concept pairs generated from the Standardized MedDRA Queries, and the MeSH (Medical Subject Headings) WSD corpus.
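
As a rough illustration of the concept sentence approach, the sketch below concatenates a concept's terms into one sentence, pools word vectors into a single embedding, and compares two concepts by cosine similarity. The toy vectors, the mean pooling step, and all function names are assumptions made for illustration; the abstract does not state how word vectors were pooled, and a real pipeline would load a pretrained model such as BioWordVec in place of the toy dictionary.

```python
import numpy as np

# Hypothetical toy word vectors standing in for a pretrained model
# such as BioWordVec; the values are made up for illustration.
TOY_WORD_VECTORS = {
    "heart": np.array([0.9, 0.1, 0.0]),
    "attack": np.array([0.6, 0.3, 0.1]),
    "myocardial": np.array([0.8, 0.2, 0.0]),
    "infarction": np.array([0.7, 0.2, 0.1]),
    "bone": np.array([0.0, 0.1, 0.9]),
    "fracture": np.array([0.1, 0.2, 0.8]),
}


def concept_sentence(terms):
    """Concatenate the terms of one concept into a single 'concept sentence'."""
    return " ".join(terms)


def sentence_embedding(sentence, word_vectors):
    """Average the vectors of all known words (an assumed pooling strategy)."""
    vectors = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError("sentence contains no known words")
    return np.mean(vectors, axis=0)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Two terms for the same concept are concatenated before embedding.
mi = sentence_embedding(
    concept_sentence(["myocardial infarction", "heart attack"]), TOY_WORD_VECTORS
)
fracture = sentence_embedding(concept_sentence(["bone fracture"]), TOY_WORD_VECTORS)
heart_attack = sentence_embedding(concept_sentence(["heart attack"]), TOY_WORD_VECTORS)

print(cosine(mi, heart_attack))  # related concepts: high cosine
print(cosine(mi, fracture))      # unrelated concepts: lower cosine
```

With these toy vectors, the cardiac concepts score much closer to each other than to the fracture concept, which is the behavior the relatedness measure is meant to capture.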

RESULTS

Sentence embeddings generated by BioWordVec outperformed all other methods used individually in semantic relatedness measurements. Graph convolutional network graph embedding uniformly outperformed path-based measurements and was better than some word embeddings for the Standardized MedDRA Queries dataset. When used together, combined word and graph embedding achieved the best performance in all datasets. For WSD, the enhanced versions of BERT outperformed BioWordVec.
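
The abstract does not say how the word and graph embeddings were combined. One plausible scheme, shown here purely as an assumption, is to L2-normalize each embedding and concatenate them before taking the cosine. Under that scheme the combined cosine equals the mean of the two individual cosines, so neither embedding dominates because of its scale or dimensionality. The function names and example vectors are hypothetical.

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm


def combine(word_vec, graph_vec):
    """Concatenate unit-normalized word and graph embeddings (assumed scheme)."""
    return np.concatenate([l2_normalize(word_vec), l2_normalize(graph_vec)])


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Made-up embeddings for two concepts; dimensions may differ between models.
word_a, graph_a = np.array([0.7, 0.2, 0.1]), np.array([1.0, 0.0])
word_b, graph_b = np.array([0.6, 0.3, 0.1]), np.array([0.8, 0.6])

combined = cosine(combine(word_a, graph_a), combine(word_b, graph_b))
average = 0.5 * (cosine(word_a, word_b) + cosine(graph_a, graph_b))
print(combined, average)  # the two values agree up to floating-point error
```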

CONCLUSIONS

Word and graph embedding techniques can be used to harness terms and relations in the UMLS to measure semantic relatedness between concepts. Concept sentence embedding outperforms path-based measurements and cui2vec, and can be further enhanced by combining with graph embedding.
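
For context on the path-based baselines, the sketch below computes a shortest-path similarity and a Leacock-Chodorow similarity over a tiny is-a hierarchy. The hierarchy is a made-up toy rather than UMLS data, and the formulas follow the common formulations (1 / p and -log(p / 2D), with p the path length in nodes and D the maximum taxonomy depth in nodes); the exact variants used in the study are not given in the abstract.

```python
import math
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent); not real UMLS content.
PARENT = {
    "heart disease": "disease",
    "bone disease": "disease",
    "myocardial infarction": "heart disease",
    "angina": "heart disease",
    "fracture": "bone disease",
}


def build_undirected(parent):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, par in parent.items():
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    return graph


def shortest_path_edges(graph, source, target):
    """Breadth-first search returning the number of edges on the shortest path."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("concepts are not connected")


def path_similarity(graph, a, b):
    """Inverse of the shortest path length counted in nodes."""
    return 1.0 / (shortest_path_edges(graph, a, b) + 1)


def lch_similarity(graph, a, b, max_depth):
    """Leacock-Chodorow: -log(p / (2 * D)), with p and D counted in nodes."""
    p = shortest_path_edges(graph, a, b) + 1
    return -math.log(p / (2.0 * max_depth))


graph = build_undirected(PARENT)
MAX_DEPTH = 3  # disease -> heart disease -> myocardial infarction

print(path_similarity(graph, "myocardial infarction", "angina"))
print(lch_similarity(graph, "myocardial infarction", "angina", MAX_DEPTH))
print(lch_similarity(graph, "myocardial infarction", "fracture", MAX_DEPTH))
```

Both measures depend only on the position of the concepts in the hierarchy, which is one reason embedding-based measures that also use term text can behave differently.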

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a7/7566472/fb3e68eb0264/ocaa136f1.jpg
