
Retrofitting Concept Vector Representations of Medical Concepts to Improve Estimates of Semantic Similarity and Relatedness.

Authors

Yu Zhiguo, Wallace Byron C, Johnson Todd, Cohen Trevor

Affiliations

The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA.

College of Computer and Information Science, Northeastern University, Boston, Massachusetts, USA.

Publication information

Stud Health Technol Inform. 2017;245:657-661.

Abstract

Estimation of semantic similarity and relatedness between biomedical concepts has utility for many informatics applications. Automated methods fall into two categories: methods based on distributional statistics drawn from text corpora, and methods using the structure of existing knowledge resources. Methods in the former category disregard taxonomic structure, while those in the latter fail to consider semantically relevant empirical information. In this paper, we present a method that retrofits distributional context vector representations of biomedical concepts using structural information from the UMLS Metathesaurus, such that the similarity between vector representations of linked concepts is augmented. We evaluated it on the UMNSRS benchmark. Our results demonstrate that retrofitting of concept vector representations leads to better correlation with human raters for both similarity and relatedness, surpassing the best results reported to date. They also demonstrate a clear improvement in performance on this reference standard for retrofitted vector representations, as compared to those without retrofitting.
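The abstract does not state the update rule, so the following is only a minimal sketch of one commonly used formulation of retrofitting: each concept vector is repeatedly replaced by a weighted average of its original distributional vector and the current vectors of the concepts it is linked to. The function names, the `alpha` weight, the iteration count, the concept identifiers, and the toy vectors are all assumptions made for illustration; they are not taken from the paper or from the UMLS.

```python
import numpy as np


def retrofit(vectors, links, alpha=1.0, iterations=10):
    """Pull each concept vector toward the vectors of its linked concepts.

    vectors:    dict of concept id -> 1-D numpy array (distributional vectors)
    links:      dict of concept id -> list of linked concept ids
    alpha:      weight that keeps a vector near its original (assumed value)
    iterations: number of passes over the link graph (assumed value)
    """
    original = {c: np.asarray(v, dtype=float) for c, v in vectors.items()}
    updated = {c: v.copy() for c, v in original.items()}
    for _ in range(iterations):
        for concept, neighbours in links.items():
            if concept not in updated:
                continue
            known = [n for n in neighbours if n in updated]
            if not known:
                continue
            # Each neighbour gets weight 1/degree, so the weights sum to 1.
            beta = 1.0 / len(known)
            neighbour_part = sum(beta * updated[n] for n in known)
            # Weighted average of the original vector and the neighbours.
            updated[concept] = (alpha * original[concept] + neighbour_part) / (alpha + 1.0)
    return updated


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy example with made-up concept ids and two-dimensional vectors.
toy_vectors = {
    "C_A": np.array([1.0, 0.0]),
    "C_B": np.array([0.0, 1.0]),
    "C_C": np.array([-1.0, 0.0]),
}
toy_links = {"C_A": ["C_B"], "C_B": ["C_A"]}  # C_C has no links

retrofitted = retrofit(toy_vectors, toy_links)
before = cosine(toy_vectors["C_A"], toy_vectors["C_B"])
after = cosine(retrofitted["C_A"], retrofitted["C_B"])
print(f"similarity of linked pair: before={before:.3f}, after={after:.3f}")
```

In this sketch the linked pair becomes more similar after retrofitting, while the unlinked concept keeps its original vector, which is the qualitative behaviour the abstract describes for linked concepts.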
