Yu Zhiguo, Wallace Byron C, Johnson Todd, Cohen Trevor
The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA.
College of Computer and Information Science, Northeastern University, Boston, Massachusetts, USA.
Stud Health Technol Inform. 2017;245:657-661.
Estimating the semantic similarity and relatedness of biomedical concepts is useful for many informatics applications. Automated methods fall into two categories: methods based on distributional statistics drawn from text corpora, and methods that use the structure of existing knowledge resources. Methods in the former category disregard taxonomic structure, while those in the latter fail to consider semantically relevant empirical information. In this paper, we present a method that retrofits distributional context vector representations of biomedical concepts using structural information from the UMLS Metathesaurus, so that the vector representations of linked concepts become more similar to one another. We evaluated the method on the UMNSRS benchmark. Our results demonstrate that retrofitting concept vector representations yields better correlation with human raters for both similarity and relatedness, surpassing the best results reported to date. They also demonstrate a clear improvement on this reference standard for retrofitted vector representations compared with those without retrofitting.
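The abstract does not spell out the update rule, so the following is only a minimal sketch of how retrofitting can work. It assumes the iterative procedure of Faruqui et al. (2015). In that procedure, each concept vector is repeatedly replaced by a weighted average of its original distributional vector and the current vectors of the concepts it is linked to. In the paper those links come from the UMLS Metathesaurus. The uniform neighbour weight `beta = 1 / degree`, the `alpha` and `iterations` defaults, the function names `retrofit` and `cosine`, and the toy concepts `"a"`, `"b"`, `"c"` are all hypothetical illustrations, not the authors' configuration.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def retrofit(
    vectors: Dict[str, Vector],
    graph: Dict[str, Set[str]],
    iterations: int = 10,
    alpha: float = 1.0,
) -> Dict[str, Vector]:
    """Pull each concept vector toward its graph neighbours while keeping it
    close to its original distributional vector (Faruqui et al., 2015 style).

    Assumed update (hypothetical weights):
        q_i = (alpha * q_hat_i + beta * sum_j q_j) / (alpha + beta * deg_i),
    with uniform neighbour weight beta = 1 / deg_i.
    """
    new = {c: list(v) for c, v in vectors.items()}  # start from the originals
    for _ in range(iterations):
        for concept, neighbours in graph.items():
            if concept not in vectors:
                continue  # no distributional vector to retrofit
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue  # concepts without usable links keep their vector
            beta = 1.0 / len(nbrs)
            original = vectors[concept]
            # Neighbours already updated earlier in this pass are used as-is.
            new[concept] = [
                (alpha * original[d] + beta * sum(new[n][d] for n in nbrs))
                / (alpha + beta * len(nbrs))
                for d in range(len(original))
            ]
    return new


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: the score that would be correlated with human ratings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy example with hypothetical concepts: "a" and "b" are linked, "c" is not.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
graph = {"a": {"b"}, "b": {"a"}}
retrofitted = retrofit(vectors, graph)
print(cosine(vectors["a"], vectors["b"]), cosine(retrofitted["a"], retrofitted["b"]))
```

In the toy run, the linked pair starts out orthogonal and ends with a higher cosine similarity. The unlinked concept `"c"` keeps its original vector. This mirrors the abstract's point that similarity between linked concepts is augmented. Because `alpha` anchors each vector to its corpus-derived original, the distributional information is kept rather than overwritten by the graph structure.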