
tESA: a distributional measure for calculating semantic relatedness.

Authors

Rybinski Maciej, Aldana-Montes José Francisco

Affiliation

Departamento LCC, University of Malaga, Campus Teatinos, Malaga, 29010, Spain.

Publication information

J Biomed Semantics. 2016 Dec 28;7(1):67. doi: 10.1186/s13326-016-0109-6.

DOI: 10.1186/s13326-016-0109-6
PMID: 28031037
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5192592/
Abstract

BACKGROUND

Semantic relatedness is a measure that quantifies the strength of a semantic link between two concepts. Often, it can be efficiently approximated with methods that operate on the words that represent these concepts. Approximating semantic relatedness between texts and the concepts represented by these texts is an important part of many text and knowledge processing tasks in the ever-growing domain of biomedical informatics. The problem with most state-of-the-art methods for calculating semantic relatedness is their dependence on highly specialized, structured knowledge resources, which makes these methods poorly adaptable to many usage scenarios. On the other hand, domain knowledge in the Life Sciences has become increasingly accessible, but mostly in unstructured form, as texts in large document collections, which makes it more challenging to use in automated processing. In this paper we present tESA, an extension to the well-known Explicit Semantic Analysis (ESA) method.

RESULTS

In our extension we use two separate sets of vectors, corresponding to different sections of the articles from the underlying corpus of documents, as opposed to the original method, which uses only a single vector space. We present an evaluation of the applicability of both tESA and domain-adapted Explicit Semantic Analysis to the Life Sciences domain. The methods are tested against a set of standard benchmarks established for evaluating the quality of biomedical semantic relatedness measures. Our experiments show that the proposed method achieves results comparable with or superior to the current state-of-the-art methods. Additionally, a comparative discussion of the results obtained with tESA and ESA is presented, together with a study of the adaptability of the methods to different corpora and their performance with different input parameters.

CONCLUSIONS

Our findings suggest that the combined use of the semantics from different sections of the documents of scientific corpora (i.e. extending the original ESA methodology with the use of title vectors) may be used to enhance the performance of distributional semantic relatedness measures, which can be observed in the largest reference datasets. We also present the impact of the proposed extension on the size of distributional representations.
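
The idea of pairing an ESA-style document vector with a second set of title vectors can be illustrated with a small sketch. This is not the authors' implementation: the toy corpus, the TF-IDF weighting, the function names, and above all the way the two vector sets are combined (a weighted sum of title term vectors) are assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document acts as one "concept".
DOCS = [
    {"title": "insulin therapy diabetes",
     "body": "insulin lowers blood glucose in patients with diabetes"},
    {"title": "cardiac arrest treatment",
     "body": "heart attack patients need rapid cardiac treatment"},
    {"title": "glucose metabolism",
     "body": "glucose metabolism is regulated by insulin in the blood"},
]


def tokenize(text):
    return text.lower().split()


def tfidf_index(texts):
    """Build an inverted index: term -> {doc_id: TF-IDF weight}."""
    n = len(texts)
    tfs = [Counter(tokenize(t)) for t in texts]
    df = Counter(term for tf in tfs for term in tf)
    index = {}
    for doc_id, tf in enumerate(tfs):
        for term, count in tf.items():
            idf = math.log(1 + n / df[term])  # smoothed, always positive
            index.setdefault(term, {})[doc_id] = count * idf
    return index


def esa_vector(text, body_index):
    """ESA-style vector: one weight per document (concept) for the input text."""
    vec = Counter()
    for term in tokenize(text):
        for doc_id, weight in body_index.get(term, {}).items():
            vec[doc_id] += weight
    return vec


def title_space_vector(esa_vec, title_vectors):
    """Assumed combination step: weighted sum of the title term vectors."""
    out = Counter()
    for doc_id, weight in esa_vec.items():
        for term, title_weight in title_vectors[doc_id].items():
            out[term] += weight * title_weight
    return out


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


BODY_INDEX = tfidf_index([d["body"] for d in DOCS])

# Per-document title term vectors, derived from a TF-IDF index over titles.
TITLE_VECTORS = {i: {} for i in range(len(DOCS))}
for term, postings in tfidf_index([d["title"] for d in DOCS]).items():
    for doc_id, weight in postings.items():
        TITLE_VECTORS[doc_id][term] = weight


def relatedness(text_a, text_b, use_titles=True):
    """Cosine relatedness in document space (ESA) or title-term space."""
    va = esa_vector(text_a, BODY_INDEX)
    vb = esa_vector(text_b, BODY_INDEX)
    if use_titles:
        va = title_space_vector(va, TITLE_VECTORS)
        vb = title_space_vector(vb, TITLE_VECTORS)
    return cosine(va, vb)
```

In this sketch, calling `relatedness("insulin", "glucose")` compares the two terms in the title-term space, while `use_titles=False` falls back to the plain document-space comparison. The title-space vectors are indexed by title vocabulary rather than by document, which is one way the size of the representation could differ between the two variants.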

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f58/5192592/8454425e8bb2/13326_2016_109_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f58/5192592/e6ca4280de85/13326_2016_109_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f58/5192592/4e07b060068f/13326_2016_109_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f58/5192592/2c751c898e47/13326_2016_109_Fig4_HTML.jpg

Similar articles

1
tESA: a distributional measure for calculating semantic relatedness.
J Biomed Semantics. 2016 Dec 28;7(1):67. doi: 10.1186/s13326-016-0109-6.
2
Large scale biomedical texts classification: a kNN and an ESA-based approaches.
J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.
3
Calculating semantic relatedness for biomedical use in a knowledge-poor environment.
BMC Bioinformatics. 2014;15(Suppl 14):S2. doi: 10.1186/1471-2105-15-S14-S2. Epub 2014 Nov 27.
4
Corpus domain effects on distributional semantic modeling of medical terms.
Bioinformatics. 2016 Dec 1;32(23):3635-3644. doi: 10.1093/bioinformatics/btw529. Epub 2016 Aug 16.
5
Retrofitting Concept Vector Representations of Medical Concepts to Improve Estimates of Semantic Similarity and Relatedness.
Stud Health Technol Inform. 2017;245:657-661.
6
Semantic similarity in the biomedical domain: an evaluation across knowledge sources.
BMC Bioinformatics. 2012 Oct 10;13:261. doi: 10.1186/1471-2105-13-261.
7
Vector representations of multi-word terms for semantic relatedness.
J Biomed Inform. 2018 Jan;77:111-119. doi: 10.1016/j.jbi.2017.12.006. Epub 2017 Dec 13.
8
Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.
J Biomed Inform. 2020 Nov;111:103581. doi: 10.1016/j.jbi.2020.103581. Epub 2020 Oct 1.
9
Enhancing clinical concept extraction with distributional semantics.
J Biomed Inform. 2012 Feb;45(1):129-40. doi: 10.1016/j.jbi.2011.10.007. Epub 2011 Nov 7.
10
RysannMD: A biomedical semantic annotator balancing speed and accuracy.
J Biomed Inform. 2017 Jul;71:91-109. doi: 10.1016/j.jbi.2017.05.016. Epub 2017 May 26.
