Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts.

Affiliation

Universidad Complutense de Madrid, Calle Profesor José García Santesmases s/n, 28040 Madrid, Spain.

Publication information

BMC Bioinformatics. 2011 Aug 26;12:355. doi: 10.1186/1471-2105-12-355.

DOI: 10.1186/1471-2105-12-355

PMID: 21871110

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3176269/

Abstract

BACKGROUND

Word sense disambiguation (WSD) attempts to solve lexical ambiguities by identifying the correct meaning of a word based on its context. WSD has been demonstrated to be an important step in knowledge-based approaches to automatic summarization. However, the correlation between the accuracy of the WSD methods and the summarization performance has never been studied.

RESULTS

We present three existing knowledge-based WSD approaches and a graph-based summarizer. Both the WSD approaches and the summarizer employ the Unified Medical Language System (UMLS) Metathesaurus as the knowledge source. We first evaluate WSD directly, by comparing the prediction of the WSD methods to two reference sets: the NLM WSD dataset and the MSH WSD collection. We next apply the different WSD methods as part of the summarizer, to map documents onto concepts in the UMLS Metathesaurus, and evaluate the summaries that are generated. The results obtained by the different methods in both evaluations are studied and compared.
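The two-stage evaluation described above can be pictured with a minimal sketch: score each WSD method against a reference set, then check whether those scores move together with summary quality. Everything here is an assumption made for illustration. The method names, concept identifiers, and summary scores are placeholders, not data from the NLM WSD or MSH WSD collections, and the choice of a Pearson coefficient is ours rather than a statement about the statistic the authors used.

```python
from math import sqrt
from statistics import mean


def wsd_accuracy(predicted, reference):
    """Fraction of ambiguous instances mapped to the reference concept."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and aligned")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


# Placeholder concept identifiers, not taken from either reference set.
reference = ["C1", "C2", "C3", "C4"]
predictions = {
    "method_a": ["C1", "C2", "C3", "C4"],
    "method_b": ["C1", "C2", "C9", "C4"],
    "method_c": ["C1", "C8", "C9", "C4"],
}
accuracy = {name: wsd_accuracy(pred, reference) for name, pred in predictions.items()}

# Placeholder summary-quality scores (for example, one ROUGE value per method).
summary_score = {"method_a": 0.60, "method_b": 0.55, "method_c": 0.50}

names = sorted(accuracy)
r = pearson([accuracy[n] for n in names], [summary_score[n] for n in names])
print(accuracy, round(r, 3))
```

With only three hypothetical methods the coefficient says little on its own; the sketch is meant to show the shape of the comparison, not to reproduce any reported result.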

CONCLUSIONS

The use of WSD techniques has a positive impact on the results of our graph-based summarizer. When both the WSD and summarization tasks are assessed over large and homogeneous evaluation collections, the overall results of the two tasks are correlated, and the best WSD algorithm in the first task also tends to be the best in the second. However, the improvement achieved by the summarizer is not directly correlated with WSD performance. The most likely reason is that disambiguation errors are not equally important: their importance depends on the relative salience of the different concepts in the document being summarized.
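The salience argument can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the authors' measure: the concept identifiers and integer salience weights are hypothetical, and `salience_weighted_error` is a helper name introduced here. It compares two disambiguation runs that make the same number of errors but on concepts of different importance.

```python
def plain_error(predicted, reference):
    """Unweighted error rate: every mistake counts the same."""
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)


def salience_weighted_error(predicted, reference, salience):
    """Share of total salience attached to wrongly disambiguated concepts."""
    lost = sum(salience[r] for p, r in zip(predicted, reference) if p != r)
    total = sum(salience[r] for r in reference)
    return lost / total


# Hypothetical document: C1 is central to the text, the rest are peripheral.
reference = ["C1", "C2", "C3", "C4"]
salience = {"C1": 7, "C2": 1, "C3": 1, "C4": 1}

run_central = ["C9", "C2", "C3", "C4"]      # one error, on the central concept
run_peripheral = ["C1", "C2", "C3", "C9"]   # one error, on a peripheral concept

for name, run in [("central", run_central), ("peripheral", run_peripheral)]:
    print(name, plain_error(run, reference),
          salience_weighted_error(run, reference, salience))
```

Both runs have the same unweighted error rate, yet the weighted figure differs sharply, which is one way a summarizer's gain could diverge from raw WSD accuracy.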
