Determining the difficulty of Word Sense Disambiguation.

Author information

McInnes Bridget T, Stevenson Mark

Affiliations

Minnesota Supercomputing Institute, University of Minnesota, 117 Pleasant St SE, Minneapolis, MN 55455, USA.

Natural Language Processing Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom.

Publication information

J Biomed Inform. 2014 Feb;47:83-90. doi: 10.1016/j.jbi.2013.09.009. Epub 2013 Sep 26.

DOI:10.1016/j.jbi.2013.09.009

PMID:24076369

Abstract

Automatic processing of biomedical documents is made difficult by the fact that many of the terms they contain are ambiguous. Word Sense Disambiguation (WSD) systems attempt to resolve these ambiguities and identify the correct meaning. However, the published literature on WSD systems for biomedical documents reports considerable differences in performance for different terms. The development of WSD systems is often expensive with respect to acquiring the necessary training data. It would therefore be useful to be able to predict in advance which terms WSD systems are likely to perform well or badly on. This paper explores various methods for estimating the performance of WSD systems on a wide range of ambiguous biomedical terms (including ambiguous words/phrases and abbreviations). The methods include both supervised and unsupervised approaches. The supervised approaches make use of information from labeled training data while the unsupervised ones rely on the UMLS Metathesaurus. The approaches are evaluated by comparing their predictions about how difficult disambiguation will be for ambiguous terms against the output of two WSD systems. We find the supervised methods are the best predictors of WSD difficulty, but are limited by their dependence on labeled training data. The unsupervised methods all perform well in some situations and can be applied more widely.
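The evaluation idea in the abstract, comparing a per-term difficulty prediction against the measured output of a WSD system, can be illustrated with a small sketch. The abstract does not name the specific difficulty measures or the comparison statistic, so everything here is an assumption made for illustration: the supervised predictor is taken to be the entropy of the sense distribution in labeled data, the comparison is a Spearman rank correlation, and the terms, sense labels, and accuracies are invented toy values, not results from the paper.

```python
import math
from collections import Counter


def sense_entropy(labels):
    """Entropy (bits) of the sense distribution in labeled instances.

    Assumed stand-in for a supervised difficulty measure: a more even
    split between senses is treated as a harder term.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented toy data (hypothetical terms, senses, and accuracies).
labeled = {
    "term_a": ["s1"] * 9 + ["s2"] * 1,
    "term_b": ["s1"] * 5 + ["s2"] * 5,
    "term_c": ["s1"] * 7 + ["s2"] * 3,
}
wsd_accuracy = {"term_a": 0.95, "term_b": 0.70, "term_c": 0.85}

terms = sorted(labeled)
predicted_difficulty = [sense_entropy(labeled[t]) for t in terms]
observed_accuracy = [wsd_accuracy[t] for t in terms]

# A strongly negative value means higher predicted difficulty goes
# with lower observed accuracy on this toy data.
rho = spearman(predicted_difficulty, observed_accuracy)
print(f"Spearman correlation: {rho:.3f}")
```

A predictor of this kind needs labeled instances for every term, which mirrors the limitation the abstract notes for supervised methods; an unsupervised predictor would have to replace `sense_entropy` with a score derived from a knowledge source such as the UMLS Metathesaurus.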

