Knowledge-based biomedical word sense disambiguation: comparison of approaches.

Affiliation

National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

BMC Bioinformatics. 2010 Nov 22;11:569. doi: 10.1186/1471-2105-11-569.

DOI:10.1186/1471-2105-11-569

PMID:21092226

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3001745/

Abstract

BACKGROUND

Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus that can be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data that covers the whole domain.

METHODS

We present research on existing knowledge-based WSD approaches, which complements the studies performed on statistical learning. We compare four approaches that rely on the UMLS Metathesaurus as the source of knowledge. The first approach compares the context of the ambiguous word with each candidate sense, measuring overlap against a representation of the sense built from its definitions, synonyms and related terms. The second approach collects training data for each candidate sense using queries built from monosemous synonyms and related terms. These queries are used to retrieve MEDLINE citations, and a machine learning model is then trained on the resulting corpus. The third approach is a graph-based method that exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD. It ranks nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to the concepts in the Metathesaurus. The context of the ambiguous word and the semantic types of the candidate concepts are mapped to Journal Descriptors, and these mappings are compared to decide among the candidate concepts. We report the estimated accuracy of the different methods on the WSD test collection available from the NLM.
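The first approach can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that context and sense representations are compared by overlap, so the use of bag-of-words cosine similarity here is an assumption, as are the function names (`tokenize`, `cosine`, `disambiguate`) and the toy sense profiles, which stand in for text that would really be drawn from Metathesaurus definitions, synonyms and related terms.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context, sense_profiles):
    """Return the sense whose profile overlaps most with the context,
    together with the score of every candidate sense."""
    ctx = Counter(tokenize(context))
    scores = {
        sense: cosine(ctx, Counter(tokenize(profile)))
        for sense, profile in sense_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy profiles for the ambiguous word "cold". Real profiles
# would be built from Metathesaurus definitions, synonyms and related terms.
PROFILES = {
    "common_cold": "viral infection of the upper respiratory tract with cough and rhinitis",
    "cold_temperature": "low temperature exposure causing hypothermia or frostbite",
}

best, scores = disambiguate(
    "The patient reported a cough and rhinitis consistent with a viral cold.",
    PROFILES,
)
print(best, scores)
```

In this toy case the context shares several tokens with the first profile and none with the second, so the first sense is selected. A practical version would likely remove stop words and weight terms, since function words such as "the" and "with" contribute to the overlap here.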

CONCLUSIONS

We found that the last approach achieves better results than the other methods. The graph-based approach, which uses the structure of the Metathesaurus network to estimate the relevance of Metathesaurus concepts, does not perform well compared to the first two methods. In addition, combining the methods improves performance over the individual approaches. However, performance is still below that of statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions for improving the existing methods and for making the Metathesaurus more effective for WSD.
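To make the comparison against the maximum frequency sense baseline concrete, the following sketch computes accuracy for a system and for a baseline that always predicts the most frequent sense. The sense labels and predictions are hypothetical and are not data from the paper, and the baseline is estimated here from the same gold annotations it is scored on, which is an assumption made for brevity.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of instances whose predicted sense matches the gold sense."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def mfs_baseline(gold):
    """Predict the most frequent gold sense for every instance."""
    most_common_sense, _ = Counter(gold).most_common(1)[0]
    return [most_common_sense] * len(gold)


# Hypothetical gold annotations and system output for one ambiguous term.
gold = ["M1", "M1", "M1", "M2", "M1"]
system = ["M1", "M2", "M1", "M2", "M2"]

mfs_acc = accuracy(mfs_baseline(gold), gold)
sys_acc = accuracy(system, gold)
print(f"MFS baseline: {mfs_acc:.2f}, system: {sys_acc:.2f}")
```

When one sense dominates, as in this toy example, the baseline is hard to beat even though it ignores the context entirely, which is why it is a standard reference point for WSD systems.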

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93a2/3001745/ca5959745eb8/1471-2105-11-569-1.jpg

Similar articles

Knowledge-based biomedical word sense disambiguation: comparison of approaches.

BMC Bioinformatics. 2010 Nov 22;11:569. doi: 10.1186/1471-2105-11-569.

Collocation analysis for UMLS knowledge-based word sense disambiguation.

BMC Bioinformatics. 2011 Jun 9;12 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2105-12-S3-S4.

Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation.

BMC Bioinformatics. 2011 Jun 2;12:223. doi: 10.1186/1471-2105-12-223.

Determining the difficulty of Word Sense Disambiguation.

J Biomed Inform. 2014 Feb;47:83-90. doi: 10.1016/j.jbi.2013.09.009. Epub 2013 Sep 26.

Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification.

J Am Med Inform Assoc. 2013 Sep-Oct;20(5):882-6. doi: 10.1136/amiajnl-2012-001350. Epub 2012 Oct 16.

Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts.

BMC Bioinformatics. 2011 Aug 26;12:355. doi: 10.1186/1471-2105-12-355.

Graph-based word sense disambiguation of biomedical documents.

Bioinformatics. 2010 Nov 15;26(22):2889-96. doi: 10.1093/bioinformatics/btq555. Epub 2010 Oct 7.

Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues.

BMC Bioinformatics. 2006 Jul 5;7:334. doi: 10.1186/1471-2105-7-334.

Word sense disambiguation via semantic type classification.

AMIA Annu Symp Proc. 2008 Nov 6;2008:177-81.

Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation.

J Biomed Inform. 2017 Sep;73:137-147. doi: 10.1016/j.jbi.2017.08.001. Epub 2017 Aug 7.

Cited by

Clinical Note Structural Knowledge Improves Word Sense Disambiguation.

AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:515-524. eCollection 2024.

Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health.

Front Digit Health. 2021 Mar;3. doi: 10.3389/fdgth.2021.620828. Epub 2021 Mar 10.

Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks.

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):502. doi: 10.1186/s12859-019-3079-8.

A semantic-based workflow for biomedical literature annotation.

Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax088.

Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods.

J Am Med Inform Assoc. 2014 Sep-Oct;21(5):842-9. doi: 10.1136/amiajnl-2013-002133. Epub 2014 Jan 17.

Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text.

J Biomed Inform. 2013 Dec;46(6):1116-24. doi: 10.1016/j.jbi.2013.08.008. Epub 2013 Sep 4.

MeSH indexing based on automatically generated summaries.

BMC Bioinformatics. 2013 Jun 26;14:208. doi: 10.1186/1471-2105-14-208.

Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations.

AMIA Annu Symp Proc. 2012;2012:1004-13. Epub 2012 Nov 3.

A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries.

AMIA Annu Symp Proc. 2012;2012:997-1003. Epub 2012 Nov 3.

A learning-based approach for biomedical word sense disambiguation.

ScientificWorldJournal. 2012;2012:949247. doi: 10.1100/2012/949247. Epub 2012 May 1.
