The effect of word sense disambiguation accuracy on literature based discovery.

Authors

Preiss Judita, Stevenson Mark

Affiliation

Advanced Computing Research Center, Department of Computer Science, The University of Sheffield, 211 Portobello, Sheffield, S1 4DP, UK.

Publication information

BMC Med Inform Decis Mak. 2016 Jul 18;16 Suppl 1(Suppl 1):57. doi: 10.1186/s12911-016-0296-1.

DOI:10.1186/s12911-016-0296-1

PMID:27455071

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4959388/

Abstract

BACKGROUND

The volume of research published in the biomedical domain has increasingly led researchers to focus on specific areas of interest, so that connections between findings are missed. Literature based discovery (LBD) attempts to address this problem by searching for previously unnoticed connections between published information (also known as "hidden knowledge"). A common approach is to identify hidden knowledge via shared linking terms. However, biomedical documents are highly ambiguous, which can lead LBD systems to over-generate hidden knowledge by hypothesising connections through different meanings of linking terms. Word Sense Disambiguation (WSD) aims to resolve ambiguities in text by identifying the meaning of ambiguous terms. This study explores the effect of WSD accuracy on LBD performance.
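The over-generation problem described above can be illustrated with a minimal sketch. It assumes a simple co-occurrence model in which two terms are proposed as hidden knowledge when they never appear in the same document but share a linking term; this is an illustration only, not the LBD system used in the study. The function name `hidden_knowledge`, the toy documents, and the `term#sense` labelling are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hidden_knowledge(docs):
    """Return term pairs (a, c) that never co-occur but share a linking term.

    `docs` is a list of term lists, one per document (an assumed input format).
    """
    cooccur = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            cooccur[a].add(b)
            cooccur[b].add(a)

    pairs = set()
    for linking_term, neighbours in cooccur.items():
        for a, c in combinations(sorted(neighbours), 2):
            if c not in cooccur[a]:  # a and c are not directly connected
                pairs.add((a, c))
    return pairs


# Toy documents: "cold" is ambiguous between a temperature and an illness.
surface_docs = [
    ["raynaud", "cold"],
    ["cold", "rhinovirus"],
]

# The same documents after a hypothetical WSD step has labelled each sense.
sense_docs = [
    ["raynaud", "cold#temperature"],
    ["cold#illness", "rhinovirus"],
]

print(hidden_knowledge(surface_docs))  # one spurious pair via the ambiguous term
print(hidden_knowledge(sense_docs))    # no pair once the senses are separated
```

Without disambiguation, the ambiguous surface form acts as a linking term between two unrelated concepts; once the senses are kept apart, the spurious connection is no longer generated.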

METHODS

An existing LBD system is employed and four approaches to WSD of biomedical documents are integrated with it. The accuracy of each WSD approach is determined by comparing its output against a standard benchmark. Evaluation of the LBD output is carried out using a timeslicing approach, where hidden knowledge is generated from articles published prior to a certain cutoff date and a gold standard is extracted from publications after the cutoff date.
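The timeslicing evaluation can be sketched as follows. This is a minimal illustration under stated assumptions: the gold standard is taken to be the term pairs that co-occur in documents after the cutoff but not before it, and performance is summarised with precision and recall. The abstract does not specify either choice, and the names `cooccurring_pairs` and `timeslice_scores`, the cutoff year, and the toy data are all hypothetical.

```python
from itertools import combinations


def cooccurring_pairs(docs):
    """Collect every sorted pair of terms that appear together in a document."""
    pairs = set()
    for terms in docs:
        pairs.update(combinations(sorted(set(terms)), 2))
    return pairs


def timeslice_scores(proposed, dated_docs, cutoff_year):
    """Score proposed pairs against connections that first appear after the cutoff.

    `dated_docs` is a list of (year, terms) tuples (an assumed input format).
    """
    before = [terms for year, terms in dated_docs if year < cutoff_year]
    after = [terms for year, terms in dated_docs if year >= cutoff_year]

    # Assumed gold standard: pairs seen after the cutoff but never before it.
    gold = cooccurring_pairs(after) - cooccurring_pairs(before)

    hits = proposed & gold
    precision = len(hits) / len(proposed) if proposed else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Toy corpus with publication years (illustrative values only).
dated_docs = [
    (2000, ["fish oil", "blood viscosity"]),
    (2001, ["blood viscosity", "raynaud"]),
    (2005, ["fish oil", "raynaud"]),
    (2006, ["fish oil", "blood viscosity"]),
]

# Pairs that a hypothetical LBD run proposed from the pre-cutoff documents.
proposed = {("fish oil", "raynaud"), ("blood viscosity", "migraine")}

precision, recall = timeslice_scores(proposed, dated_docs, cutoff_year=2003)
print(precision, recall)
```

In this toy setting, spurious proposals lower precision without affecting recall, which is the kind of effect that WSD is intended to reduce.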

RESULTS

WSD accuracy varies depending on the approach used. The connection between the performance of the LBD and WSD systems is analysed, revealing a correlation between WSD accuracy and LBD performance.

CONCLUSION

This study reveals that LBD performance is sensitive to WSD accuracy. It is therefore concluded that WSD has the potential to improve the output of LBD systems by reducing the amount of spurious hidden knowledge that is generated. It is also suggested that further improvements in WSD accuracy have the potential to improve LBD accuracy.
