A novel framework for biomedical entity sense induction.

Affiliations

College of Medicine, University of Florida, USA.

University of Montpellier, LIRMM, CNRS, Montpellier, France.

Publication details

J Biomed Inform. 2018 Aug;84:31-41. doi: 10.1016/j.jbi.2018.06.007. Epub 2018 Jun 20.

DOI: 10.1016/j.jbi.2018.06.007

PMID: 29935347

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6766751/

Abstract

BACKGROUND

Rapid advancements in biomedical research have accelerated the growth in the number of relevant electronic documents published online, ranging from scholarly articles to news, blogs, and user-generated social media content. Nevertheless, much of this information is poorly organized, making it difficult to navigate. Emerging technologies such as ontologies and knowledge bases (KBs) could help organize and track the information associated with biomedical research developments. A major challenge in the automatic construction of ontologies and KBs is identifying words and their respective sense(s) in a free-text corpus. Word-sense induction (WSI) is the task of automatically inducing the different senses of a target word from the different contexts in which it appears. Over the last two decades, there have been several efforts on WSI; however, few methods are effective in biomedicine and the life sciences.
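To make the WSI task concrete, here is one common way to represent the contexts of a target word before they are grouped into senses: a bag-of-words vector per context, compared with cosine similarity. This is an illustrative assumption; the abstract does not say how the framework represents contexts, and the function names and example sentences for the ambiguous word "cold" are hypothetical.

```python
import math
from collections import Counter


def context_vector(context: str, target: str) -> Counter:
    """Bag-of-words counts for one context, leaving out the target word itself."""
    tokens = (t.strip(".,;:()").lower() for t in context.split())
    return Counter(t for t in tokens if t and t != target.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical contexts for the ambiguous word "cold":
# the first two use the temperature sense, the last two the illness sense.
contexts = [
    "patients exposed to cold temperatures developed hypothermia",
    "cold temperatures and exposure increase hypothermia risk",
    "the common cold is a viral infection",
    "a viral infection such as the common cold causes cough",
]
vectors = [context_vector(c, "cold") for c in contexts]

print(round(cosine(vectors[0], vectors[1]), 3))  # same sense: 0.333
print(round(cosine(vectors[0], vectors[2]), 3))  # different senses: 0.0
```

Same-sense contexts share words such as "temperatures" and "hypothermia", so they score higher than cross-sense pairs; an induction step can then group contexts by this kind of similarity.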

METHODS

We developed a framework for biomedical entity sense induction that combines natural language processing with supervised and unsupervised learning methods, with promising results. It is composed of three main steps: (1) a polysemy detection method to determine whether a biomedical entity has more than one possible meaning; (2) a clustering quality index-based approach to predict the number of senses of the biomedical entity; and (3) a method to induce the concept(s) (i.e., senses) of the biomedical entity in a given context.
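Step (2) can be illustrated with a minimal sketch, assuming each context has already been turned into a dense vector. The idea is to cluster the contexts for every candidate number of senses k and keep the k whose clustering scores best on a quality index. The abstract names neither the clustering algorithm nor the index, so k-means, the Davies-Bouldin index (lower means more compact, better-separated clusters), the candidate range 2 to 5 (chosen to mirror the 2-5 senses per entity in the evaluation dataset), and all function names here are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two dense vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-first seeding; returns cluster labels."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: recompute centroids, keeping the old one if a cluster is empty.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def davies_bouldin(points: List[Vector], labels: List[int]) -> float:
    """Davies-Bouldin index: mean over clusters i of max_j (s_i + s_j) / d(c_i, c_j)."""
    groups: Dict[int, List[Vector]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    members = list(groups.values())
    if len(members) < 2:
        return math.inf  # the index is undefined for a single cluster
    cents = [mean(m) for m in members]
    # s_i: average distance of the members of cluster i to its centroid.
    scatter = [sum(dist(p, c) for p in m) / len(m) for m, c in zip(members, cents)]
    worst = []
    for i in range(len(members)):
        ratios = []
        for j in range(len(members)):
            if j == i:
                continue
            d = dist(cents[i], cents[j])
            ratios.append(math.inf if d == 0 else (scatter[i] + scatter[j]) / d)
        worst.append(max(ratios))
    return sum(worst) / len(worst)


def predict_num_senses(points: List[Vector], k_range=range(2, 6)) -> Tuple[int, Dict[int, float]]:
    """Pick the k whose k-means clustering has the lowest Davies-Bouldin index."""
    scores = {k: davies_bouldin(points, kmeans(points, k)) for k in k_range if k <= len(points)}
    return min(scores, key=scores.get), scores


# Toy data: three well-separated groups standing in for three senses.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]
best_k, scores = predict_num_senses(points)
print(best_k)  # 3
```

Splitting a tight group (k = 4 or 5) or merging two groups (k = 2) both raise the index, so the minimum falls at k = 3 on this toy data. A real system would also need to handle the case where the index is flat across k, which is where step (1), polysemy detection, helps by filtering out single-sense entities first.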

RESULTS

To evaluate our framework, we used the well-known MSH WSD dataset of polysemous terms, which contains 203 annotated ambiguous biomedical entities, each linked to 2-5 concepts. First, our polysemy detection method obtained an F-measure of 98%. Second, our approach for predicting the number of senses achieved an F-measure of 93%. Finally, we induced the concepts of the biomedical entities with a clustering algorithm and then extracted the keywords of each cluster to represent the corresponding concept.
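The final step, labelling each induced cluster with keywords, can be sketched as follows. The abstract does not say how keywords were chosen; this assumes a simple class-based TF-IDF score (a term's frequency within a cluster times the log of the number of clusters over the number of clusters containing that term), a small hypothetical stop-word list, and invented example contexts, so treat it as one plausible choice rather than the authors' method.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "is", "of", "such", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Lower-case whitespace tokenization with punctuation and stop words removed."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def cluster_keywords(contexts: List[str], labels: List[int], top_n: int = 3) -> Dict[int, List[str]]:
    """Rank terms per cluster by tf * log(n_clusters / n_clusters_containing_term)."""
    per_cluster: Dict[int, Counter] = {}
    for text, lab in zip(contexts, labels):
        per_cluster.setdefault(lab, Counter()).update(tokenize(text))
    n_clusters = len(per_cluster)
    # Cluster frequency: in how many clusters each term occurs.
    cluster_freq: Counter = Counter()
    for counts in per_cluster.values():
        cluster_freq.update(counts.keys())
    keywords: Dict[int, List[str]] = {}
    for lab, counts in per_cluster.items():
        scores = {t: tf * math.log(n_clusters / cluster_freq[t]) for t, tf in counts.items()}
        # Terms shared by every cluster (e.g. the ambiguous word itself) score 0 and are
        # dropped; ties are broken alphabetically so the output is deterministic.
        ranked = sorted((t for t, s in scores.items() if s > 0), key=lambda t: (-scores[t], t))
        keywords[lab] = ranked[:top_n]
    return keywords


contexts = [
    "patients exposed to cold temperatures developed hypothermia",
    "cold temperatures and exposure increase hypothermia risk",
    "the common cold is a viral infection",
    "a viral infection such as the common cold causes cough",
]
labels = [0, 0, 1, 1]  # hypothetical output of the clustering step
keywords = cluster_keywords(contexts, labels)
print(keywords[1])  # ['common', 'infection', 'viral']
```

Scoring against the other clusters, rather than using raw frequency alone, keeps the ambiguous word itself and other shared vocabulary out of every label.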

CONCLUSIONS

We have developed a framework for biomedical entity sense induction with promising results. Our results can benefit a number of downstream applications; for example, they can help resolve concept ambiguities when building Semantic Web KBs from biomedical text.
