
A novel framework for biomedical entity sense induction.

Affiliations

College of Medicine, University of Florida, USA.

University of Montpellier, LIRMM, CNRS, Montpellier, France.

Publication information

J Biomed Inform. 2018 Aug;84:31-41. doi: 10.1016/j.jbi.2018.06.007. Epub 2018 Jun 20.

Abstract

BACKGROUND

Rapid advancements in biomedical research have accelerated the growth in the number of relevant electronic documents published online, ranging from scholarly articles to news, blogs, and user-generated social media content. Nevertheless, much of this information is poorly organized, making it difficult to navigate. Emerging technologies such as ontologies and knowledge bases (KBs) could help organize and track the information associated with biomedical research developments. A major challenge in the automatic construction of ontologies and KBs is identifying words and their respective sense(s) in a free-text corpus. Word-sense induction (WSI) is the task of automatically inducing the different senses of a target word from the different contexts in which it appears. Several WSI methods have been proposed over the last two decades. However, few of them are effective in biomedicine and the life sciences.

METHODS

We developed a framework for biomedical entity sense induction using a mixture of natural language processing, supervised, and unsupervised learning methods, with promising results. It is composed of three main steps: (1) a polysemy detection method to determine whether a biomedical entity has multiple possible meanings; (2) a clustering quality index-based approach to predict the number of senses of the biomedical entity; and (3) a method to induce the concept(s) (i.e., senses) of the biomedical entity in a given context.
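
Steps (2) and (3) can be illustrated with a minimal sketch under stated assumptions. The abstract does not name the clustering quality index, the clustering algorithm, or how contexts are represented, so this sketch assumes bag-of-words context vectors, k-means with deterministic farthest-point seeding, and the Davies-Bouldin index (lower is better). The candidate range of 2 to 5 senses mirrors the 2-5 concepts per entity in the MSH WSD dataset. All function names (`vectorize`, `kmeans`, `davies_bouldin`, `predict_num_senses`) are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: contexts are bag-of-words vectors, clustered with k-means and
# scored with the Davies-Bouldin index. None of this is confirmed by the abstract.
Vector = Sequence[float]


def vectorize(contexts: List[str]) -> List[List[float]]:
    """Bag-of-words count vectors over a vocabulary shared by all contexts."""
    tokenized = [c.lower().split() for c in contexts]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0.0] * len(vocab)
        for w in toks:
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def farthest_point_init(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Vector], k: int, iters: int = 100):
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assign each point to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members; empty clusters stay put.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        converged = updated == centroids
        centroids = updated
        if converged:
            break
    return labels, centroids


def davies_bouldin(points, labels, centroids) -> float:
    """Mean over clusters of the worst ratio (s_i + s_j) / d(c_i, c_j),
    where s_i is the average distance of cluster i's members to its centroid."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(dist(p, centroids[j]) for p in members) / len(members) if members else 0.0)
    total = 0.0
    for i in range(k):
        ratios = [(scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                  for j in range(k) if j != i and dist(centroids[i], centroids[j]) > 0]
        total += max(ratios, default=0.0)
    return total / k


def predict_num_senses(vectors, candidates=range(2, 6)) -> Tuple[int, dict]:
    """Cluster once per candidate k and return the k with the lowest DB index."""
    scores = {}
    for k in candidates:
        if k <= len(vectors):
            labels, centroids = kmeans(vectors, k)
            scores[k] = davies_bouldin(vectors, labels, centroids)
    best = min(scores, key=scores.get)  # ties resolve to the smaller k
    return best, scores
```

Typical use would be `predict_num_senses(vectorize(contexts))` on all contexts of one ambiguous entity. Deterministic seeding keeps the sketch reproducible. Note that the Davies-Bouldin index drifts toward degenerate values when k approaches the number of contexts, so the candidate range should stay well below the number of available contexts.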

RESULTS

To evaluate our framework, we used the well-known MSH WSD polysemic dataset, which contains 203 annotated ambiguous biomedical entities, each linked to 2-5 concepts. First, our polysemy detection method obtained an F-measure of 98%. Second, our approach for predicting the number of senses achieved an F-measure of 93%. Finally, we induced the concepts of the biomedical entities with a clustering algorithm and then extracted the keywords of each cluster to represent the corresponding concept.
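
The last step, representing each induced concept by the keywords of its cluster, can be sketched as follows. The abstract does not say how keywords are chosen, so this sketch assumes a simple frequency-based distinctiveness score: a word's count inside the cluster multiplied by the fraction of its total occurrences that fall in that cluster. The cluster labels are assumed to come from a prior clustering step, and `cluster_keywords`, `STOPWORDS`, and the example contexts for "cold" are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}


def cluster_keywords(contexts: List[str], labels: List[int], target: str,
                     top_n: int = 3) -> Dict[int, List[str]]:
    """Return the top_n most distinctive words of each cluster of contexts.

    Assumption: score(w) = count_in_cluster(w) ** 2 / count_overall(w),
    i.e. in-cluster frequency weighted by how exclusive the word is to the cluster.
    """
    overall: Counter = Counter()
    per_cluster: Dict[int, Counter] = {}
    for text, lab in zip(contexts, labels):
        # Drop stop words and the ambiguous target word itself.
        words = [w for w in text.lower().split()
                 if w not in STOPWORDS and w != target.lower()]
        overall.update(words)
        per_cluster.setdefault(lab, Counter()).update(words)
    keywords = {}
    for lab, counts in per_cluster.items():
        # Highest score first; ties broken alphabetically for reproducibility.
        ranked = sorted(counts, key=lambda w: (-(counts[w] ** 2) / overall[w], w))
        keywords[lab] = ranked[:top_n]
    return keywords
```

Words shared by several clusters are down-weighted relative to cluster-specific ones, which is the property needed for a keyword set to label one sense rather than the entity as a whole.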

CONCLUSIONS

We have developed a framework for biomedical entity sense induction with promising results. Our results can benefit a number of downstream applications, for example, by helping to resolve concept ambiguities when building Semantic Web KBs from biomedical text.

Similar articles

1
Determining the difficulty of Word Sense Disambiguation.
J Biomed Inform. 2014 Feb;47:83-90. doi: 10.1016/j.jbi.2013.09.009. Epub 2013 Sep 26.
2
A knowledge-driven approach to biomedical document conceptualization.
Artif Intell Med. 2010 Jun;49(2):67-78. doi: 10.1016/j.artmed.2010.02.005. Epub 2010 Apr 3.

Cited by

1
Clinical concept recognition: Evaluation of existing systems on EHRs.
Front Artif Intell. 2023 Jan 13;5:1051724. doi: 10.3389/frai.2022.1051724. eCollection 2022.
2
Clustering and topic modeling over tweets: A comparison over a health dataset.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2019 Nov;2019:1544-1547. doi: 10.1109/bibm47256.2019.8983167. Epub 2020 Feb 6.
