Demonstration-based learning for few-shot biomedical named entity recognition under machine reading comprehension.

Affiliations

Department of Mathematics, Hainan University, Haikou 570228, China.

Department of Data Science and Big Data Technology, Hainan University, Haikou 570228, China.

Publication information

J Biomed Inform. 2024 Nov;159:104739. doi: 10.1016/j.jbi.2024.104739. Epub 2024 Oct 25.

Abstract

OBJECTIVE

Although deep learning techniques have achieved strong results, they often depend on large amounts of hand-labeled data and tend to perform poorly in few-shot scenarios. This study aims to devise a strategy that improves a model's ability to recognize biomedical entities in few-shot settings.

METHODS

By recasting biomedical named entity recognition (BioNER) as a machine reading comprehension (MRC) problem, we propose a demonstration-based learning method for few-shot BioNER, which involves constructing appropriate task demonstrations. We compared the proposed method with existing advanced methods on six benchmark datasets: BC4CHEMD, BC5CDR-Chemical, BC5CDR-Disease, NCBI-Disease, BC2GM, and JNLPBA.
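To make the MRC framing concrete, here is a minimal sketch of how a sentence, a natural-language query, and a few labeled demonstrations might be combined into a single model input. The abstract does not specify the template, so the query wording, the "Entities:" marker, the `[SEP]` separator, and the names `Demonstration`, `build_mrc_input`, and `char_span` are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Demonstration:
    """A labeled example shown to the model (hypothetical structure)."""
    sentence: str
    entities: List[str]


def build_mrc_input(query: str, context: str,
                    demonstrations: List[Demonstration],
                    sep: str = " [SEP] ") -> str:
    """Join query, target sentence, and demonstrations into one input string.

    The template is an assumption made for illustration only.
    """
    demo_parts = []
    for demo in demonstrations:
        answer = ", ".join(demo.entities) if demo.entities else "none"
        demo_parts.append(f"{demo.sentence} Entities: {answer}")
    return sep.join([query, context] + demo_parts)


def char_span(context: str, entity: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character span of the first match, or None.

    An MRC-style model predicts answer spans in the context; this helper
    only shows how a gold entity string maps to such a span.
    """
    start = context.find(entity)
    return None if start == -1 else (start, start + len(entity))


query = "Find all chemical entities in the text."
context = "Aspirin reduces fever."
demos = [Demonstration("Ibuprofen relieves pain.", ["Ibuprofen"])]

mrc_input = build_mrc_input(query, context, demos)
gold_span = char_span(context, "Aspirin")
```

In this sketch the entity type is carried by the query, so the same span-extraction model could in principle be reused across entity types by changing only the query and the demonstrations.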

RESULTS

We assessed the models by reporting F1 scores from 25-shot and 50-shot learning experiments. In 25-shot learning, the average F1 score improved by 1.1% over the baseline method, with F1 scores of 61.7%, 84.1%, 69.1%, 70.1%, 50.6%, and 59.9% on the six datasets, respectively. In 50-shot learning, the average F1 score improved by 1.0% over the baseline method, with F1 scores of 73.1%, 86.8%, 76.1%, 75.6%, 61.7%, and 65.4%, respectively.
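The reported per-dataset scores can be averaged to see what the two settings look like in aggregate. The short sketch below does this arithmetic. It assumes the scores follow the dataset order given in METHODS and that the average is an unweighted mean over the six datasets; the abstract confirms neither, and the variable names are illustrative.

```python
from statistics import mean

# Dataset order is assumed to match the list in METHODS.
DATASETS = ["BC4CHEMD", "BC5CDR-Chemical", "BC5CDR-Disease",
            "NCBI-Disease", "BC2GM", "JNLPBA"]

# F1 scores (%) as reported in the abstract.
F1_25_SHOT = [61.7, 84.1, 69.1, 70.1, 50.6, 59.9]
F1_50_SHOT = [73.1, 86.8, 76.1, 75.6, 61.7, 65.4]

# Unweighted mean over datasets (an assumption about how "average" is defined).
avg_25 = mean(F1_25_SHOT)
avg_50 = mean(F1_50_SHOT)

# Per-dataset gain from doubling the number of shots, in F1 points.
gain_by_dataset = {
    name: round(f50 - f25, 1)
    for name, f25, f50 in zip(DATASETS, F1_25_SHOT, F1_50_SHOT)
}

print(f"25-shot average F1: {avg_25:.1f}")
print(f"50-shot average F1: {avg_50:.1f}")
for name, gain in gain_by_dataset.items():
    print(f"{name}: +{gain} F1 points from 25 to 50 shots")
```

Under these assumptions the averages come out at roughly 65.9 (25-shot) and 73.1 (50-shot). The baseline averages are not given in the abstract, so they are not reconstructed here.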

CONCLUSION

We found that, for few-shot BioNER, MRC-based language models are much more proficient at recognizing biomedical entities than the sequence labeling approach. Furthermore, our MRC-based language models can compete with fully supervised methods that rely heavily on abundant annotated data. These results point to possible directions for future work on few-shot BioNER.
