From zero to hero: Harnessing transformers for biomedical named entity recognition in zero- and few-shot contexts.

Affiliations

Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, Serbia.

Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, Serbia; Bayer A.G., Research and Development, Mullerstrasse 173, Berlin, 13342, Germany.

Publication information

Artif Intell Med. 2024 Oct;156:102970. doi: 10.1016/j.artmed.2024.102970. Epub 2024 Aug 24.


DOI: 10.1016/j.artmed.2024.102970
PMID: 39197375

Abstract

Supervised named entity recognition (NER) in the biomedical domain depends on large sets of texts annotated with the given named entities. Creating such datasets can be time-consuming and expensive, and extracting new entities requires additional annotation tasks and retraining of the model. This paper proposes a method for zero- and few-shot NER in the biomedical domain to address these challenges. The method is based on transforming the task of multi-class token classification into binary token classification and pre-training on a large number of datasets and biomedical entities, which allows the model to learn semantic relations between the given and potentially novel named entity labels. We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER across the 9 diverse biomedical entities used for evaluation, with a fine-tuned PubMedBERT-based model. The results demonstrate the effectiveness of the proposed method for recognizing new biomedical entities with no examples or only a limited number of them, outperforming previous transformer-based methods and performing comparably to GPT3-based models while using models with over 1000 times fewer parameters. We make the models and the developed code publicly available.
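To make the reformulation in the abstract concrete, the following is a minimal sketch of one way a multi-class token-labeled sentence could be turned into binary token classification examples, one per entity label. It is not the authors' released code: the function name `to_binary_examples`, the example sentence, and the label names are hypothetical, and the abstract does not specify how the label is actually encoded alongside the text.

```python
from typing import List, Tuple

# One binary example: (queried entity label, tokens, 0/1 tag per token).
BinaryExample = Tuple[str, List[str], List[int]]


def to_binary_examples(
    tokens: List[str], tags: List[str], label_set: List[str]
) -> List[BinaryExample]:
    """Turn one multi-class tagged sentence into one binary example per label.

    For each label, a token gets 1 if its multi-class tag equals that label
    and 0 otherwise. Labels absent from the sentence yield all-zero tags,
    which serve as negative examples.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    examples: List[BinaryExample] = []
    for label in label_set:
        binary_tags = [1 if tag == label else 0 for tag in tags]
        examples.append((label, list(tokens), binary_tags))
    return examples


# Hypothetical sentence and labels, for illustration only.
tokens = ["Aspirin", "reduces", "fever"]
tags = ["Drug", "O", "Disease"]
examples = to_binary_examples(tokens, tags, ["Drug", "Disease", "Gene"])

for label, toks, binary_tags in examples:
    print(label, list(zip(toks, binary_tags)))
```

Because the entity label becomes part of the input rather than a fixed output class, a model trained this way can in principle be queried with a label it has not seen, which is the property the zero-shot setting relies on.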


Similar articles

[1]
From zero to hero: Harnessing transformers for biomedical named entity recognition in zero- and few-shot contexts.

Artif Intell Med. 2024-10

[2]
Improving biomedical Named Entity Recognition with additional external contexts.

J Biomed Inform. 2024-8

[3]
Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study.

JMIR Med Inform. 2024-10-17

[4]
Advancing entity recognition in biomedicine via instruction tuning of large language models.

Bioinformatics. 2024-3-29

[5]
Transformers-sklearn: a toolkit for medical language understanding with transformer-based models.

BMC Med Inform Decis Mak. 2021-7-30

[6]
Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition.

J Proteome Res. 2024-6-7

[7]
A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.

J Med Internet Res. 2021-8-9

[8]
A comparison of few-shot and traditional named entity recognition models for medical text.

Proc (IEEE Int Conf Healthc Inform). 2022-6

[9]
Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.

JMIR AI. 2024-5-16

[10]
Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes.

Database (Oxford). 2024-7-30

Cited by

[1]
GRU-SCANET: unleashing the power of GRU-based sinusoidal capture network for precision-driven named entity recognition.

Bioinform Adv. 2025-6-16

[2]
[Transformation of free-text radiology reports into structured data].

Radiologie (Heidelb). 2025-4

[3]
Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model.

BMC Bioinformatics. 2025-1-30
