Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning.

Author affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.

School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China.

Publication information

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad496.

DOI: 10.1093/bioinformatics/btad496
PMID: 37549065
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10444965/
Abstract

MOTIVATION

Few-shot learning, which can perform named entity recognition effectively in low-resource scenarios, has attracted growing attention but has not yet been widely studied in the biomedical field. In contrast to high-resource domains, biomedical named entity recognition (BioNER) often has limited human-labeled data in real-world scenarios, which leads to poor generalization when training on only a few labeled instances. Recent approaches either leverage cross-domain high-resource data, which is prone to domain shift, or fine-tune a pre-trained masked language model on limited labeled samples to generate new synthetic data, which tends to be of low quality. Therefore, in this article, we study a more realistic scenario, i.e. few-shot learning for BioNER.

RESULTS

Leveraging a domain knowledge graph, we propose knowledge-guided instance generation for few-shot BioNER, which generates diverse and novel entities based on the similar semantic relations of neighbor nodes. In addition, by introducing question prompts, we cast BioNER as a question-answering task and propose prompt contrastive learning, which improves the robustness of the model by measuring the mutual information between query-answer pairs. Extensive experiments in various few-shot settings show that the proposed framework achieves superior performance. In particular, in a low-resource scenario with only 20 samples, our approach substantially outperforms recent state-of-the-art models on four benchmark datasets, achieving an average improvement of up to 7.1% F1.
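
The abstract does not say how the mutual information between query-answer pairs is estimated. One common choice in contrastive learning is the InfoNCE objective, whose negative (plus log N for a batch of N pairs) is a lower bound on mutual information. The sketch below illustrates that general idea only and is not the authors' implementation; the function names, the cosine similarity, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, answers, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    queries[i] and answers[i] are assumed to be a matching (positive) pair;
    every other answer in the batch serves as a negative for query i.
    """
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [cosine(queries[i], answers[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate answers.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_denom - logits[i]
    return total / n


def mutual_information_lower_bound(queries, answers, temperature=0.1):
    """InfoNCE bound: I(query; answer) >= log(N) - loss."""
    return math.log(len(queries)) - info_nce_loss(queries, answers, temperature)


# Toy embeddings (hypothetical): two question prompts and two answer spans.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_answers = [[0.9, 0.1], [0.1, 0.9]]   # each answer matches its query
shuffled_answers = [[0.1, 0.9], [0.9, 0.1]]  # answers swapped

loss_aligned = info_nce_loss(queries, aligned_answers)
loss_shuffled = info_nce_loss(queries, shuffled_answers)
bound_aligned = mutual_information_lower_bound(queries, aligned_answers)
```

With these toy vectors the aligned pairs give a lower loss than the shuffled pairs, which is the behavior a contrastive objective of this kind rewards. In a real model the vectors would be encoder outputs for the question prompt and the candidate answer span.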

AVAILABILITY AND IMPLEMENTATION

Our source code and data are available at https://github.com/cpmss521/KGPC.

Similar articles

1
Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad496.
2
Cross-type biomedical named entity recognition with deep multi-task learning.
Bioinformatics. 2019 May 15;35(10):1745-1752. doi: 10.1093/bioinformatics/bty869.
3
Augmenting biomedical named entity recognition with general-domain resources.
J Biomed Inform. 2024 Nov;159:104731. doi: 10.1016/j.jbi.2024.104731. Epub 2024 Oct 4.
4
BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.
BMC Bioinformatics. 2022 Nov 22;23(1):501. doi: 10.1186/s12859-022-05051-9.
5
Improving few-shot relation extraction through semantics-guided learning.
Neural Netw. 2024 Jan;169:453-461. doi: 10.1016/j.neunet.2023.10.053. Epub 2023 Nov 3.
6
Improving biomedical named entity recognition by dynamic caching inter-sentence information.
Bioinformatics. 2022 Aug 10;38(16):3976-3983. doi: 10.1093/bioinformatics/btac422.
7
Advancing entity recognition in biomedicine via instruction tuning of large language models.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae163.
8
AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad310.
9
A prefix and attention map discrimination fusion guided attention for biomedical named entity recognition.
BMC Bioinformatics. 2023 Feb 8;24(1):42. doi: 10.1186/s12859-023-05172-9.
10
Towards reliable named entity recognition in the biomedical domain.
Bioinformatics. 2020 Jan 1;36(1):280-286. doi: 10.1093/bioinformatics/btz504.

Cited by

1
Ontology-conformal recognition of materials entities using language models.
Sci Rep. 2025 May 28;15(1):18597. doi: 10.1038/s41598-025-03619-y.
2
Few-shot biomedical NER empowered by LLMs-assisted data augmentation and multi-scale feature extraction.
BioData Min. 2025 Apr 4;18(1):28. doi: 10.1186/s13040-025-00443-y.
3
Learning to explain is a good biomedical few-shot learner.
