Retrieval augmented generation based dynamic prompting for few-shot biomedical named entity recognition using large language models.

Authors

Ge Yao, Das Sudeshna, Guo Yuting, Sarker Abeed

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA.

Department of Computer Science, Emory University, Atlanta, GA, USA.

Publication information

Res Sq. 2025 Aug 25:rs.3.rs-7216581. doi: 10.21203/rs.3.rs-7216581/v1.

DOI:10.21203/rs.3.rs-7216581/v1

PMID:40909790

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12408026/

Abstract

Biomedical named entity recognition (NER) is a high-utility natural language processing (NLP) task, and large language models (LLMs) show promise particularly in few-shot settings (i.e., limited training data). In this article, we address the performance challenges of LLMs for few-shot biomedical NER by investigating a dynamic prompting strategy involving retrieval-augmented generation (RAG). In our approach, the annotated in-context learning examples are selected based on their similarities with the input texts, and the prompt is dynamically updated for each instance during inference. We implemented and optimized static and dynamic prompt engineering techniques and evaluated them on five biomedical NER datasets. Static prompting with structured components increased average F-scores by 12% for GPT-4, and 11% for GPT-3.5 and LLaMA 3-70B, relative to basic static prompting. Dynamic prompting further improved performance, with TF-IDF and SBERT retrieval methods yielding the best results, improving average F-scores by 7.3% and 5.6% in 5-shot and 10-shot settings, respectively. These findings highlight the utility of contextually adaptive prompts via RAG for biomedical NER.
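
The dynamic prompting idea described in the abstract (retrieve the annotated examples most similar to each input, then rebuild the prompt per instance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tiny example pool, the prompt template, and the hand-rolled TF-IDF with smoothed IDF are all assumptions made for illustration. An SBERT-based variant would replace the TF-IDF vectors with sentence embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): terms found in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pool, k):
    """Return the k annotated examples whose text is most similar to the query."""
    vectors, idf = tfidf_vectors([text for text, _ in pool])
    # Query terms unseen in the pool are dropped because they have no IDF weight.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(pool)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [pool[i] for i in ranked[:k]]


def build_prompt(query, pool, k=2):
    """Assemble a k-shot prompt whose examples are chosen for this specific input."""
    lines = ["Extract the biomedical entities from the input text.", ""]
    for text, entities in retrieve(query, pool, k):
        lines += [f"Input: {text}", f"Entities: {'; '.join(entities)}", ""]
    lines += [f"Input: {query}", "Entities:"]
    return "\n".join(lines)


# Hypothetical annotated pool: (text, gold entities). Not taken from the paper's datasets.
POOL = [
    ("The patient was prescribed metformin for type 2 diabetes.",
     ["metformin", "type 2 diabetes"]),
    ("Ibuprofen gave me a terrible headache and nausea.",
     ["Ibuprofen", "headache", "nausea"]),
    ("Mutations in BRCA1 increase the risk of breast cancer.",
     ["BRCA1", "breast cancer"]),
]

if __name__ == "__main__":
    print(build_prompt("After taking ibuprofen I had nausea all day.", POOL, k=2))
```

In this sketch the prompt is rebuilt for every input, so the in-context examples change with the query, whereas a static prompt would reuse the same fixed examples for all inputs. The resulting string would then be sent to the LLM of choice; that call is omitted here.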