BioInstruct: instruction tuning of large language models for biomedical natural language processing.

Affiliations

Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States.

Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, United States.

Publication Information

J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.


DOI: 10.1093/jamia/ocae122
PMID: 38833265
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11339494/
Abstract

OBJECTIVES: To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.

MATERIALS AND METHODS: We created BioInstruct, comprising 25 005 instructions for instruction-tuning LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting the GPT-4 language model with 3 seed samples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the category (eg, QA, IE, or generation) of instructions impacts model performance.

RESULTS AND DISCUSSION: Compared with LLMs without instruction tuning, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA on the average accuracy metric, 5.7% in IE on the average F1 metric, and 96% in generation tasks on the average GPT-4 score metric. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with, or even surpassed, other LLMs in the biomedical domain that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. Our findings align with observations from multi-task learning, suggesting synergies between tasks.

CONCLUSION: The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.
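The data-generation step described in MATERIALS AND METHODS (prompting GPT-4 with 3 seed samples drawn from the 80 human-curated instructions) follows a self-instruct-style pipeline. Below is a minimal sketch assuming the OpenAI Python client (v1+); the prompt wording, temperature, and output parsing are illustrative assumptions, not the authors' exact configuration.

```python
# Self-instruct-style generation sketch: sample 3 seed instructions from a
# human-curated pool and ask GPT-4 to produce new biomedical instructions.
# Assumptions: OpenAI Python client >= 1.0; prompt text and temperature are
# illustrative, not the authors' reported setup.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_instructions(seed_pool: list[str], n_new: int = 5) -> list[str]:
    """Draw 3 seeds from the curated pool and prompt GPT-4 for new instructions."""
    seeds = random.sample(seed_pool, 3)
    prompt = (
        "Here are 3 example instructions for biomedical NLP tasks:\n"
        + "\n".join(f"{i + 1}. {s}" for i, s in enumerate(seeds))
        + f"\n\nWrite {n_new} new, diverse instructions in the same style, "
        "one per line."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    text = resp.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Repeating this loop with fresh random seed triples and deduplicating the output is one plausible way to grow a pool of 80 curated instructions into the 25 005-instruction BioInstruct dataset.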

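For the LoRA step, a minimal parameter-efficient fine-tuning sketch with Hugging Face transformers and peft follows; the rank, alpha, dropout, target modules, and checkpoint name are assumptions for illustration, since the abstract does not report these hyperparameters.

```python
# Minimal LoRA instruction-tuning sketch with Hugging Face transformers + peft.
# Assumptions: r/alpha/dropout values, target modules, and the checkpoint name
# are illustrative; the abstract does not specify them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint for the 7B model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

# Training then proceeds with a standard causal-LM loop (e.g., the transformers
# Trainer) over the 25 005 BioInstruct instruction-response pairs.
```

With rank 8 adapters on the attention projections, only a small fraction of a percent of the 7B parameters is trainable, which is what makes instruction tuning on a 25 005-example dataset tractable.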

Similar Articles

[1] BioInstruct: instruction tuning of large language models for biomedical natural language processing. J Am Med Inform Assoc. 2024-9-1
[2] Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics. 2024-3-29
[3] Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study. JMIR Med Inform. 2025-6-20
[4] A dataset and benchmark for hospital course summarization with adapted large language models. J Am Med Inform Assoc. 2025-3-1
[5] Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks. J Am Med Inform Assoc. 2025-6-1
[6] Fine-tuning open-source large language models to improve their performance on radiation oncology tasks: A feasibility study to investigate their potential clinical applications in radiation oncology. Med Phys. 2025-7
[7] Relation extraction using large language models: a case study on acupuncture point locations. J Am Med Inform Assoc. 2024-11-1
[8] Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit. JCO Clin Cancer Inform. 2024-8
[9] LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction. J Am Med Inform Assoc. 2024-9-1
[10] Large Language Model Synergy for Ensemble Learning in Medical Question Answering: Design and Evaluation Study. J Med Internet Res. 2025-7-14

Cited By

[1] MedVH: Toward Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context. Adv Intell Syst. 2025-7-21
[2] Performance of large language models in the differential diagnosis of benign and malignant biliary stricture. Front Oncol. 2025-7-3
[3] MedReadCtrl: Personalizing medical text generation with readability-controlled instruction learning. medRxiv. 2025-7-11
[4] BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning. AMIA Jt Summits Transl Sci Proc. 2025-6-10
[5] Dynamic few-shot prompting for clinical note section classification using lightweight, open-source large language models. J Am Med Inform Assoc. 2025-7-1
[6] Clinical insights: A comprehensive review of language models in medicine. PLOS Digit Health. 2025-5-8
[7] The Development Landscape of Large Language Models for Biomedical Applications. Annu Rev Biomed Data Sci. 2025-8
[8] A novel recommender framework with chatbot to stratify heart attack risk. Discov Med (Cham). 2024
[9] Large language models in biomedicine and health: current research landscape and future directions. J Am Med Inform Assoc. 2024-9-1

References

[1] PMC-LLaMA: toward building open-source language models for medicine. J Am Med Inform Assoc. 2024-9-1
[2] ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. Cureus. 2023-6-24
[3] Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digit Med. 2021-5-20
[4] Generating Accurate Electronic Health Assessment from Medical Graph. Proc Conf Empir Methods Nat Lang Process. 2020-11
[5] Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)-Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study. JMIR Med Inform. 2019-9-12
[6] BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020-2-15
[7] Clinical information extraction applications: A literature review. J Biomed Inform. 2017-11-21
[8] An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics. 2015-4-30
