

BioInstruct: instruction tuning of large language models for biomedical natural language processing.

Affiliations

Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States.

Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, United States.

Publication Information

J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.

Abstract

OBJECTIVES

To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.

MATERIALS AND METHODS

We created BioInstruct, a dataset comprising 25 005 instructions for instruction-tuning LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting GPT-4 with 3 seed samples randomly drawn from a pool of 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the category (eg, QA, IE, or generation) of an instruction impacts model performance.
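
The seed-prompted generation step described above follows a self-instruct style: repeatedly sample a few curated seed instructions and ask GPT-4 to produce a new, distinct one. Below is a minimal Python sketch of that loop, assuming the official OpenAI chat API; the prompt wording, function names, and JSON post-processing are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of seed-prompted instruction generation (self-instruct style).
# All names here (seed_pool, generate_instructions) and the prompt wording are
# illustrative; the paper's actual prompts and filtering are not reproduced.
import json
import random

from openai import OpenAI  # assumes the openai>=1.0 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_instructions(seed_pool: list[dict], n_rounds: int) -> list[dict]:
    """Each round: draw 3 seeds from the 80 curated instructions, prompt GPT-4."""
    generated = []
    for _ in range(n_rounds):
        seeds = random.sample(seed_pool, 3)  # 3 seed samples per prompt
        prompt = (
            "You are creating instructions for biomedical NLP tasks.\n"
            "Here are example instructions:\n"
            + "\n".join(json.dumps(s) for s in seeds)
            + "\nWrite one new, distinct instruction with an input and an "
            "output, formatted as a JSON object."
        )
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        generated.append(json.loads(resp.choices[0].message.content))
    return generated
```

LoRA, the parameter-efficient tuning method named above, freezes the base model's weights and trains small low-rank adapter matrices inside selected projection layers, so only a tiny fraction of parameters is updated. A minimal sketch with the Hugging Face peft library follows; the rank, alpha, and target modules are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of LoRA fine-tuning setup via Hugging Face peft.
# Hyperparameters below are illustrative, not the paper's configuration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=16,                        # adapter scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```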

RESULTS AND DISCUSSION

Compared with LLMs without instruction tuning, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA on the average accuracy metric, 5.7% in IE on the average F1 metric, and 96% in generation tasks on the average GPT-4 score metric. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with, or even surpassed, other biomedical-domain LLMs that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. Our findings align with observations from multi-task learning, suggesting synergies between related tasks.

CONCLUSION

The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.


