Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States.
Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, United States.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.
OBJECTIVES: To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.

MATERIALS AND METHODS: We created BioInstruct, a dataset of 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting GPT-4 with 3 seed examples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated the instruction-tuned LLMs on several BioNLP tasks grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the category of instructions (eg, QA, IE, or generation) affects model performance.

RESULTS AND DISCUSSION: Compared with LLMs that were not instruction-tuned, our instruction-tuned LLMs showed marked performance gains: 17.3% in QA (average accuracy), 5.7% in IE (average F1), and 96% in generation (average GPT-4 score). Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with, and in some cases surpassed, other biomedical LLMs that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted on closely related tasks. These findings align with observations from multi-task learning, suggesting synergies between tasks.

CONCLUSION: The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.
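As a rough illustration of the seed-prompted data generation described in Materials and Methods, the sketch below draws 3 of the 80 human-curated seed instructions at random and asks GPT-4 to produce a new biomedical instruction. The file name, prompt wording, and decoding settings are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of seed-prompted instruction generation (assumptions: a JSON-lines
# file of the 80 human-curated seeds and the openai>=1.0 Python client).
import json
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def load_seeds(path="seed_instructions.jsonl"):
    """Load the human-curated seed instructions (hypothetical file name)."""
    with open(path) as f:
        return [json.loads(line) for line in f]


def generate_instruction(seeds):
    """Draw 3 seed examples at random and ask GPT-4 for one new instruction."""
    examples = random.sample(seeds, 3)
    prompt = (
        "You are creating instruction-tuning data for biomedical NLP.\n"
        "Here are three example instructions with inputs and outputs:\n\n"
        + "\n\n".join(json.dumps(e, indent=2) for e in examples)
        + "\n\nWrite one new, different instruction with an input and output, as JSON."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    seeds = load_seeds()
    print(generate_instruction(seeds))
```

Repeating this call (with deduplication and filtering of low-quality outputs) is the usual way such a loop is scaled up to tens of thousands of instructions.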
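The LoRA fine-tuning step can likewise be sketched with the Hugging Face transformers and peft libraries; the checkpoint name, rank, and target modules below are illustrative assumptions rather than the paper's reported hyperparameters.

```python
# Minimal sketch of parameter-efficient LoRA fine-tuning in the spirit of the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; the paper tunes LLaMA 1 and 2
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Wrap the frozen base model with trainable low-rank adapters.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are updated

# From here, the adapted model can be trained on the BioInstruct instruction/response
# pairs with any standard causal-LM training loop (e.g., transformers.Trainer).
```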