Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States.
Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, United States.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.
OBJECTIVES: To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.

MATERIALS AND METHODS: We created BioInstruct, a dataset of 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting GPT-4 with 3 seed examples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated the instruction-tuned LLMs on several BioNLP tasks grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the category of instructions (eg, QA, IE, or generation) affects model performance.

RESULTS AND DISCUSSION: Compared with LLMs that were not instruction-tuned, our instruction-tuned LLMs showed marked performance gains: 17.3% in QA (average accuracy), 5.7% in IE (average F1), and 96% in generation (average GPT-4 score). Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with, and in some cases surpassed, other biomedical LLMs that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted on closely related tasks. These findings align with observations from multi-task learning, suggesting synergies between tasks.

CONCLUSION: The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.
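As a rough illustration of the seed-prompted data generation described in Materials and Methods, the sketch below draws 3 of the 80 human-curated seed instructions at random and asks GPT-4 to produce a new biomedical instruction. The file name, prompt wording, and decoding settings are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of seed-prompted instruction generation (assumptions: a JSON-lines
# file of the 80 human-curated seeds and the openai>=1.0 Python client).
import json
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def load_seeds(path="seed_instructions.jsonl"):
    """Load the human-curated seed instructions (hypothetical file name)."""
    with open(path) as f:
        return [json.loads(line) for line in f]


def generate_instruction(seeds):
    """Draw 3 seed examples at random and ask GPT-4 for one new instruction."""
    examples = random.sample(seeds, 3)
    prompt = (
        "You are creating instruction-tuning data for biomedical NLP.\n"
        "Here are three example instructions with inputs and outputs:\n\n"
        + "\n\n".join(json.dumps(e, indent=2) for e in examples)
        + "\n\nWrite one new, different instruction with an input and output, as JSON."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    seeds = load_seeds()
    print(generate_instruction(seeds))
```

Repeating this call (with deduplication and filtering of low-quality outputs) is the usual way such a loop is scaled up to tens of thousands of instructions.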
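The LoRA fine-tuning step can likewise be sketched with the Hugging Face transformers and peft libraries; the checkpoint name, rank, and target modules below are illustrative assumptions rather than the paper's reported hyperparameters.

```python
# Minimal sketch of parameter-efficient LoRA fine-tuning in the spirit of the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; the paper tunes LLaMA 1 and 2
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Wrap the frozen base model with trainable low-rank adapters.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are updated

# From here, the adapted model can be trained on the BioInstruct instruction/response
# pairs with any standard causal-LM training loop (e.g., transformers.Trainer).
```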