Similar Articles

1. BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:149-158. eCollection 2025.
2. Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study.
JMIR Med Inform. 2025 Jun 20;13:e75103. doi: 10.2196/75103.
3. A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
4. Fine-tuning medical language models for enhanced long-contextual understanding and domain expertise.
Quant Imaging Med Surg. 2025 Jun 6;15(6):5450-5462. doi: 10.21037/qims-2024-2655. Epub 2025 Jun 3.
5. Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks.
J Am Med Inform Assoc. 2025 Jun 1;32(6):1015-1024. doi: 10.1093/jamia/ocaf045.
6. The first step is the hardest: pitfalls of representing and tokenizing temporal data for large language models.
J Am Med Inform Assoc. 2024 Sep 1;31(9):2151-2158. doi: 10.1093/jamia/ocae090.
7. LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction.
J Am Med Inform Assoc. 2024 Sep 1;31(9):2010-2018. doi: 10.1093/jamia/ocae147.
8. Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit.
JCO Clin Cancer Inform. 2024 Aug;8:e2300258. doi: 10.1200/CCI.23.00258.
9. Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study.
J Med Internet Res. 2024 Sep 18;26:e54617. doi: 10.2196/54617.
10. Stench of Errors or the Shine of Potential: The Challenge of (Ir)Responsible Use of ChatGPT in Speech-Language Pathology.
Int J Lang Commun Disord. 2025 Jul-Aug;60(4):e70088. doi: 10.1111/1460-6984.70088.



BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning.

Authors

Fu Yujuan Velvin, Ramachandran Giridhar Kaushik, Park Namu, Lybarger Kevin, Xia Fei, Uzuner Ozlem, Yetisgen Meliha

Affiliations

University of Washington, Seattle, WA, USA.

George Mason University, Fairfax, VA, USA.

Publication

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:149-158. eCollection 2025.

PMID: 40502228
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12150714/
Abstract

Large language models (LLMs) such as ChatGPT are fine-tuned on large and diverse instruction-following corpora, and can generalize to new tasks. However, those instruction-tuned LLMs often perform poorly in specialized medical natural language understanding (NLU) tasks that require domain knowledge, granular text comprehension, and structured data extraction. To bridge the gap, we: (1) propose a unified prompting format for 7 important NLU tasks, (2) curate an instruction-tuning dataset, MNLU-Instruct, utilizing diverse existing open-source medical NLU corpora, and (3) develop BioMistral-NLU, a generalizable medical NLU model, through fine-tuning BioMistral on MNLU-Instruct. We evaluate BioMistral-NLU in a zero-shot setting, across 6 important NLU tasks, from two widely adopted medical NLU benchmarks: BLUE and BLURB. Our experiments show that our BioMistral-NLU outperforms the original BioMistral, as well as the proprietary LLMs - ChatGPT and GPT-4. Our dataset-agnostic prompting strategy and instruction tuning step over diverse NLU tasks enhance LLMs' generalizability across diverse medical NLU tasks. Our ablation experiments show that instruction-tuning on a wider variety of tasks, even when the total number of training instances remains constant, enhances downstream zero-shot generalization.
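The abstract's central move is casting heterogeneous NLU tasks (entity extraction, classification, and so on) into one dataset-agnostic instruction format so a single fine-tuned model can handle all of them zero-shot. A minimal sketch of that idea follows; the template fields, task names, and example sentences are illustrative assumptions on my part, not the authors' actual MNLU-Instruct format.

```python
# Illustrative sketch of a unified instruction-prompt format for medical
# NLU tasks. The field layout (Task / Definition / Options / Input / Answer)
# is a hypothetical stand-in for the paper's actual template.

def build_prompt(task, definition, options, text):
    """Render one NLU example as an instruction-following prompt string."""
    parts = [f"Task: {task}", f"Definition: {definition}"]
    if options:  # classification-style tasks enumerate their label set
        parts.append("Options: " + "; ".join(options))
    parts.append(f"Input: {text}")
    parts.append("Answer:")  # model completes after this marker
    return "\n".join(parts)

# Extraction and classification tasks share the same prompt shape:
ner_prompt = build_prompt(
    task="named entity recognition",
    definition="List all disease mentions in the input sentence.",
    options=None,
    text="The patient was diagnosed with type 2 diabetes and hypertension.",
)
cls_prompt = build_prompt(
    task="sentence classification",
    definition="Does the sentence describe a drug-drug interaction?",
    options=["yes", "no"],
    text="Warfarin levels rose after amiodarone was started.",
)
print(ner_prompt)
print(cls_prompt)
```

Because every task reduces to the same prompt shape, instruction-tuning examples from many corpora can be pooled into one training set, which is what the ablation result (wider task variety at constant training size improves zero-shot generalization) relies on.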
