

Me-LLaMA: Medical Foundation Large Language Models for Comprehensive Text Analysis and Beyond.

Author Information

Xie Qianqian, Chen Qingyu, Chen Aokun, Peng Cheng, Hu Yan, Lin Fongci, Peng Xueqing, Huang Jimin, Zhang Jeffrey, Keloth Vipina, Zhou Xinyu, Qian Lingfei, He Huan, Shung Dennis, Ohno-Machado Lucila, Wu Yonghui, Xu Hua, Bian Jiang

Affiliations

Yale University.

University of Florida.

Publication Information

Res Sq. 2024 Dec 18:rs.3.rs-5456223. doi: 10.21203/rs.3.rs-5456223/v1.

DOI: 10.21203/rs.3.rs-5456223/v1
PMID: 39764122
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11702801/
Abstract

Recent advancements in large language models (LLMs) like ChatGPT and LLaMA have shown significant potential in medical applications, but their effectiveness is limited by a lack of specialized medical knowledge due to general-domain training. In this study, we developed Me-LLaMA, a new family of open-source medical LLMs that uniquely integrate extensive domain-specific knowledge with robust instruction-following capabilities. Me-LLaMA comprises foundation models (Me-LLaMA 13B and 70B) and their chat-enhanced versions, developed through comprehensive continual pretraining and instruction tuning of LLaMA2 models using both biomedical literature and clinical notes. Me-LLaMA utilized the largest and most comprehensive medical data to date, including 129B pretraining tokens and 214K instruction-tuning samples from diverse biomedical and clinical data sources. Training the 70B models required substantial computational resources, exceeding 100,000 A100 GPU hours. We applied Me-LLaMA to six medical text analysis tasks and evaluated its performance on 12 benchmark datasets. To further assess Me-LLaMA's potential clinical utility, we evaluated its performance on complex clinical case diagnosis compared with other commercial LLMs, using both automatic and human evaluations. Me-LLaMA models outperform LLaMA and other existing open-source medical LLMs in both zero-shot and supervised learning settings for most text analysis tasks. With task-specific instruction tuning, Me-LLaMA models also surpass leading commercial LLMs, outperforming ChatGPT on 7 out of 8 datasets and GPT-4 on 5 out of 8 datasets. Moreover, Me-LLaMA's performance is comparable to ChatGPT and GPT-4 for diagnosing complex clinical cases. Our findings underscore that combining domain-specific continual pretraining with instruction tuning is essential for developing effective domain-specific large language models in healthcare, significantly enhancing performance across diverse medical text analysis tasks and applications. By publicly releasing our models and resources under appropriate user agreements, we aim to foster innovation and facilitate advancements in medical AI, benefiting researchers and practitioners within the community.
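The abstract describes instruction tuning on 214K (instruction, input, response) samples. The paper's exact prompt template is not reproduced here, so the sketch below uses the common Alpaca-style layout as an illustrative assumption; the field names, clinical example, and `format_sample` helper are hypothetical.

```python
# Hypothetical sketch: rendering one instruction-tuning triple into the single
# training string a causal LM would be fine-tuned on (Alpaca-style template,
# assumed for illustration -- not Me-LLaMA's documented format).

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def format_sample(instruction: str, input_text: str, response: str) -> str:
    """Concatenate prompt and target; the loss is typically computed on the
    response tokens only during instruction tuning."""
    return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text) + response

sample = format_sample(
    instruction="Classify the clinical note by primary diagnosis.",
    input_text="Patient presents with polyuria, polydipsia, and HbA1c of 9.2%.",
    response="Type 2 diabetes mellitus",
)
print(sample)
```

Formatting every sample into one flat string like this lets the same next-token training loop cover all six task types; masking the prompt portion of the labels is the usual way to restrict the loss to the response.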


Figures (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dba0/11702801/fdea6ac91bcf/nihpp-rs5456223v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dba0/11702801/019d9bbac16c/nihpp-rs5456223v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dba0/11702801/9e015e3e3e62/nihpp-rs5456223v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dba0/11702801/ebdb6fb25ab5/nihpp-rs5456223v1-f0004.jpg

Similar Articles

1. Me-LLaMA: Medical Foundation Large Language Models for Comprehensive Text Analysis and Beyond.
Res Sq. 2024 Dec 18:rs.3.rs-5456223. doi: 10.21203/rs.3.rs-5456223/v1.
2. Me-LLaMA: Foundation Large Language Models for Medical Applications.
Res Sq. 2024 May 22:rs.3.rs-4240043. doi: 10.21203/rs.3.rs-4240043/v1.
3. Medical foundation large language models for comprehensive text analysis and beyond.
NPJ Digit Med. 2025 Mar 5;8(1):141. doi: 10.1038/s41746-025-01533-1.
4. Advancing entity recognition in biomedicine via instruction tuning of large language models.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae163.
5. PMC-LLaMA: toward building open-source language models for medicine.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1833-1843. doi: 10.1093/jamia/ocae045.
6. EYE-Llama, an in-domain large language model for ophthalmology.
bioRxiv. 2024 Apr 29:2024.04.26.591355. doi: 10.1101/2024.04.26.591355.
7. A comprehensive evaluation of large language models on benchmark biomedical text processing tasks.
Comput Biol Med. 2024 Mar;171:108189. doi: 10.1016/j.compbiomed.2024.108189. Epub 2024 Feb 20.
8. Open-source LLMs for text annotation: a practical guide for model setting and fine-tuning.
J Comput Soc Sci. 2025;8(1):17. doi: 10.1007/s42001-024-00345-9. Epub 2024 Dec 18.
9. BioInstruct: instruction tuning of large language models for biomedical natural language processing.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.
10. Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks.
J Am Med Inform Assoc. 2025 Jun 1;32(6):1015-1024. doi: 10.1093/jamia/ocaf045.

References Cited in This Article

1. Towards accurate differential diagnosis with large language models.
Nature. 2025 Apr 9. doi: 10.1038/s41586-025-08869-4.
2. Improving large language models for clinical named entity recognition via prompt engineering.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1812-1820. doi: 10.1093/jamia/ocad259.
3. Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine.
NPJ Digit Med. 2024 Jan 24;7(1):20. doi: 10.1038/s41746-024-01010-1.
4. A study of generative large language model for medical research and healthcare.
NPJ Digit Med. 2023 Nov 16;6(1):210. doi: 10.1038/s41746-023-00958-w.
5. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge.
Cureus. 2023 Jun 24;15(6):e40895. doi: 10.7759/cureus.40895. eCollection 2023 Jun.
6. Large language models encode clinical knowledge.
Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.
7. Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge.
JAMA. 2023 Jul 3;330(1):78-80. doi: 10.1001/jama.2023.8288.
8. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.
Sci Data. 2019 Dec 12;6(1):317. doi: 10.1038/s41597-019-0322-0.
9. Bridging the Gap Between Consumers' Medication Questions and Trusted Answers.
Stud Health Technol Inform. 2019 Aug 21;264:25-29. doi: 10.3233/SHTI190176.
10. MIMIC-III, a freely accessible critical care database.
Sci Data. 2016 May 24;3:160035. doi: 10.1038/sdata.2016.35.