Medical foundation large language models for comprehensive text analysis and beyond.

Authors

Xie Qianqian, Chen Qingyu, Chen Aokun, Peng Cheng, Hu Yan, Lin Fongci, Peng Xueqing, Huang Jimin, Zhang Jeffrey, Keloth Vipina, Zhou Xinyu, Qian Lingfei, He Huan, Shung Dennis, Ohno-Machado Lucila, Wu Yonghui, Xu Hua, Bian Jiang

Affiliations

Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.

Publication information

NPJ Digit Med. 2025 Mar 5;8(1):141. doi: 10.1038/s41746-025-01533-1.

Abstract

Recent advancements in large language models (LLMs) show significant potential in medical applications but are hindered by limited specialized medical knowledge. We present Me-LLaMA, a family of open-source medical LLMs integrating extensive domain-specific knowledge with robust instruction-following capabilities. Me-LLaMA is developed through continual pretraining and instruction tuning of LLaMA2 models using diverse biomedical and clinical data sources (e.g., biomedical literature and clinical notes). We evaluated Me-LLaMA on six text analysis tasks using 12 benchmarks (e.g., PubMedQA and MIMIC-CXR) and assessed its clinical utility in complex case diagnosis through automatic and human evaluations. Me-LLaMA outperforms existing open medical LLMs in zero-shot and supervised settings and surpasses ChatGPT and GPT-4 after task-specific instruction tuning for most text analysis tasks. Its performance is also comparable to ChatGPT and GPT-4 for diagnosing complex clinical cases. Our findings highlight the importance of combining domain-specific continual pretraining with instruction tuning to enhance performance in medical LLMs.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09b0/11882967/f8771c9c73ff/41746_2025_1533_Fig1_HTML.jpg
