A generalist medical language model for disease diagnosis assistance.

Author information

Liu Xiaohong, Liu Hao, Yang Guoxing, Jiang Zeyu, Cui Shuguang, Zhang Zhaoze, Wang Huan, Tao Liyuan, Sun Yongchang, Song Zhu, Hong Tianpei, Yang Jin, Gao Tianrun, Zhang Jiangjiang, Li Xiaohu, Zhang Jing, Sang Ye, Yang Zhao, Xue Kanmin, Wu Song, Zhang Ping, Yang Jian, Song Chunli, Wang Guangyu

Affiliations

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China.

Department of Orthopedics, Peking University Third Hospital & Beijing Key Laboratory of Spinal Disease & Engineering Research Center of Bone and Joint Precision Medicine, Beijing, China.

Publication information

Nat Med. 2025 Mar;31(3):932-942. doi: 10.1038/s41591-024-03416-6. Epub 2025 Jan 8.

Abstract

The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, their effectiveness in clinical diagnosis remains unproven. Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians' inferential diagnosis with a self-bootstrapping strategy-based chain-of-thought approach and introduced a unified preference alignment framework to align it with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of key components in our medical LLM training approach. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis involving artificial intelligence (AI) versus physician comparison, AI-assistance study and human evaluation framework. Our proposed framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management. Our findings demonstrate the model's feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.
