A generalist medical language model for disease diagnosis assistance.

Author information

Liu Xiaohong, Liu Hao, Yang Guoxing, Jiang Zeyu, Cui Shuguang, Zhang Zhaoze, Wang Huan, Tao Liyuan, Sun Yongchang, Song Zhu, Hong Tianpei, Yang Jin, Gao Tianrun, Zhang Jiangjiang, Li Xiaohu, Zhang Jing, Sang Ye, Yang Zhao, Xue Kanmin, Wu Song, Zhang Ping, Yang Jian, Song Chunli, Wang Guangyu

Affiliations

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China.

Department of Orthopedics, Peking University Third Hospital & Beijing Key Laboratory of Spinal Disease & Engineering Research Center of Bone and Joint Precision Medicine, Beijing, China.

Publication information

Nat Med. 2025 Mar;31(3):932-942. doi: 10.1038/s41591-024-03416-6. Epub 2025 Jan 8.

DOI:10.1038/s41591-024-03416-6

PMID:39779927

Abstract

The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, their effectiveness in clinical diagnosis remains unproven. Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians' inferential diagnosis with a self-bootstrapping strategy-based chain-of-thought approach and introduced a unified preference alignment framework to align it with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of key components in our medical LLM training approach. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis involving artificial intelligence (AI) versus physician comparison, AI-assistance study and human evaluation framework. Our proposed framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management. Our findings demonstrate the model's feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.
