Towards building multilingual language model for medicine.

Affiliations

Shanghai Jiao Tong University, Shanghai, China.

Shanghai AI Laboratory, Shanghai, China.

Publication information

Nat Commun. 2024 Sep 27;15(1):8384. doi: 10.1038/s41467-024-52417-z.

Abstract

The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions. To promote this domain, we present the following contributions: First, we construct a multilingual medical corpus of approximately 25.5B tokens spanning 6 main languages, termed MMedC, which enables auto-regressive domain adaptation of general LLMs; Second, to monitor the development of multilingual medical LLMs, we propose a multilingual medical multiple-choice question-answering benchmark with rationales, termed MMedBench; Third, we assess a number of open-source large language models (LLMs) on our benchmark, along with those further auto-regressively trained on MMedC. Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks, even rivaling GPT-4. In conclusion, in this work we present a large-scale corpus, a benchmark, and a series of models to support the development of multilingual medical LLMs.
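The key technique the abstract names, auto-regressive domain adaptation, is continued next-token-prediction training of a general LLM on domain text. Below is a minimal sketch using the Hugging Face transformers API; the base checkpoint, corpus file name, and hyperparameters are illustrative assumptions, not the authors' actual configuration, and MMedC itself is not reproduced here.

```python
# Minimal sketch of auto-regressive domain adaptation (continued
# next-token pretraining) on a medical text corpus. The base model,
# corpus file, and hyperparameters below are illustrative assumptions,
# not the authors' actual training configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "meta-llama/Meta-Llama-3-8B"  # general LLM to adapt
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical local text file standing in for a corpus shard;
# MMedC itself is released by the authors and not reproduced here.
corpus = load_dataset("text", data_files={"train": "mmedc_sample.txt"})

def tokenize(batch):
    # Standard causal-LM preprocessing: tokenize and truncate long lines.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_set = corpus["train"].map(tokenize, batched=True,
                                remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mmed-llama3-adapted",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1,
                           learning_rate=2e-5,
                           bf16=True),
    train_dataset=train_set,
    # mlm=False selects the next-token (auto-regressive) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```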

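MMedBench is described as multiple-choice question answering with rationales. A common way to score such a benchmark is to pick the option the model assigns the highest likelihood; the sketch below assumes a simple {question, options, answer} item schema and a hypothetical adapted checkpoint name, neither of which reflects the benchmark's actual format. Benchmark accuracy would then be the fraction of items where the prediction matches the gold answer.

```python
# Hedged sketch of multiple-choice evaluation in the style of MMedBench.
# The checkpoint name and the {question, options, answer} item schema are
# assumptions for illustration; the benchmark's real format, and its
# rationale annotations, are not shown in this record.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mmed-llama3-adapted"  # hypothetical adapted checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def option_loss(question: str, option: str) -> float:
    # Crude proxy score: mean negative log-likelihood of the full
    # "question + option" string; lower loss = more plausible option.
    ids = tokenizer(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

def predict(item: dict) -> str:
    # Choose the option letter whose text the model scores as most likely.
    losses = {letter: option_loss(item["question"], text)
              for letter, text in item["options"].items()}
    return min(losses, key=losses.get)

# Toy item using the assumed schema (not an actual MMedBench record).
item = {"question": "Which vitamin deficiency causes scurvy?",
        "options": {"A": "Vitamin A", "B": "Vitamin B12",
                    "C": "Vitamin C", "D": "Vitamin D"},
        "answer": "C"}
print(predict(item) == item["answer"])
```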

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5375/11436924/4d6289a09496/41467_2024_52417_Fig1_HTML.jpg
