Xiang Jinxi, Wang Xiyue, Zhang Xiaoming, Xi Yinghua, Eweje Feyisope, Chen Yijiang, Li Yuchen, Bergstrom Colin, Gopaulchan Matthew, Kim Ted, Yu Kun-Hsing, Willens Sierra, Olguin Francesca Maria, Nirschl Jeffrey J, Neal Joel, Diehn Maximilian, Yang Sen, Li Ruijiang
Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA.
Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA.
Nature. 2025 Feb;638(8051):769-778. doi: 10.1038/s41586-024-08378-w. Epub 2025 Jan 8.
Clinical decision-making is driven by multimodal data, including clinical notes and pathological characteristics. Artificial intelligence approaches that can effectively integrate multimodal data hold significant promise in advancing clinical care. However, the scarcity of well-annotated multimodal datasets in clinical settings has hindered the development of useful models. In this study, we developed the Multimodal transformer with Unified maSKed modeling (MUSK), a vision-language foundation model designed to leverage large-scale, unlabelled, unpaired image and text data. MUSK was pretrained on 50 million pathology images from 11,577 patients and one billion pathology-related text tokens using unified masked modelling. It was further pretrained on one million pathology image-text pairs to efficiently align the vision and language features. With minimal or no further training, MUSK was tested in a wide range of applications and demonstrated superior performance across 23 patch-level and slide-level benchmarks, including image-to-text and text-to-image retrieval, visual question answering, image classification and molecular biomarker prediction. Furthermore, MUSK showed strong performance in outcome prediction, including melanoma relapse prediction, pan-cancer prognosis prediction and immunotherapy response prediction in lung and gastro-oesophageal cancers. MUSK effectively combined complementary information from pathology images and clinical reports and could potentially improve diagnosis and precision in cancer therapy.
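The abstract describes pretraining on unpaired images and text via unified masked modelling, in which both image patches and text tokens are corrupted and the model learns to reconstruct the masked positions. The toy sketch below illustrates only the masking step of that idea; it is a hypothetical illustration (the `mask_tokens` helper, `MASK_ID` value, and token ids are assumptions, not the authors' code).

```python
import random

# Toy illustration of the masking step in masked modelling: both image-patch
# token ids and text token ids can be treated as one discrete sequence, a
# fixed fraction is replaced by a [MASK] id, and the reconstruction loss is
# computed only on the masked positions.

MASK_ID = 0  # hypothetical id reserved for the mask token


def mask_tokens(tokens, mask_ratio=0.15, seed=None):
    """Replace ~mask_ratio of the token ids with MASK_ID.

    Returns the corrupted sequence and the indices that were masked,
    which are the only positions a masked-modelling loss would score.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in masked_idx:
        corrupted[i] = MASK_ID
    return corrupted, masked_idx


# Example: a unified sequence could interleave text-token ids and
# image-patch-token ids; the masking step treats them identically.
sequence = [101, 7, 42, 9, 583, 12, 77, 3]
corrupted, masked_idx = mask_tokens(sequence, mask_ratio=0.25, seed=0)
print(corrupted, masked_idx)
```

In the full approach described above, a transformer would then predict the original ids at `masked_idx`, letting large unlabelled and unpaired corpora be used before the smaller paired image-text alignment stage.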