A vision-language foundation model for the generation of realistic chest X-ray images.

Authors

Bluethgen Christian, Chambon Pierre, Delbrouck Jean-Benoit, van der Sluijs Rogier, Połacin Małgorzata, Zambrano Chaves Juan Manuel, Abraham Tanishq Mathew, Purohit Shivanshu, Langlotz Curtis P, Chaudhari Akshay S

Affiliations

Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA.

Department of Radiology, Stanford University, Palo Alto, CA, USA.

Publication details

Nat Biomed Eng. 2025 Apr;9(4):494-506. doi: 10.1038/s41551-024-01246-y. Epub 2024 Aug 26.

DOI: 10.1038/s41551-024-01246-y

PMID: 39187663

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11861387/

Abstract

The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision-language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision-language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.
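
The abstract does not state the training objective. As a minimal sketch, assuming the standard noise-prediction loss used by latent diffusion models (an assumption, not taken from this paper), one fine-tuning step on an image-report pair noises the image latent z0 to z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps and trains the denoiser to recover eps given the report embedding. The linear beta schedule, the helper names `make_alpha_bar`, `add_noise` and `denoising_loss`, and the `EpsModel` signature are hypothetical illustrations, not the authors' code; in a real pipeline the latent would come from a pre-trained image autoencoder and the embedding from a text encoder.

```python
import numpy as np
from typing import Callable

# Hypothetical denoiser signature: (noisy latent z_t, timestep t, text embedding) -> predicted noise.
EpsModel = Callable[[np.ndarray, int, np.ndarray], np.ndarray]


def make_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                   beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(z0: np.ndarray, eps: np.ndarray, t: int,
              alpha_bar: np.ndarray) -> np.ndarray:
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps


def denoising_loss(eps_model: EpsModel, z0: np.ndarray, text_emb: np.ndarray,
                   t: int, alpha_bar: np.ndarray,
                   rng: np.random.Generator) -> float:
    """Mean squared error between sampled noise and the text-conditioned prediction."""
    eps = rng.standard_normal(z0.shape)    # noise the model must recover
    zt = add_noise(z0, eps, t, alpha_bar)  # noisy latent at step t
    eps_hat = eps_model(zt, t, text_emb)   # prediction conditioned on the report embedding
    return float(np.mean((eps - eps_hat) ** 2))
```

With a toy model that always predicts zero noise, the loss stays near 1 (the variance of eps); a model that recovers eps exactly drives it to 0. Training a real conditional denoiser aims to move it from the first regime toward the second.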

Similar articles

Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders.

Sci Rep. 2024 Oct 5;14(1):23199. doi: 10.1038/s41598-024-73695-z.

Study-level cross-modal retrieval of chest x-ray images and reports with adapter-based fine-tuning.

Phys Med Biol. 2025 Feb 13;70(4). doi: 10.1088/1361-6560/adaf05.

Knowledge-enhanced visual-language pre-training on chest radiology images.

Nat Commun. 2023 Jul 28;14(1):4542. doi: 10.1038/s41467-023-40260-7.

Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering.

J Biomed Inform. 2024 Dec;160:104748. doi: 10.1016/j.jbi.2024.104748. Epub 2024 Nov 12.

Optimizing pulmonary chest x-ray classification with stacked feature ensemble and swin transformer integration.

Biomed Phys Eng Express. 2024 Nov 6;11(1). doi: 10.1088/2057-1976/ad8c46.

CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images.

Eur Radiol. 2025 Jan 15. doi: 10.1007/s00330-024-11339-6.

CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization.

Comput Biol Med. 2020 Jul;122:103869. doi: 10.1016/j.compbiomed.2020.103869. Epub 2020 Jun 20.

BRAX, Brazilian labeled chest x-ray dataset.

Sci Data. 2022 Aug 10;9(1):487. doi: 10.1038/s41597-022-01608-8.

Embracing Large Natural Data: Enhancing Medical Image Analysis via Cross-Domain Fine-Tuning.

IEEE J Biomed Health Inform. 2024 Aug;28(8):4512-4521. doi: 10.1109/JBHI.2023.3343518. Epub 2024 Aug 6.

Cited by

Building the world's first truly global medical foundation model.

Nat Med. 2025 Sep 8. doi: 10.1038/s41591-025-03859-5.

Vision-language foundation models for medical imaging: a review of current practices and innovations.

Biomed Eng Lett. 2025 Jun 6;15(5):809-830. doi: 10.1007/s13534-025-00484-6. eCollection 2025 Sep.

From large language models to multimodal AI: a scoping review on the potential of generative AI in medicine.

Biomed Eng Lett. 2025 Aug 22;15(5):845-863. doi: 10.1007/s13534-025-00497-1. eCollection 2025 Sep.

Octascope: A Lightweight Pre-Trained Model for Optical Coherence Tomography.

IEEE Access. 2025;13:138005-138019. doi: 10.1109/access.2025.3595838. Epub 2025 Aug 5.

A Clinically-Informed Framework for Evaluating Vision-Language Models in Radiology Report Generation: Taxonomy of Errors and Risk-Aware Metric.

medRxiv. 2025 Jul 14:2025.07.13.25331222. doi: 10.1101/2025.07.13.25331222.

Unconditional latent diffusion models memorize patient imaging data.

Nat Biomed Eng. 2025 Aug 11. doi: 10.1038/s41551-025-01468-8.

Comprehensive promotion of drug traceability codes in China in 2025: challenges and solutions for tertiary outpatient pharmacists.

Front Pharmacol. 2025 Jul 25;16:1619916. doi: 10.3389/fphar.2025.1619916. eCollection 2025.

A perspective for adapting generalist AI to specialized medical AI applications and their challenges.

NPJ Digit Med. 2025 Jul 11;8(1):429. doi: 10.1038/s41746-025-01789-7.

Generation of Fundus Fluorescein Angiography Videos for Health Care Data Sharing.

JAMA Ophthalmol. 2025 Jun 26. doi: 10.1001/jamaophthalmol.2025.1419.

Large models in medical imaging: Advances and prospects.

Chin Med J (Engl). 2025 Jul 20;138(14):1647-1664. doi: 10.1097/CM9.0000000000003699. Epub 2025 Jun 20.
