Bluethgen Christian, Chambon Pierre, Delbrouck Jean-Benoit, van der Sluijs Rogier, Połacin Małgorzata, Zambrano Chaves Juan Manuel, Abraham Tanishq Mathew, Purohit Shivanshu, Langlotz Curtis P, Chaudhari Akshay S
Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA.
Department of Radiology, Stanford University, Palo Alto, CA, USA.
Nat Biomed Eng. 2025 Apr;9(4):494-506. doi: 10.1038/s41551-024-01246-y. Epub 2024 Aug 26.
The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision-language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision-language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.