Suppr 超能文献



MedCLIP: Contrastive Learning from Unpaired Medical Images and Text.

Authors

Wang Zifeng, Wu Zhenbang, Agarwal Dinesh, Sun Jimeng

Affiliations

Department of Computer Science, University of Illinois Urbana-Champaign.

Adobe.

Publication

Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:3876-3887. doi: 10.18653/v1/2022.emnlp-main.256.

DOI: 10.18653/v1/2022.emnlp-main.256
PMID: 39144675
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11323634/
Abstract

Existing vision-text contrastive learning like CLIP (Radford et al., 2021) aims to match the paired image and caption embeddings while pushing others apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude below the general images and captions from the internet. Moreover, previous methods encounter many false negatives, i.e., images and reports from separate patients probably carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning thus scaling the usable training data in a combinatorial magnitude with low cost. We also propose to replace the InfoNCE loss with semantic matching loss based on medical knowledge to eliminate false negatives in contrastive learning. We prove that MedCLIP is a simple yet effective framework: it outperforms state-of-the-art methods on zero-shot prediction, supervised classification, and image-text retrieval. Surprisingly, we observe that with only 20K pre-training data, MedCLIP wins over the state-of-the-art method (using ≈200K data).

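The abstract's key idea — replacing InfoNCE's one-hot (diagonal) targets with soft targets derived from medical-knowledge labels, so that an image and a report from different patients with the same findings are no longer treated as negatives — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the multi-hot entity labels, and the cosine-based target normalization are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_matching_loss(img_emb, txt_emb, img_labels, txt_labels, tau=0.07):
    """Soft-target contrastive loss (illustrative sketch).

    Instead of InfoNCE's one-hot diagonal targets, targets come from
    the similarity of multi-hot medical-entity labels, so two samples
    sharing findings contribute as (soft) positives rather than being
    wrongly pushed apart as false negatives. Note that with identity
    labels (each sample its own class) the targets reduce to one-hot
    and the loss reduces to standard InfoNCE.
    """
    # cosine-similarity logits between every image and every text
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    t = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = v @ t.T / tau
    # soft targets from label similarity, normalized per image row
    sim = img_labels @ txt_labels.T
    norm = (np.linalg.norm(img_labels, axis=1, keepdims=True)
            * np.linalg.norm(txt_labels, axis=1))
    targets = sim / np.clip(norm, 1e-8, None)
    targets = targets / np.clip(targets.sum(axis=1, keepdims=True), 1e-8, None)
    # cross-entropy between predicted and soft target distributions
    logp = np.log(np.clip(softmax(logits, axis=1), 1e-12, None))
    return float(-(targets * logp).sum(axis=1).mean())
```

Because images and texts only meet through their labels, the two modalities need not come from the same patients at all — which is what lets the usable training pairs grow combinatorially from unpaired data.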

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/3540b5594bfb/nihms-2012083-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/92be6dbff8eb/nihms-2012083-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/ce347e62f409/nihms-2012083-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/0d0060c6eb3d/nihms-2012083-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/43ad7cfc4068/nihms-2012083-f0005.jpg

Similar articles

1. MedCLIP: Contrastive Learning from Unpaired Medical Images and Text.
Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:3876-3887. doi: 10.18653/v1/2022.emnlp-main.256.
2. sCL-ST: Supervised Contrastive Learning With Semantic Transformations for Multiple Lead ECG Arrhythmia Classification.
IEEE J Biomed Health Inform. 2023 Jun;27(6):2818-2828. doi: 10.1109/JBHI.2023.3246241. Epub 2023 Jun 5.
3. Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation.
Med Image Anal. 2023 Jul;87:102792. doi: 10.1016/j.media.2023.102792. Epub 2023 Mar 11.
4. Unpaired Image-Text Matching via Multimodal Aligned Conceptual Knowledge.
IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5160-5176. doi: 10.1109/TPAMI.2024.3432552.
5. Improving Medical Vision-Language Contrastive Pretraining With Semantics-Aware Triage.
IEEE Trans Med Imaging. 2023 Dec;42(12):3579-3589. doi: 10.1109/TMI.2023.3294980. Epub 2023 Nov 30.
6. Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders.
Sci Rep. 2024 Oct 5;14(1):23199. doi: 10.1038/s41598-024-73695-z.
7. Semi-Supervised Pixel Contrastive Learning Framework for Tissue Segmentation in Histopathological Image.
IEEE J Biomed Health Inform. 2023 Jan;27(1):97-108. doi: 10.1109/JBHI.2022.3216293. Epub 2023 Jan 4.
8. Word self-update contrastive adversarial networks for text-to-image synthesis.
Neural Netw. 2023 Oct;167:433-444. doi: 10.1016/j.neunet.2023.08.038. Epub 2023 Aug 25.
9. SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records.
Proc IEEE Int Conf Data Min. 2021 Dec;2021:857-866. doi: 10.1109/icdm51629.2021.00097.
10. ProtoCLIP: Prototypical Contrastive Language Image Pretraining.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):610-624. doi: 10.1109/TNNLS.2023.3335859. Epub 2025 Jan 7.

Cited by

1. Vision-language foundation models for medical imaging: a review of current practices and innovations.
Biomed Eng Lett. 2025 Jun 6;15(5):809-830. doi: 10.1007/s13534-025-00484-6. eCollection 2025 Sep.
2. A perspective for adapting generalist AI to specialized medical AI applications and their challenges.
NPJ Digit Med. 2025 Jul 11;8(1):429. doi: 10.1038/s41746-025-01789-7.
3. A Pan-Organ Vision-Language Model for Generalizable 3D CT Representations.
medRxiv. 2025 Jul 3:2025.07.03.25330654. doi: 10.1101/2025.07.03.25330654.
4. A narrative review of foundation models for medical image segmentation: zero-shot performance evaluation on diverse modalities.
Quant Imaging Med Surg. 2025 Jun 6;15(6):5825-5858. doi: 10.21037/qims-2024-2826. Epub 2025 Jun 3.
5. A scoping review of self-supervised representation learning for clinical decision making using EHR categorical data.
NPJ Digit Med. 2025 Jun 14;8(1):362. doi: 10.1038/s41746-025-01692-1.
6. Rethinking VLMs and LLMs for image classification.
Sci Rep. 2025 Jun 4;15(1):19692. doi: 10.1038/s41598-025-04384-8.
7. Minimum levels of interpretability for artificial moral agents.
AI Ethics. 2025;5(3):2071-2087. doi: 10.1007/s43681-024-00536-0. Epub 2024 Jul 31.
8. A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis.
Adv Neural Inf Process Syst. 2024;37:90683-90713.
9. Deep learning based dual stage model for accurate nasogastric tube positioning in chest radiographs.
Sci Rep. 2025 Apr 25;15(1):14556. doi: 10.1038/s41598-025-98562-3.
10. Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources.
Proc Conf Assoc Comput Linguist Meet. 2024 Aug;2024(Volume 1 Long Papers):3644-3656. doi: 10.18653/v1/2024.acl-long.199.

References

1. PiCO+: Contrastive Label Disambiguation for Robust Partial Label Learning.
IEEE Trans Pattern Anal Mach Intell. 2024 May;46(5):3183-3198. doi: 10.1109/TPAMI.2023.3342650. Epub 2024 Apr 3.
2. Deep learning for chest X-ray analysis: A survey.
Med Image Anal. 2021 Aug;72:102125. doi: 10.1016/j.media.2021.102125. Epub 2021 Jun 5.
3. Augmenting the National Institutes of Health Chest Radiograph Dataset with Expert Annotations of Possible Pneumonia.
Radiol Artif Intell. 2019 Jan 30;1(1):e180041. doi: 10.1148/ryai.2019180041. eCollection 2019 Jan.
4. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images.
Comput Biol Med. 2021 May;132:104319. doi: 10.1016/j.compbiomed.2021.104319. Epub 2021 Mar 11.
5. A Characteristic Chest Radiographic Pattern in the Setting of the COVID-19 Pandemic.
Radiol Cardiothorac Imaging. 2020 Sep 3;2(5):e200280. doi: 10.1148/ryct.2020200280. eCollection 2020 Oct.
6. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.
Sci Data. 2019 Dec 12;6(1):317. doi: 10.1038/s41597-019-0322-0.
7. NegBio: a high-performance tool for negation and uncertainty detection in radiology reports.
AMIA Jt Summits Transl Sci Proc. 2018 May 18;2017:188-196. eCollection 2018.
8. An overview of MetaMap: historical perspective and recent advances.
J Am Med Inform Assoc. 2010 May-Jun;17(3):229-36. doi: 10.1136/jamia.2009.002733.
9. The Unified Medical Language System (UMLS): integrating biomedical terminology.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061.