Suppr 超能文献



MedCLIP: Contrastive Learning from Unpaired Medical Images and Text.

Authors

Wang Zifeng, Wu Zhenbang, Agarwal Dinesh, Sun Jimeng

Affiliations

Department of Computer Science, University of Illinois Urbana-Champaign.

Adobe.

Publication

Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:3876-3887. doi: 10.18653/v1/2022.emnlp-main.256.

DOI: 10.18653/v1/2022.emnlp-main.256
PMID: 39144675
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11323634/
Abstract

Existing vision-text contrastive learning like CLIP (Radford et al., 2021) aims to match the paired image and caption embeddings while pushing others apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude below the general images and captions from the internet. Moreover, previous methods encounter many false negatives, i.e., images and reports from separate patients probably carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning thus scaling the usable training data in a combinatorial magnitude with low cost. We also propose to replace the InfoNCE loss with semantic matching loss based on medical knowledge to eliminate false negatives in contrastive learning. We prove that MedCLIP is a simple yet effective framework: it outperforms state-of-the-art methods on zero-shot prediction, supervised classification, and image-text retrieval. Surprisingly, we observe that with only 20K pre-training data, MedCLIP wins over the state-of-the-art method (using ≈200K data).

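The abstract's key idea — replacing InfoNCE's one-hot (diagonal) targets with soft targets derived from medical-knowledge labels, so that an image and a report from different patients with the same findings are no longer treated as negatives — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the multi-hot entity labels, and the cosine-based target normalization are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_matching_loss(img_emb, txt_emb, img_labels, txt_labels, tau=0.07):
    """Soft-target contrastive loss (illustrative sketch).

    Instead of InfoNCE's one-hot diagonal targets, targets come from
    the similarity of multi-hot medical-entity labels, so two samples
    sharing findings contribute as (soft) positives rather than being
    wrongly pushed apart as false negatives. Note that with identity
    labels (each sample its own class) the targets reduce to one-hot
    and the loss reduces to standard InfoNCE.
    """
    # cosine-similarity logits between every image and every text
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    t = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = v @ t.T / tau
    # soft targets from label similarity, normalized per image row
    sim = img_labels @ txt_labels.T
    norm = (np.linalg.norm(img_labels, axis=1, keepdims=True)
            * np.linalg.norm(txt_labels, axis=1))
    targets = sim / np.clip(norm, 1e-8, None)
    targets = targets / np.clip(targets.sum(axis=1, keepdims=True), 1e-8, None)
    # cross-entropy between predicted and soft target distributions
    logp = np.log(np.clip(softmax(logits, axis=1), 1e-12, None))
    return float(-(targets * logp).sum(axis=1).mean())
```

Because images and texts only meet through their labels, the two modalities need not come from the same patients at all — which is what lets the usable training pairs grow combinatorially from unpaired data.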

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/3540b5594bfb/nihms-2012083-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/92be6dbff8eb/nihms-2012083-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/ce347e62f409/nihms-2012083-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/0d0060c6eb3d/nihms-2012083-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d2f/11323634/43ad7cfc4068/nihms-2012083-f0005.jpg

Similar articles

1. MedCLIP: Contrastive Learning from Unpaired Medical Images and Text.
Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:3876-3887. doi: 10.18653/v1/2022.emnlp-main.256.
2. sCL-ST: Supervised Contrastive Learning With Semantic Transformations for Multiple Lead ECG Arrhythmia Classification.
IEEE J Biomed Health Inform. 2023 Jun;27(6):2818-2828. doi: 10.1109/JBHI.2023.3246241. Epub 2023 Jun 5.
3. Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation.
Med Image Anal. 2023 Jul;87:102792. doi: 10.1016/j.media.2023.102792. Epub 2023 Mar 11.
4. Unpaired Image-Text Matching via Multimodal Aligned Conceptual Knowledge.
IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5160-5176. doi: 10.1109/TPAMI.2024.3432552.
5. Improving Medical Vision-Language Contrastive Pretraining With Semantics-Aware Triage.
IEEE Trans Med Imaging. 2023 Dec;42(12):3579-3589. doi: 10.1109/TMI.2023.3294980. Epub 2023 Nov 30.
6. Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders.
Sci Rep. 2024 Oct 5;14(1):23199. doi: 10.1038/s41598-024-73695-z.
7. Semi-Supervised Pixel Contrastive Learning Framework for Tissue Segmentation in Histopathological Image.
IEEE J Biomed Health Inform. 2023 Jan;27(1):97-108. doi: 10.1109/JBHI.2022.3216293. Epub 2023 Jan 4.
8. Word self-update contrastive adversarial networks for text-to-image synthesis.
Neural Netw. 2023 Oct;167:433-444. doi: 10.1016/j.neunet.2023.08.038. Epub 2023 Aug 25.
9. SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records.
Proc IEEE Int Conf Data Min. 2021 Dec;2021:857-866. doi: 10.1109/icdm51629.2021.00097.
10. ProtoCLIP: Prototypical Contrastive Language Image Pretraining.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):610-624. doi: 10.1109/TNNLS.2023.3335859. Epub 2025 Jan 7.

Cited by

1. Vision-language foundation models for medical imaging: a review of current practices and innovations.
Biomed Eng Lett. 2025 Jun 6;15(5):809-830. doi: 10.1007/s13534-025-00484-6. eCollection 2025 Sep.
2. A perspective for adapting generalist AI to specialized medical AI applications and their challenges.
NPJ Digit Med. 2025 Jul 11;8(1):429. doi: 10.1038/s41746-025-01789-7.
3. A Pan-Organ Vision-Language Model for Generalizable 3D CT Representations.
medRxiv. 2025 Jul 3:2025.07.03.25330654. doi: 10.1101/2025.07.03.25330654.
4. A narrative review of foundation models for medical image segmentation: zero-shot performance evaluation on diverse modalities.
Quant Imaging Med Surg. 2025 Jun 6;15(6):5825-5858. doi: 10.21037/qims-2024-2826. Epub 2025 Jun 3.
5. A scoping review of self-supervised representation learning for clinical decision making using EHR categorical data.
NPJ Digit Med. 2025 Jun 14;8(1):362. doi: 10.1038/s41746-025-01692-1.
6. Rethinking VLMs and LLMs for image classification.
Sci Rep. 2025 Jun 4;15(1):19692. doi: 10.1038/s41598-025-04384-8.
7. Minimum levels of interpretability for artificial moral agents.
AI Ethics. 2025;5(3):2071-2087. doi: 10.1007/s43681-024-00536-0. Epub 2024 Jul 31.
8. A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis.
Adv Neural Inf Process Syst. 2024;37:90683-90713.
9. Deep learning based dual stage model for accurate nasogastric tube positioning in chest radiographs.
Sci Rep. 2025 Apr 25;15(1):14556. doi: 10.1038/s41598-025-98562-3.
10. Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources.
Proc Conf Assoc Comput Linguist Meet. 2024 Aug;2024(Volume 1 Long Papers):3644-3656. doi: 10.18653/v1/2024.acl-long.199.

References

1. PiCO+: Contrastive Label Disambiguation for Robust Partial Label Learning.
IEEE Trans Pattern Anal Mach Intell. 2024 May;46(5):3183-3198. doi: 10.1109/TPAMI.2023.3342650. Epub 2024 Apr 3.
2. Deep learning for chest X-ray analysis: A survey.
Med Image Anal. 2021 Aug;72:102125. doi: 10.1016/j.media.2021.102125. Epub 2021 Jun 5.
3. Augmenting the National Institutes of Health Chest Radiograph Dataset with Expert Annotations of Possible Pneumonia.
Radiol Artif Intell. 2019 Jan 30;1(1):e180041. doi: 10.1148/ryai.2019180041. eCollection 2019 Jan.
4. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images.
Comput Biol Med. 2021 May;132:104319. doi: 10.1016/j.compbiomed.2021.104319. Epub 2021 Mar 11.
5. A Characteristic Chest Radiographic Pattern in the Setting of the COVID-19 Pandemic.
Radiol Cardiothorac Imaging. 2020 Sep 3;2(5):e200280. doi: 10.1148/ryct.2020200280. eCollection 2020 Oct.
6. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.
Sci Data. 2019 Dec 12;6(1):317. doi: 10.1038/s41597-019-0322-0.
7. NegBio: a high-performance tool for negation and uncertainty detection in radiology reports.
AMIA Jt Summits Transl Sci Proc. 2018 May 18;2017:188-196. eCollection 2018.
8. An overview of MetaMap: historical perspective and recent advances.
J Am Med Inform Assoc. 2010 May-Jun;17(3):229-36. doi: 10.1136/jamia.2009.002733.
9. The Unified Medical Language System (UMLS): integrating biomedical terminology.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061.