

Visual delta generation with large multi-modal models enhances composed image retrieval using unlabeled data.

Authors

Jang Young Kyun, Kim Donghyun

Affiliations

Meta Platforms (United States), Menlo Park, USA.

Korea University, Seoul, Republic of Korea.

Publication

Sci Rep. 2025 Jul 28;15(1):27463. doi: 10.1038/s41598-025-07798-6.

DOI: 10.1038/s41598-025-07798-6
PMID: 40721592
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12304162/
Abstract

Composed Image Retrieval (CIR) retrieves a target image similar to a reference image, guided by a provided textual modification (i.e., a triplet of <reference image, text, target image>). Previous work on CIR falls largely into two categories: supervised learning approaches and weakly supervised (i.e., zero-shot) learning approaches. Supervised CIR models require labeled triplets, which may not be easily obtained, limiting the widespread use of CIR and its scalability. On the other hand, a weakly supervised approach (also called zero-shot CIR) can be trained relatively easily with image-caption pairs, without considering image-to-image relations (i.e., no supervised triplets required), but it tends to yield lower accuracy. In this paper, we extend the application of CIR to semi-supervised learning, domain adaptation, and test-time adaptation contexts by exploiting only unlabeled image data. Previous approaches cannot be directly applied to these settings, as it is not trivial to leverage fully unlabeled data for CIR. To address this, we propose a new approach and settings in which we identify a reference image and its associated target images in auxiliary image data. Our method trains a large language model-based Visual Delta Generator (VDG) to produce textual descriptions of the visual differences (i.e., visual deltas) between these images. The VDG, equipped with fluent language knowledge and being model-agnostic, can generate pseudo-triplets to boost the performance of CIR models in diverse settings, including semi-supervised CIR, domain adaptation, and test-time adaptation. Our approach not only significantly improves existing supervised learning approaches, achieving state-of-the-art results on CIR benchmarks, but also expands the application of CIR across diverse settings.
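The pseudo-triplet pipeline the abstract describes — pair visually similar unlabeled images as (reference, target), then have the VDG write the modification text — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy embedding vectors stand in for real image features, and `describe_delta` is a stub standing in for the LLM-based Visual Delta Generator; all function names here are hypothetical.

```python
from itertools import combinations
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mine_pairs(embeddings, threshold=0.8):
    """Treat sufficiently similar unlabeled images as (reference, target) pairs."""
    pairs = []
    for (i, u), (j, v) in combinations(embeddings.items(), 2):
        if cosine(u, v) >= threshold:
            pairs.append((i, j))
    return pairs

def describe_delta(ref_id, tgt_id):
    """Stub for the VDG: in the paper, an LLM generates the visual-delta text."""
    return f"modify {ref_id} so that it looks like {tgt_id}"

def build_pseudo_triplets(embeddings, threshold=0.8):
    """Assemble <reference, text, target> pseudo-triplets from unlabeled data."""
    return [(r, describe_delta(r, t), t) for r, t in mine_pairs(embeddings, threshold)]

# Toy example: img_a and img_b are near-duplicates, img_c is unrelated.
embs = {
    "img_a": [1.0, 0.0, 0.1],
    "img_b": [0.9, 0.1, 0.2],
    "img_c": [0.0, 1.0, 0.0],
}
triplets = build_pseudo_triplets(embs)
```

The resulting pseudo-triplets play the role that manually labeled <reference image, text, target image> triplets play in supervised CIR training.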


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/e95dbe1c7bb6/41598_2025_7798_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/730350616348/41598_2025_7798_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/925cb10446eb/41598_2025_7798_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/410798d26a98/41598_2025_7798_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/82fae60bb4bf/41598_2025_7798_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/7cea488095b6/41598_2025_7798_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/f29619654ab5/41598_2025_7798_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7490/12304162/a70bbef9a271/41598_2025_7798_Fig8_HTML.jpg

Similar Articles

1. Visual delta generation with large multi-modal models enhances composed image retrieval using unlabeled data.
   Sci Rep. 2025 Jul 28;15(1):27463. doi: 10.1038/s41598-025-07798-6.
2. A segment anything model-guided and match-based semi-supervised segmentation framework for medical imaging.
   Med Phys. 2025 Mar 29. doi: 10.1002/mp.17785.
3. Short-Term Memory Impairment
4. Boundary-aware information maximization for self-supervised medical image segmentation.
   Med Image Anal. 2024 May;94:103150. doi: 10.1016/j.media.2024.103150. Epub 2024 Mar 28.
5. iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval.
   IEEE Trans Pattern Anal Mach Intell. 2025 Nov;47(11):10801-10817. doi: 10.1109/TPAMI.2025.3593539.
6. Interventions to improve safe and effective medicines use by consumers: an overview of systematic reviews.
   Cochrane Database Syst Rev. 2014 Apr 29;2014(4):CD007768. doi: 10.1002/14651858.CD007768.pub3.
7. Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
   Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
8. Comparison of self-administered survey questionnaire responses collected using mobile apps versus other methods.
   Cochrane Database Syst Rev. 2015 Jul 27;2015(7):MR000042. doi: 10.1002/14651858.MR000042.pub2.
9. 123I-MIBG scintigraphy and 18F-FDG-PET imaging for diagnosing neuroblastoma.
   Cochrane Database Syst Rev. 2015 Sep 29;2015(9):CD009263. doi: 10.1002/14651858.CD009263.pub2.
10. Technological aids for the rehabilitation of memory and executive functioning in children and adolescents with acquired brain injury.
   Cochrane Database Syst Rev. 2016 Jul 1;7(7):CD011020. doi: 10.1002/14651858.CD011020.pub2.

References Cited in This Article

1. Geometric Matching for Cross-Modal Retrieval.
   IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5509-5521. doi: 10.1109/TNNLS.2024.3381347. Epub 2025 Feb 28.
2. Semantics Disentangling for Cross-Modal Retrieval.
   IEEE Trans Image Process. 2024;33:2226-2237. doi: 10.1109/TIP.2024.3374111. Epub 2024 Mar 25.
3. Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval.
   IEEE Trans Pattern Anal Mach Intell. 2024 May;46(5):3665-3678. doi: 10.1109/TPAMI.2023.3346434. Epub 2024 Apr 3.
4. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning.
   IEEE Trans Pattern Anal Mach Intell. 2019 Aug;41(8):1979-1993. doi: 10.1109/TPAMI.2018.2858821. Epub 2018 Jul 23.