


Cross-Modal Remote Sensing Image-Text Retrieval via Context and Uncertainty-Aware Prompt.

Authors

Wang Yijing, Tang Xu, Ma Jingjing, Zhang Xiangrong, Liu Fang, Jiao Licheng

Publication Info

IEEE Trans Neural Netw Learn Syst. 2025 Jun;36(6):11384-11398. doi: 10.1109/TNNLS.2024.3458898.

DOI: 10.1109/TNNLS.2024.3458898
PMID: 39316486
Abstract

Cross-modal remote sensing image-text retrieval (CMRSITR) is a lively research topic in the remote sensing (RS) community. Benefiting from large pretrained image-text models, many successful CMRSITR methods have been proposed in recent years. Although their performance is attractive, some challenges remain. First, fine-tuning large pretrained models requires substantial computational resources. Second, most large models are pretrained on natural images, which reduces their effectiveness on RS images. To tackle these challenges, we propose a new CMRSITR network named context and uncertainty-aware prompt (CUP). First, prompt tuning is introduced into CUP to reduce the optimization burden: by training only the prompt tokens rather than all parameters, the large model's knowledge can be transferred to CMRSITR tasks with a small number of trainable parameters. Second, considering the differences between natural-image-based prior clues and RS images, apart from adopting free prompt tokens, we develop a prompt generation module (PGM) to produce RS-oriented prompt tokens. These specific prompt tokens are rich in object-level information about RS images, which helps CUP narrow the gap between natural-image models and RS images. Third, we further design an uncertainty estimation module (UEM) to reduce the uncertainties caused by the model and the data. In this way, not only can the semantic misalignment and intraclass diversity imbalance problems be mitigated, but RS clues can also be deeply explored. Experimental results on three public benchmark datasets demonstrate that CUP achieves competitive performance on the CMRSITR task compared with many existing methods. Our source code is available at: https://github.com/TangXu-Group/Cross-modal-remote-sensing-image-and-text-retrieval-models/tree/main/CUP.

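The abstract's key efficiency claim is that prompt tuning trains only a small set of prompt tokens while the pretrained backbone stays frozen. A minimal illustrative sketch of that parameter split is below; the class names and sizes (`FrozenBackbone`, 16 tokens of dimension 512) are hypothetical and not taken from the paper's code.

```python
# Minimal sketch of the prompt-tuning parameter split (illustrative only,
# not the CUP implementation). The pretrained backbone's weights are frozen;
# only the prompt-token embeddings would receive gradient updates, so the
# optimizer touches a tiny fraction of the total parameters.

class FrozenBackbone:
    """Stands in for a large pretrained image-text model; never updated."""
    def __init__(self, n_params: int):
        self.params = [0.0] * n_params  # frozen pretrained weights

class PromptTunedModel:
    def __init__(self, backbone: FrozenBackbone, n_prompt_tokens: int, token_dim: int):
        self.backbone = backbone
        # Only these prompt embeddings are trainable.
        self.prompt_tokens = [[0.0] * token_dim for _ in range(n_prompt_tokens)]

    def trainable_parameter_count(self) -> int:
        return sum(len(t) for t in self.prompt_tokens)

    def total_parameter_count(self) -> int:
        return len(self.backbone.params) + self.trainable_parameter_count()

backbone = FrozenBackbone(n_params=1_000_000)
model = PromptTunedModel(backbone, n_prompt_tokens=16, token_dim=512)
print(model.trainable_parameter_count())  # 8192
print(model.total_parameter_count())      # 1008192
```

With these assumed sizes, only about 0.8% of the parameters are trainable, which is the kind of saving the abstract attributes to training prompt tokens instead of fine-tuning the whole model.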

Similar Articles

1. Cross-Modal Remote Sensing Image-Text Retrieval via Context and Uncertainty-Aware Prompt.
   IEEE Trans Neural Netw Learn Syst. 2025 Jun;36(6):11384-11398. doi: 10.1109/TNNLS.2024.3458898.
2. MCPL: Multi-Modal Collaborative Prompt Learning for Medical Vision-Language Model.
   IEEE Trans Med Imaging. 2024 Dec;43(12):4224-4235. doi: 10.1109/TMI.2024.3418408. Epub 2024 Dec 2.
3. Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images.
   Med Image Anal. 2024 Oct;97:103258. doi: 10.1016/j.media.2024.103258. Epub 2024 Jul 4.
4. Token-Mixer: Bind Image and Text in One Embedding Space for Medical Image Reporting.
   IEEE Trans Med Imaging. 2024 Nov;43(11):4017-4028. doi: 10.1109/TMI.2024.3412402. Epub 2024 Nov 4.
5. Boosting cross-modal retrieval in remote sensing via a novel unified attention network.
   Neural Netw. 2024 Dec;180:106718. doi: 10.1016/j.neunet.2024.106718. Epub 2024 Sep 11.
6. SAGN: Semantic-Aware Graph Network for Remote Sensing Scene Classification.
   IEEE Trans Image Process. 2023;32:1011-1025. doi: 10.1109/TIP.2023.3238310. Epub 2023 Jan 31.
7. Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval.
   Neural Netw. 2025 Apr;184:107028. doi: 10.1016/j.neunet.2024.107028. Epub 2024 Dec 16.
8. CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models.
   Sci Rep. 2025 Jan 21;15(1):2681. doi: 10.1038/s41598-025-85838-x.
9. Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.
   Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
10. Txt2Img-MHN: Remote Sensing Image Generation From Text Using Modern Hopfield Networks.
    IEEE Trans Image Process. 2023;32:5737-5750. doi: 10.1109/TIP.2023.3323799. Epub 2023 Oct 24.