Cross-Modal Remote Sensing Image-Text Retrieval via Context and Uncertainty-Aware Prompt.

Author Information

Wang Yijing, Tang Xu, Ma Jingjing, Zhang Xiangrong, Liu Fang, Jiao Licheng

Publication Information

IEEE Trans Neural Netw Learn Syst. 2025 Jun;36(6):11384-11398. doi: 10.1109/TNNLS.2024.3458898.

Abstract

Cross-modal remote sensing image-text retrieval (CMRSITR) is an active research topic in the remote sensing (RS) community. Benefiting from large pretrained image-text models, many successful CMRSITR methods have been proposed in recent years. Although their performance is attractive, some challenges remain. First, fine-tuning large pretrained models requires a significant amount of computational resources. Second, most large models are pretrained on natural images, which reduces their effectiveness when processing RS images. To tackle these challenges, we propose a new CMRSITR network named context and uncertainty-aware prompt (CUP). First, prompt-tuning theory is introduced into CUP to relieve the optimization burden: by training prompt tokens rather than all parameters, the large model's knowledge can be transferred to CMRSITR tasks with only a small number of trainable parameters. Second, considering the differences between natural-image-based prior clues and RS images, apart from adopting free prompt tokens, we develop a prompt generation module (PGM) to produce RS-oriented prompt tokens. These specific prompt tokens are rich in object-level information about RS images, which helps CUP narrow the gap between natural-image pretrained models and RS images. Third, we further design an uncertainty estimation module (UEM) to reduce the uncertainties caused by the model and the data. In this way, not only can the semantic misalignment and intraclass diversity imbalance problems be mitigated, but RS clues can also be explored in depth. Experimental results on three public benchmark datasets demonstrate that CUP achieves competitive performance on the CMRSITR task compared with many existing methods. Our source code is available at: https://github.com/TangXu-Group/Cross-modal-remote-sensing-image-and-text-retrieval-models/tree/main/CUP.
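The abstract only outlines the approach, so the following is a minimal PyTorch sketch of the underlying prompt-tuning idea, not the authors' CUP implementation. It assumes a dual-encoder retrieval setup (here the pretrained backbones are replaced by toy stand-in linear encoders), freezes the backbones, and trains only a set of free prompt tokens plus a small image-conditioned prompt generator that loosely mirrors the role the abstract assigns to the PGM; the UEM is omitted. All class names, dimensions, and the additive injection scheme are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenEncoder(nn.Module):
    """Stand-in for one branch of a pretrained image-text backbone."""

    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


class PromptTunedRetriever(nn.Module):
    def __init__(self, img_dim: int = 768, txt_dim: int = 512,
                 embed_dim: int = 256, num_prompts: int = 8):
        super().__init__()
        self.image_encoder = FrozenEncoder(img_dim, embed_dim)
        self.text_encoder = FrozenEncoder(txt_dim, embed_dim)
        # Freeze the "pretrained" backbones: only prompt parameters are tuned.
        for enc in (self.image_encoder, self.text_encoder):
            for p in enc.parameters():
                p.requires_grad = False
        # Free prompt tokens, learned directly (generic task context).
        self.free_prompts = nn.Parameter(0.02 * torch.randn(num_prompts, embed_dim))
        # Small generator producing image-conditioned prompts (hypothetical
        # stand-in for the RS-oriented prompt generation described in the abstract).
        self.prompt_generator = nn.Linear(embed_dim, embed_dim)

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        img = self.image_encoder(img_feats)        # frozen image branch, (B, D)
        txt = self.text_encoder(txt_feats)         # frozen text branch, (B, D)
        # Inject prompts additively: image-conditioned prompts into the image
        # embedding, pooled free prompts into the text embedding. The paper's
        # token-level injection is more elaborate; this is the simplest choice.
        img = img + self.prompt_generator(img)
        txt = txt + self.free_prompts.mean(dim=0)
        img = F.normalize(img, dim=-1)
        txt = F.normalize(txt, dim=-1)
        return img @ txt.t()                       # (B, B) cosine similarities


# Toy usage: symmetric contrastive loss over a batch of matched image-text pairs.
model = PromptTunedRetriever()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

img_feats = torch.randn(4, 768)    # pretend pre-extracted image features
txt_feats = torch.randn(4, 512)    # pretend pre-extracted text features
logits = model(img_feats, txt_feats) / 0.07
labels = torch.arange(4)
loss = 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
loss.backward()
optimizer.step()

Only the prompt-related parameters receive gradients, which is what makes this kind of transfer cheap compared with full fine-tuning of the backbone.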
