
Multimodal Contrastive Learning for Remote Sensing Image Feature Extraction Based on Relaxed Positive Samples.

Authors

Zhang Zhenshi, Li Qiujun, Jing Wenxuan, He Guangjun, Zhu Lili, Gao Shijuan

Affiliations

College of Basic Education, National University of Defense Technology, Changsha 410073, China.

School of Geosciences and Info-Physics, Central South University, Changsha 410083, China.

Publication

Sensors (Basel). 2024 Dec 3;24(23):7719. doi: 10.3390/s24237719.

Abstract

Traditional multimodal contrastive learning pulls a text and its corresponding image closer together as a positive pair, where the text typically consists of fixed sentence structures or specific descriptive statements, and the image features are generally global features (with some fine-grained work using local features). Similar to unimodal self-supervised contrastive learning, this approach can be seen as enforcing a strict identity constraint in a multimodal context. However, because remote sensing images are inherently complex and cannot be easily described in a single sentence, and because they contain rich ancillary information beyond object features alone, this strict identity constraint may be insufficient. To fully leverage the characteristics of remote sensing images, we propose a multimodal contrastive learning method for remote sensing image feature extraction based on tripartite relaxation of positive samples, in which the model is relaxed in three aspects. The first relaxation concerns the text and image inputs. By introducing learnable parameters in the language and image branches, instead of relying on fixed sentence structures and fixed image features, the network can describe remote sensing images more flexibly in text and extract ancillary information from the image features, thereby relaxing the input constraints. The second relaxation is achieved through multimodal alignment of various features. By aligning semantic information with the corresponding semantic regions in the images, the method allows local image features to be relaxed under semantic constraints, addressing the issue that image patch selection in unimodal settings has no semantic constraint. The proposed method for remote sensing image feature extraction has been validated on four datasets. On the PatternNet dataset, it achieves 91.1% accuracy in the one-shot setting.
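The abstract gives no implementation details, but the two relaxations it describes map onto familiar building blocks: learnable prompt context in place of a fixed sentence template, and an image-text contrastive loss over matched pairs. The sketch below is a minimal, illustrative PyTorch rendering of those two pieces only; the module names, dimensions, and loss form are assumptions for illustration and are not the authors' code.

```python
# Illustrative sketch only (assumed PyTorch, assumed dimensions); not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnablePrompt(nn.Module):
    """Replaces a fixed sentence template with trainable context tokens (CoOp-style)."""

    def __init__(self, n_ctx: int = 8, dim: int = 512):
        super().__init__()
        # Learnable context vectors, small random initialization.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)

    def forward(self, class_token_embed: torch.Tensor) -> torch.Tensor:
        # class_token_embed: (batch, n_class_tokens, dim) embeddings of the label words.
        batch = class_token_embed.size(0)
        ctx = self.ctx.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the learnable context to the class-word embeddings.
        return torch.cat([ctx, class_token_embed], dim=1)


def info_nce(img_feat: torch.Tensor, txt_feat: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric image-text contrastive loss over a batch of matched (image, text) pairs."""
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    logits = img_feat @ txt_feat.t() / tau            # (B, B) cosine-similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)       # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)   # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)
```

In the paper's terms, the learnable context relaxes the fixed-template input constraint, and the contrastive loss would additionally be applied between semantic text features and the corresponding local image regions; that region-alignment step is not shown here because the abstract does not specify how regions are selected.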


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/82769971f633/sensors-24-07719-g001.jpg
