Multimodal Contrastive Learning for Remote Sensing Image Feature Extraction Based on Relaxed Positive Samples.

Authors

Zhang Zhenshi, Li Qiujun, Jing Wenxuan, He Guangjun, Zhu Lili, Gao Shijuan

Affiliations

College of Basic Education, National University of Defense Technology, Changsha 410073, China.

School of Geosciences and Info-Physics, Central South University, Changsha 410083, China.

Publication information

Sensors (Basel). 2024 Dec 3;24(23):7719. doi: 10.3390/s24237719.

DOI: 10.3390/s24237719
PMID: 39686255
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11644927/
Abstract

Traditional multimodal contrastive learning pulls a text and its corresponding image closer together as a positive pair, where the text typically consists of fixed sentence structures or specific descriptive statements, and the image features are generally global features (with some fine-grained work using local features). Similar to unimodal self-supervised contrastive learning, this approach can be seen as enforcing a strict identity constraint in a multimodal context. However, remote sensing images are inherently complex and cannot easily be described in a single sentence, and they contain rich ancillary information beyond object features, so this strict identity constraint may be insufficient. To fully leverage the characteristics of remote sensing images, we propose a multimodal contrastive learning method for remote sensing image feature extraction, based on positive sample tripartite relaxation, where the model is relaxed in three aspects. The first aspect of relaxation involves both the text and image inputs. By introducing learnable parameters in the language and image branches, instead of relying on fixed sentence structures and fixed image features, the network can describe remote sensing images more flexibly in text and extract ancillary information from the image features, thereby relaxing the input constraints. The second relaxation is achieved through multimodal alignment of various features. By aligning semantic information with the corresponding semantic regions in the images, the method allows local image features to be relaxed under semantic constraints. This addresses the problem of selecting image patches in unimodal settings, where there is no semantic constraint. The proposed method for remote sensing image feature extraction has been validated on four datasets. On the PatternNet dataset, it achieved 91.1% accuracy in the one-shot setting.
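
The "strict identity constraint" described in the abstract can be illustrated with a minimal sketch of a standard symmetric image-text contrastive loss (the CLIP-style InfoNCE objective). This is not the authors' relaxed method. It only shows the baseline in which the i-th image and the i-th text are treated as the sole positive pair. The function names, the temperature value of 0.07, and the toy feature vectors are illustrative assumptions, and feature vectors are assumed to be non-zero.

```python
import math

def l2_normalize(v):
    # Scale a feature vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(image_feats, text_feats, temperature=0.07):
    # Baseline objective: image i and text i form the only positive pair;
    # every other pairing in the batch is treated as a negative.
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)
    # Cosine similarities scaled by the (assumed) temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# Toy 2-D features: matched pairs give a low loss, mismatched pairs a high one.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

With the toy inputs, `aligned_loss` is much smaller than `shuffled_loss`, because only exact index matches count as positives. The relaxations proposed in the abstract loosen this one-to-one pairing through learnable parameters in both branches and semantic alignment with local image regions.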

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/82769971f633/sensors-24-07719-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/6914352383cc/sensors-24-07719-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/b339293adbe6/sensors-24-07719-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/c74cb937aab1/sensors-24-07719-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/7f2518d0bb68/sensors-24-07719-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/1c20d3600fff/sensors-24-07719-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b85c/11644927/77e87fb92954/sensors-24-07719-g007.jpg
