

Boosting cross-modal retrieval in remote sensing via a novel unified attention network.

Affiliation

Indian Institute of Technology, Bombay, India.

Publication Information

Neural Netw. 2024 Dec;180:106718. doi: 10.1016/j.neunet.2024.106718. Epub 2024 Sep 11.

DOI: 10.1016/j.neunet.2024.106718
PMID: 39293179
Abstract

With the rapid advent and abundance of remote sensing data in different modalities, cross-modal retrieval tasks have gained importance in the research community. Cross-modal retrieval belongs to the research paradigm in which the query is of one modality and the retrieved output is of the other modality. In this paper, the remote sensing (RS) data modalities considered are the earth observation optical data (aerial photos) and the corresponding hand-drawn sketches. The main challenge of the cross-modal retrieval research objective for optical remote sensing images and the corresponding sketches is the distribution gap between the shared embedding space of the modalities. Prior attempts to resolve this issue have not yielded satisfactory outcomes regarding accurately retrieving cross-modal sketch-image RS data. The state-of-the-art architectures used conventional convolutional architectures, which focused on local pixel-wise information about the modalities to be retrieved. This limits the interaction between the sketch texture and the corresponding image, making these models susceptible to overfitting datasets with particular scenarios. To circumvent this limitation, we suggest establishing multi-modal correspondence using a novel architecture of combined self- and cross-attention algorithms, SPCA-Net, to minimize the modality gap by employing attention mechanisms for the query and other modalities. Efficient cross-modal retrieval is achieved through the suggested attention architecture, which empirically emphasizes the global information of the relevant query modality and bridges the domain gap through a unique pairwise cross-attention network. In addition to the novel architecture, this paper introduces a unique loss function, label-specific supervised contrastive loss, tailored to the intricacies of the task and designed to enhance the discriminative power of the learned embeddings.
Extensive evaluations are conducted on two sketch-image remote sensing datasets, Earth-on-Canvas and RSketch. Under the same experimental conditions, the performance metrics of our proposed model beat the state-of-the-art architectures by significant margins of 16.7%, 18.9%, 33.7%, and 40.9%, respectively.
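The abstract names two ingredients without giving implementation detail: cross-attention between the sketch and image modalities, and a label-specific supervised contrastive loss. A minimal pure-Python sketch of both, assuming plain scaled dot-product attention and a Khosla-style supervised contrastive formulation (function names, shapes, and the temperature value are illustrative assumptions, not the paper's actual SPCA-Net implementation), might look like this:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: query tokens from one modality
    (e.g. sketch features) attend over key/value tokens from the other
    (e.g. image features). Unbatched, single-head, for illustration only."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Each output token is an attention-weighted mix of the value tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, embeddings sharing its
    class label are pulled together and all others pushed apart."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(a):
        n = math.sqrt(dot(a, a))
        return [x / n for x in a]

    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        denom = sum(math.exp(dot(z[i], z[j]) / temperature)
                    for j in range(n) if j != i)
        for j in positives:
            total += -math.log(math.exp(dot(z[i], z[j]) / temperature) / denom)
            count += 1
    return total / count
```

With a single key/value token the attention weights collapse to 1.0, so the query simply receives that value token; with same-label embeddings close together, `supcon_loss` stays small and grows as classes overlap. How the paper combines these blocks pairwise across modalities, and what makes its loss "label-specific" beyond standard supervised contrastive learning, is not specified in the abstract.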


Similar Articles

1
Boosting cross-modal retrieval in remote sensing via a novel unified attention network.
Neural Netw. 2024 Dec;180:106718. doi: 10.1016/j.neunet.2024.106718. Epub 2024 Sep 11.
2
CMR-net: A cross modality reconstruction network for multi-modality remote sensing classification.
PLoS One. 2024 Jun 25;19(6):e0304999. doi: 10.1371/journal.pone.0304999. eCollection 2024.
3
A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
4
Data augmentation-assisted deep learning of hand-drawn partially colored sketches for visual search.
PLoS One. 2017 Aug 31;12(8):e0183838. doi: 10.1371/journal.pone.0183838. eCollection 2017.
5
Self-supervision assisted multimodal remote sensing image classification with coupled self-looping convolution networks.
Neural Netw. 2023 Jul;164:1-20. doi: 10.1016/j.neunet.2023.04.019. Epub 2023 Apr 20.
6
Remote sensing image information extraction based on Compensated Fuzzy Neural Network and big data analytics.
BMC Med Imaging. 2024 Apr 10;24(1):86. doi: 10.1186/s12880-024-01266-9.
7
SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.
Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
8
Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval.
Neural Netw. 2021 Feb;134:143-162. doi: 10.1016/j.neunet.2020.11.011. Epub 2020 Nov 28.
9
Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval.
IEEE Trans Image Process. 2020 Sep 10;PP. doi: 10.1109/TIP.2020.3020383.
10
A 3D hierarchical cross-modality interaction network using transformers and convolutions for brain glioma segmentation in MR images.
Med Phys. 2024 Nov;51(11):8371-8389. doi: 10.1002/mp.17354. Epub 2024 Aug 13.