Zheng Fuzhong, Wang Xu, Wang Luyao, Zhang Xiong, Zhu Hongze, Wang Long, Zhang Haisu
College of Information and Communication, National University of Defense Technology, Wuhan 430074, China.
Sensors (Basel). 2023 Oct 13;23(20):8437. doi: 10.3390/s23208437.
Due to the swift growth in the scale of remote sensing imagery, scholars have progressively directed their attention towards achieving efficient and adaptable cross-modal retrieval for remote sensing images, and have steadily tackled the distinctive challenge posed by the multi-scale attributes of these images. However, existing studies concentrate primarily on the characterization of multi-scale features, neglecting a comprehensive investigation of the complex relationships among multi-scale targets and the semantic alignment of these targets with text. To address this issue, this study introduces a fine-grained semantic alignment method that adequately aggregates multi-scale information (referred to as FAAMI). The proposed approach comprises multiple stages. Initially, we employ a computation-friendly cross-layer feature connection method to construct a multi-scale feature representation of an image. Subsequently, we devise an efficient feature consistency enhancement module to rectify the inconsistent semantic discrimination observed across cross-layer features. Finally, a shallow cross-attention network is employed to capture the fine-grained semantic relationships between multi-scale image regions and the corresponding words in the text. Extensive experiments were conducted on two datasets: RSICD and RSITMD. The results demonstrate that FAAMI outperforms recently proposed advanced models in the same domain, with significant improvements in R@K and other evaluation metrics. Specifically, FAAMI achieves mR values of 23.18% and 35.99% on the two datasets, respectively.
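The fine-grained alignment stage described in the abstract, in which multi-scale image regions attend to words in the caption, can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration of generic region-word cross-attention scoring, not the authors' FAAMI implementation; the function name, dimensions, and mean-cosine pooling are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fine_grained_similarity(regions, words):
    """Score an image-text pair via region-word cross-attention.

    regions: (R, d) features for R multi-scale image regions
             (e.g. pooled from several feature-map levels).
    words:   (W, d) embeddings for W caption words.
    Returns a scalar similarity in [-1, 1].
    """
    d = regions.shape[1]
    # Each word attends over all regions (scaled dot-product attention).
    attn = softmax(words @ regions.T / np.sqrt(d), axis=1)  # (W, R)
    attended = attn @ regions                               # (W, d)
    # Cosine similarity between each word and its attended visual context,
    # pooled by averaging over words.
    a = attended / np.linalg.norm(attended, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * w, axis=1)))

# Toy example: 6 regions and 4 words in a shared 32-d embedding space.
rng = np.random.default_rng(0)
regions = rng.standard_normal((6, 32))
words = rng.standard_normal((4, 32))
score = fine_grained_similarity(regions, words)
```

In retrieval, such a score would be computed for every image-text pair and candidates ranked by it; metrics like R@K then measure how often the true match appears in the top K results.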