S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification.

Affiliations

Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China.

Institute for Brain and Cognitive Sciences, Beijing Union University, Beijing 100101, China.

Publication information

Sensors (Basel). 2022 Jul 20;22(14):5433. doi: 10.3390/s22145433.

Abstract

Multi-label aerial scene image classification is a long-standing and challenging research problem in the remote sensing field. Because land cover objects usually co-exist in an aerial scene image, modeling label dependencies is a compelling way to improve performance. Previous methods generally model the label dependencies among all the categories in the target dataset directly. However, most of the semantic features extracted from an image relate to the objects that are actually present, so the dependencies among nonexistent categories cannot be evaluated effectively. These redundant label dependencies may introduce noise and degrade classification performance. To solve this problem, we propose S-MAT, a Semantic-driven Masked Attention Transformer for multi-label aerial scene image classification. S-MAT adopts a Masked Attention Transformer (MAT) to capture the correlations among the label embeddings constructed by a Semantic Disentanglement Module (SDM). Moreover, the proposed masked attention in MAT filters out the redundant dependencies and enhances the robustness of the model. As a result, the proposed method can explicitly and accurately capture the label dependencies. Our method achieves CF1s of 89.21%, 90.90%, and 88.31% on three multi-label aerial scene image classification benchmark datasets: UC-Merced Multi-label, AID Multi-label, and MLRSNet, respectively. In addition, extensive ablation studies and empirical analysis demonstrate the effectiveness of the essential components of our method under different factors.
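
The abstract does not say how the mask in MAT is built, so the following is only a minimal sketch of the general idea of masked attention over label embeddings, not the authors' implementation. It assumes a hypothetical per-label presence score and a hypothetical threshold of 0.5: labels scoring below the threshold are hidden as attention keys, so no label can draw on them. The function name `masked_label_attention` and all inputs are illustrative.

```python
import math


def masked_label_attention(embeddings, scores, threshold=0.5):
    """Single-head self-attention over label embeddings with a key mask.

    embeddings: list of C label embeddings, each a list of d floats.
    scores:     list of C presence scores in [0, 1] (assumed input, not
                something the abstract specifies).
    Returns (outputs, weights), where weights[i][j] is how much label i
    attends to label j.
    """
    d = len(embeddings[0])
    keep = [s >= threshold for s in scores]
    if not any(keep):
        # No label passes the threshold: fall back to unmasked attention
        # so the softmax below stays well defined.
        keep = [True] * len(scores)

    outputs, weights = [], []
    for query in embeddings:
        # Scaled dot-product logits; masked keys get -inf.
        logits = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
            if kept else -math.inf
            for key, kept in zip(embeddings, keep)
        ]
        peak = max(logits)  # finite, since at least one key is kept
        exps = [math.exp(x - peak) for x in logits]  # exp(-inf) == 0.0
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Weighted sum of the label embeddings (used here as values).
        outputs.append(
            [sum(w * v[j] for w, v in zip(row, embeddings)) for j in range(d)]
        )
    return outputs, weights


if __name__ == "__main__":
    label_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    presence_scores = [0.9, 0.1, 0.8]  # second label treated as absent
    _, attn = masked_label_attention(label_embeddings, presence_scores)
    for row in attn:
        print([round(w, 3) for w in row])
```

With these toy inputs the second label falls below the threshold, so its column of attention weights is exactly zero and each row still sums to one. A real model would use learned query, key, and value projections and multiple heads; those are omitted here to keep the masking step visible.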

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4183/9317133/65981148bbfa/sensors-22-05433-g001.jpg
