Hua Yuansheng, Mou Lichao, Lin Jianzhe, Heidler Konrad, Zhu Xiao Xiang
Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Oberpfaffenhofen, 82234 Wessling, Germany.
Data Science in Earth Observation (SiPEO), Technical University of Munich (TUM), Arcisstr. 21, 80333 Munich, Germany.
ISPRS J Photogramm Remote Sens. 2021 Jul;177:89-102. doi: 10.1016/j.isprsjprs.2021.04.006.
Aerial scene recognition is a fundamental visual task that has attracted increasing research interest in recent years. Most current studies focus on assigning a single scene-level label to an aerial image, whereas in real-world scenarios a single image often contains multiple scenes. In this paper, we therefore take a step toward a more practical and challenging task, namely multi-scene recognition in single images. We also note that manually annotating images for such a task is extraordinarily time-consuming and labor-intensive. To address this, we propose a prototype-based memory network that recognizes multiple scenes in a single image by leveraging a large number of well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. More specifically, we first learn a prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. A multi-head attention-based memory retrieval module then retrieves the scene prototypes relevant to a query multi-scene image for the final prediction. Notably, only a limited number of annotated multi-scene images are needed in the training phase. To facilitate progress in aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset. Experimental results on various dataset configurations demonstrate the effectiveness of our network. Our dataset and code are publicly available.
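The three components named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' released code. It assumes, for illustration only, that a scene prototype is the mean feature vector of a class, that retrieval is standard multi-head scaled dot-product attention with the image feature as the query and the prototypes as keys and values, and that the final multi-label prediction applies an independent sigmoid per scene. All names (`learn_prototypes`, `retrieve`, `w_q`, `w_k`, `w_v`, `w_out`) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def learn_prototypes(features, labels, num_classes):
    """Assumption: one prototype per scene = mean feature of that class."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def retrieve(query, memory, w_q, w_k, w_v, num_heads):
    """Multi-head scaled dot-product attention of one query over the memory."""
    d_model = w_q.shape[1]
    d_head = d_model // num_heads
    q = (query @ w_q).reshape(num_heads, d_head)        # (H, d_head)
    k = (memory @ w_k).reshape(-1, num_heads, d_head)   # (C, H, d_head)
    v = (memory @ w_v).reshape(-1, num_heads, d_head)   # (C, H, d_head)
    scores = np.einsum("hd,chd->hc", q, k) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)                     # weights over prototypes
    out = np.einsum("hc,chd->hd", attn, v).reshape(d_model)
    return out, attn

rng = np.random.default_rng(0)
num_classes, dim, num_heads = 4, 8, 2

# Stand-in for features extracted from labeled single-scene images.
labels = np.repeat(np.arange(num_classes), 5)
features = rng.normal(size=(labels.size, dim))

# External memory: one prototype row per scene category.
memory = learn_prototypes(features, labels, num_classes)

# Random (untrained) projections, used only to show shapes and data flow.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
w_out = rng.normal(size=(dim, num_classes))

# Stand-in for the feature of one multi-scene query image.
query = rng.normal(size=dim)
retrieved, attn = retrieve(query, memory, w_q, w_k, w_v, num_heads)

# Assumption: multi-label output via an independent sigmoid per scene.
probs = 1.0 / (1.0 + np.exp(-(retrieved @ w_out)))
```

In this sketch each row of `attn` is one head's weighting over the stored prototypes, and `probs` holds one score per scene category, so several scenes can be predicted for the same image.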