Aerial scene understanding in the wild: Multi-scene recognition via prototype-based memory networks.

Authors

Hua Yuansheng, Mou Lichao, Lin Jianzhe, Heidler Konrad, Zhu Xiao Xiang

Affiliations

Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Oberpfaffenhofen, 82234 Wessling, Germany.

Data Science in Earth Observation (SiPEO), Technical University of Munich (TUM), Arcisstr. 21, 80333 Munich, Germany.

Publication information

ISPRS J Photogramm Remote Sens. 2021 Jul;177:89-102. doi: 10.1016/j.isprsjprs.2021.04.006.

DOI:10.1016/j.isprsjprs.2021.04.006
PMID:34219969
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8218792/
Abstract

Aerial scene recognition is a fundamental visual task that has attracted increasing research interest in recent years. Most current research focuses on assigning a single scene-level label to an aerial image, whereas in real-world scenarios a single image often contains multiple scenes. In this paper, we therefore take a step toward a more practical and challenging task, namely multi-scene recognition in single images. Because manually annotating data for such a task is extraordinarily time- and labor-intensive, we propose a prototype-based memory network that recognizes multiple scenes in a single image by leveraging massive well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. More specifically, we first learn the prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. A multi-head attention-based memory retrieval module then retrieves the scene prototypes relevant to a query multi-scene image for the final prediction. Notably, only a limited number of annotated multi-scene images are needed in the training phase. To facilitate progress in aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset. Experimental results on various dataset configurations demonstrate the effectiveness of our network. Our dataset and code are publicly available.
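
The three components named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes that a prototype is simply the mean embedding of a class (the paper learns its prototypes, and the abstract does not say how), that retrieval is standard scaled dot-product attention with one projection triple per head, and that the final prediction applies an independent sigmoid per scene. All function names, dimensions, and the random stand-in features and weights are hypothetical.

```python
import math
import random


def class_mean_prototypes(features, labels, num_classes):
    """Assumed prototype rule: mean embedding of each single-scene class."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, label in zip(features, labels):
        counts[label] += 1
        for j, x in enumerate(vec):
            sums[label][j] += x
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def matvec(vec, mat):
    """Multiply a length-d vector by a d x k matrix stored as nested lists."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, memory, heads):
    """Multi-head scaled dot-product attention of one query over the memory.

    heads holds one (w_q, w_k, w_v) projection triple per head. Returns the
    concatenated head outputs and the attention weights of every head.
    """
    retrieved, weights = [], []
    for w_q, w_k, w_v in heads:
        q = matvec(query, w_q)
        keys = [matvec(p, w_k) for p in memory]
        values = [matvec(p, w_v) for p in memory]
        scale = math.sqrt(len(q))
        attn = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        retrieved.extend(sum(a * v[j] for a, v in zip(attn, values))
                         for j in range(len(values[0])))
        weights.append(attn)
    return retrieved, weights


def scene_probabilities(retrieved, w_out):
    """Independent sigmoid per scene, so several scenes can be active at once."""
    return [1.0 / (1.0 + math.exp(-z)) for z in matvec(retrieved, w_out)]


# Toy run: random numbers stand in for CNN features and trained weights.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


num_classes, dim, num_heads, head_dim = 4, 8, 2, 4
features = rand_matrix(40, dim)                   # single-scene embeddings
labels = [i % num_classes for i in range(40)]     # one scene label per image
memory = class_mean_prototypes(features, labels, num_classes)  # external memory
heads = [tuple(rand_matrix(dim, head_dim) for _ in range(3))
         for _ in range(num_heads)]
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # multi-scene image embedding
retrieved, weights = retrieve(query, memory, heads)
probs = scene_probabilities(retrieved, rand_matrix(num_heads * head_dim, num_classes))
```

In this sketch the memory is built once from single-scene data and only read at query time. Only the attention projections and the output layer would then need to be fitted on multi-scene images, which is one plausible reading of the abstract's remark that few annotated multi-scene images are required.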

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/499104df15a2/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/012f77e8659a/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/c0c08ca67967/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/068e0789dba2/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/c8e912df2f42/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/bab6b55a1923/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/6a50a13da163/gr7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/22621f9fbfa9/gr8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/767b1a95d809/gr9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/974ce88e89c4/gr10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/44ad9dc33475/gr11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/bc74c112e027/gr12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/547d/8218792/bd5941082436/gr13.jpg
