NetVLAD: CNN Architecture for Weakly Supervised Place Recognition.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1437-1451. doi: 10.1109/TPAMI.2017.2711011. Epub 2017 Jun 1.

DOI: 10.1109/TPAMI.2017.2711011
PMID: 28622667

Abstract

We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following four principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we create a new weakly supervised ranking loss, which enables end-to-end learning of the architecture's parameters from images depicting the same places over time downloaded from Google Street View Time Machine. Third, we develop an efficient training procedure which can be applied on very large-scale weakly labelled tasks. Finally, we show that the proposed architecture and training procedure significantly outperform non-learnt image representations and off-the-shelf CNN descriptors on challenging place recognition and image retrieval benchmarks.
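The abstract names two components without giving formulas: a VLAD-style aggregation layer and a weakly supervised ranking loss. The following is a minimal NumPy sketch of one common way to write both, not the authors' implementation. It assumes soft assignment via a softmax over cluster logits, per-cluster (intra) normalisation followed by global L2 normalisation, and a hinge loss that uses the closest "potential positive" for each query. The function names `netvlad` and `weak_ranking_loss`, the parameter names, and the `margin` default are illustrative assumptions.

```python
import numpy as np


def netvlad(descriptors, centers, weights, biases, eps=1e-12):
    """Aggregate N local D-dim descriptors into one (K*D,) vector.

    descriptors: (N, D) local features, e.g. one per spatial location
    centers:     (K, D) cluster centres (learnable in a real network)
    weights:     (K, D) and biases: (K,) soft-assignment parameters
    All names and shapes here are assumptions made for illustration.
    """
    # Soft-assign every descriptor to every cluster (softmax over clusters).
    logits = descriptors @ weights.T + biases            # (N, K)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)          # rows sum to 1

    # Assignment-weighted sum of residuals to each cluster centre.
    residuals = descriptors[:, None, :] - centers[None, :, :]   # (N, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)         # (K, D)

    # Normalise each cluster's row, then the flattened vector.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + eps
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + eps)


def weak_ranking_loss(query, potential_positives, negatives, margin=0.1):
    """Hinge ranking loss with weak labels (illustrative form).

    Only the best-matching potential positive is used, since weak
    (location-only) labels do not say which nearby image actually
    shows the same scene as the query.
    """
    d_pos = ((potential_positives - query) ** 2).sum(axis=1).min()
    d_neg = ((negatives - query) ** 2).sum(axis=1)
    return np.maximum(0.0, d_pos + margin - d_neg).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 50, 8, 4
    v = netvlad(rng.normal(size=(n, d)), rng.normal(size=(k, d)),
                rng.normal(size=(k, d)), np.zeros(k))
    print(v.shape)  # (32,)
```

Every step in `netvlad` is differentiable, which is what makes a layer of this kind trainable by backpropagation; in a real network the centres, weights, and biases would be learned parameters rather than random arrays.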

Similar articles

1
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition.
IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1437-1451. doi: 10.1109/TPAMI.2017.2711011. Epub 2017 Jun 1.
2
Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition.
IEEE Trans Neural Netw Learn Syst. 2020 Feb;31(2):661-674. doi: 10.1109/TNNLS.2019.2908982. Epub 2019 Apr 26.
3
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition.
IEEE Trans Image Process. 2017 Apr;26(4):2028-2041. doi: 10.1109/TIP.2017.2666739. Epub 2017 Feb 9.
4
Cross-Modal Retrieval With CNN Visual Features: A New Baseline.
IEEE Trans Cybern. 2017 Feb;47(2):449-460. doi: 10.1109/TCYB.2016.2519449. Epub 2016 Mar 8.
5
Deep FisherNet for Image Classification.
IEEE Trans Neural Netw Learn Syst. 2019 Jul;30(7):2244-2250. doi: 10.1109/TNNLS.2018.2874657. Epub 2018 Nov 5.
6
M-SAC-VLADNet: A Multi-Path Deep Feature Coding Model for Visual Classification.
Entropy (Basel). 2018 May 4;20(5):341. doi: 10.3390/e20050341.
7
Cross-Convolutional-Layer Pooling for Image Recognition.
IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2305-2313. doi: 10.1109/TPAMI.2016.2637921. Epub 2016 Dec 9.
8
Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors.
IEEE Trans Pattern Anal Mach Intell. 2017 Sep;39(9):1783-1796. doi: 10.1109/TPAMI.2016.2613873. Epub 2016 Sep 27.
9
Robust Face Recognition Using the Deep C2D-CNN Model Based on Decision-Level Fusion.
Sensors (Basel). 2018 Jun 28;18(7):2080. doi: 10.3390/s18072080.
10
SurfNetv2: An Improved Real-Time SurfNet and Its Applications to Defect Recognition of Calcium Silicate Boards.
Sensors (Basel). 2020 Aug 5;20(16):4356. doi: 10.3390/s20164356.

Cited by

1
TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology.
Sensors (Basel). 2025 Jul 25;25(15):4614. doi: 10.3390/s25154614.
2
Unified Depth-Guided Feature Fusion and Reranking for Hierarchical Place Recognition.
Sensors (Basel). 2025 Jun 29;25(13):4056. doi: 10.3390/s25134056.
3
Visual Place Recognition Based on Dynamic Difference and Dual-Path Feature Enhancement.
Sensors (Basel). 2025 Jun 25;25(13):3947. doi: 10.3390/s25133947.
4
Needle in a haystack: Coarse-to-fine alignment network for moment retrieval from large-scale video collections.
PLoS One. 2025 May 15;20(5):e0320661. doi: 10.1371/journal.pone.0320661. eCollection 2025.
5
Temporal-Spatial Redundancy Reduction in Video Sequences: A Motion-Based Entropy-Driven Attention Approach.
Biomimetics (Basel). 2025 Mar 21;10(4):192. doi: 10.3390/biomimetics10040192.
6
A Hybrid Deep Learning and Improved SVM Framework for Real-Time Railroad Construction Personnel Detection with Multi-Scale Feature Optimization.
Sensors (Basel). 2025 Mar 26;25(7):2061. doi: 10.3390/s25072061.
7
A visual SLAM loop closure detection method based on lightweight siamese capsule network.
Sci Rep. 2025 Mar 4;15(1):7644. doi: 10.1038/s41598-025-90511-4.
8
LoCS-Net: Localizing convolutional spiking neural network for fast visual place recognition.
Front Neurorobot. 2025 Jan 29;18:1490267. doi: 10.3389/fnbot.2024.1490267. eCollection 2024.
9
Ligand identification in CryoEM and X-ray maps using deep learning.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae749.
10
Action recognition using attention-based spatio-temporal VLAD networks and adaptive video sequences optimization.
Sci Rep. 2024 Oct 31;14(1):26202. doi: 10.1038/s41598-024-75640-6.