Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2015 Sep;37(9):1904-16. doi: 10.1109/TPAMI.2015.2389824.

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 × 224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
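The fixed-length property described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spp` and `spp_region`, the pyramid levels (1, 2, 4), the floor/ceil bin boundaries, and the choice to express regions in feature-map coordinates are all assumptions made for illustration. Feature maps are plain nested lists of shape channels × height × width.

```python
def _bin(size, n, i):
    """Return [start, end) of bin i when an axis of length `size` is split into n bins.

    Assumed scheme: floor for the start, ceil for the end, at least one cell per bin.
    """
    start = (i * size) // n
    end = max(start + 1, -((-(i + 1) * size) // n))  # integer ceil((i+1)*size/n)
    return start, end


def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map into a fixed-length list.

    The output length is C * sum(n * n for n in levels), whatever H and W are.
    """
    pooled = []
    for channel in feature_map:
        h, w = len(channel), len(channel[0])
        for n in levels:  # one n x n grid of bins per pyramid level
            for i in range(n):
                r0, r1 = _bin(h, n, i)
                for j in range(n):
                    c0, c1 = _bin(w, n, j)
                    pooled.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled


def spp_region(feature_map, top, left, bottom, right, levels=(1, 2, 4)):
    """Pool an arbitrary window of an already computed feature map.

    The window is given in feature-map coordinates (an assumption of this sketch),
    so the convolutional features are computed once and reused for every region.
    """
    window = [[row[left:right] for row in channel[top:bottom]]
              for channel in feature_map]
    return spp(window, levels)


if __name__ == "__main__":
    # Two feature maps of different spatial size, two channels each (toy values).
    small = [[[(r * 7 + c * 3 + k) % 17 for c in range(10)] for r in range(7)]
             for k in range(2)]
    large = [[[(r * 7 + c * 3 + k) % 17 for c in range(13)] for r in range(13)]
             for k in range(2)]
    print(len(spp(small)), len(spp(large)))      # both lengths equal 2 * (1 + 4 + 16)
    print(len(spp_region(large, 2, 3, 9, 12)))   # same length for a sub-window
```

Because every pyramid level always yields n × n bins per channel, the vector length depends only on the channel count and the chosen levels, which is what lets a fully connected layer follow inputs of varying size.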

Similar articles

An Ensemble of Fine-Tuned Convolutional Neural Networks for Medical Image Classification.

IEEE J Biomed Health Inform. 2017 Jan;21(1):31-40. doi: 10.1109/JBHI.2016.2635663. Epub 2016 Dec 5.

Mitigation of Effects of Occlusion on Object Recognition with Deep Neural Networks through Low-Level Image Completion.

Comput Intell Neurosci. 2016;2016:6425257. doi: 10.1155/2016/6425257. Epub 2016 Jun 1.

Image Classification Using Biomimetic Pattern Recognition with Convolutional Neural Networks Features.

Comput Intell Neurosci. 2017;2017:3792805. doi: 10.1155/2017/3792805. Epub 2017 Feb 16.

HEp-2 Cell Image Classification With Deep Convolutional Neural Networks.

IEEE J Biomed Health Inform. 2017 Mar;21(2):416-428. doi: 10.1109/JBHI.2016.2526603. Epub 2016 Feb 8.

Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition.

IEEE Trans Cybern. 2019 May;49(5):1791-1802. doi: 10.1109/TCYB.2018.2813971. Epub 2018 Mar 22.

Fine-Tuning CNN Image Retrieval with No Human Annotation.

IEEE Trans Pattern Anal Mach Intell. 2019 Jul;41(7):1655-1668. doi: 10.1109/TPAMI.2018.2846566. Epub 2018 Jun 12.

Regionlets for Generic Object Detection.

IEEE Trans Pattern Anal Mach Intell. 2015 Oct;37(10):2071-84. doi: 10.1109/TPAMI.2015.2389830.

Endoscopic Image Classification and Retrieval using Clustered Convolutional Features.

J Med Syst. 2017 Oct 30;41(12):196. doi: 10.1007/s10916-017-0836-y.

Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets.

Comput Biol Med. 2018 Apr 1;95:217-233. doi: 10.1016/j.compbiomed.2018.02.008. Epub 2018 Feb 17.

Cited by

Automated segmentation of retinal vessel using HarDNet fully convolutional networks.

PLoS One. 2025 Sep 8;20(9):e0330641. doi: 10.1371/journal.pone.0330641. eCollection 2025.

YOLO-Pika: a lightweight improved model of YOLOv8n incorporating Fusion_Block and multi-scale fusion FPN and its application in the precise detection of plateau pikas.

Front Plant Sci. 2025 Aug 20;16:1607492. doi: 10.3389/fpls.2025.1607492. eCollection 2025.

An anchor-based YOLO fruit detector developed on YOLOv5.

PLoS One. 2025 Sep 5;20(9):e0331012. doi: 10.1371/journal.pone.0331012. eCollection 2025.

An AI-based approach to create spatial inventory of safety-related architectural features for school buildings.

Dev Built Environ. 2024 Feb;17. doi: 10.1016/j.dibe.2024.100376.

A lightweight small object detection model for UAV images based on deep semantic integration.

Sci Rep. 2025 Aug 29;15(1):31888. doi: 10.1038/s41598-025-16878-6.

GhostConv+CA-YOLOv8n: a lightweight network for rice pest detection based on the aggregation of low-level features in real-world complex backgrounds.

Front Plant Sci. 2025 Aug 13;16:1620339. doi: 10.3389/fpls.2025.1620339. eCollection 2025.

CM-UNetv2: An Enhanced Semantic Segmentation Model for Precise PCB Defect Detection and Boundary Restoration.

Sensors (Basel). 2025 Aug 9;25(16):4919. doi: 10.3390/s25164919.

A Fusion of Entropy-Enhanced Image Processing and Improved YOLOv8 for Smoke Recognition in Mine Fires.

Entropy (Basel). 2025 Jul 25;27(8):791. doi: 10.3390/e27080791.

Evaluating Hemodynamic Changes in Preterm Infants Using Recent YOLO Models.

Bioengineering (Basel). 2025 Jul 29;12(8):815. doi: 10.3390/bioengineering12080815.

MSConv-YOLO: An Improved Small Target Detection Algorithm Based on YOLOv8.

J Imaging. 2025 Aug 21;11(8):285. doi: 10.3390/jimaging11080285.
