Improved Residual Network based on norm-preservation for visual recognition.

Authors

Mahaur Bharat, Mishra K K, Singh Navjot

Affiliations

Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Allahabad, UP, India.

Department of Information Technology, Indian Institute of Information Technology Allahabad, Allahabad, UP, India.

Publication information

Neural Netw. 2023 Jan;157:305-322. doi: 10.1016/j.neunet.2022.10.023. Epub 2022 Oct 28.

Abstract

Residual Network (ResNet) achieves deeper and wider networks with high-performance gains, representing a powerful convolutional neural network architecture. In this paper, we propose architectural refinements to ResNet that address the information flow through several layers of the network, including the input stem, downsampling block, projection shortcut, and identity blocks. We will show that our collective refinements facilitate stable backpropagation by preserving the norm of the error gradient within the residual blocks, which can reduce the optimization difficulties of training very deep networks. Our proposed modifications enhance the learning dynamics, resulting in high accuracy and inference performance by enforcing norm-preservation throughout the network training. The effectiveness of our method is verified by extensive experimental results on five computer vision tasks, including image classification (ImageNet and CIFAR-100), video classification (Kinetics-400), multi-label image recognition (MS-COCO), object detection and semantic segmentation (PASCAL VOC). We also empirically show consistent improvements in generalization performance when applying our modifications over different networks to provide new insights and inspire new architectures. The source code is publicly available at: https://github.com/bharatmahaur/LeNo.
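
To make the notion of gradient norm preservation concrete, the following is a minimal sketch in pure Python. It is not the authors' method or code: it assumes a toy linear residual block y = x + Wx, where the random matrix W, the function name `grad_norm_ratio`, and the parameters `dim`, `scale`, and `seed` are all hypothetical and chosen only for illustration. It measures how much the norm of the error gradient changes when it is backpropagated through that single block.

```python
import math
import random


def grad_norm_ratio(dim, scale, seed=0):
    """Return ||g_in|| / ||g_out|| for a toy residual block y = x + W x.

    W is a random dim x dim matrix with entries ~ N(0, scale^2 / dim).
    It stands in for the residual branch and is purely illustrative;
    it is NOT the architecture proposed in the paper.
    """
    rng = random.Random(seed)
    std = scale / math.sqrt(dim)
    w = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]

    # Gradient of the loss with respect to the block output y.
    g_out = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Backpropagation through y = x + W x gives g_in = (I + W)^T g_out.
    g_in = [
        g_out[j] + sum(w[i][j] * g_out[i] for i in range(dim))
        for j in range(dim)
    ]

    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    return norm(g_in) / norm(g_out)


if __name__ == "__main__":
    for scale in (0.0, 0.1, 3.0):
        print(f"scale={scale}: ratio={grad_norm_ratio(64, scale):.3f}")
```

In this toy setting, a ratio close to 1 means the block passes the gradient back with its norm nearly unchanged, which happens when the residual branch is small relative to the identity path. A large residual branch moves the ratio away from 1.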
