
SIRe-Networks: Convolutional neural networks architectural extension for information preservation via skip/residual connections and interlaced auto-encoders.

Affiliations

Department of Computer Science, Sapienza University, Via Salaria 113, Rome, 00138, Italy.

Department of Mathematics, Computer Science and Physics, Università di Udine, Via delle Scienze 20, Udine, 33100, Italy.

Publication information

Neural Netw. 2022 Sep;153:386-398. doi: 10.1016/j.neunet.2022.06.030. Epub 2022 Jun 27.

Abstract

Improving existing neural network architectures can involve several design choices, such as manipulating the loss functions, employing a diverse learning strategy, exploiting gradient evolution at training time, optimizing the network hyper-parameters, or increasing the architecture depth. The latter approach is a straightforward solution, since it directly enhances the representation capabilities of a network; however, the increased depth generally incurs the well-known vanishing gradient problem. In this paper, borrowing from different methods addressing this issue, we introduce an interlaced multi-task learning strategy, called SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by preserving information from the input image through interlaced auto-encoders (AEs), and further refines the base network architecture by means of skip and residual connections. To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on five collections, i.e., MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and Caltech-256; the SIRe-extended architectures achieve significantly improved performance across all models and datasets, thus confirming the effectiveness of the presented approach.
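The two ingredients named in the abstract can be illustrated with a minimal numpy sketch: a residual connection gives the gradient a direct identity path around each transform, while an interlaced auto-encoder head adds a reconstruction term to the loss, pushing intermediate features to preserve input information. This is only a conceptual sketch under assumed simplifications (linear layers stand in for convolutions, `cls_loss` and the weight `lam` are hypothetical placeholders), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_like(x, w):
    # stand-in for a convolutional layer: linear map followed by ReLU
    return np.maximum(x @ w, 0.0)

def residual_block(x, w):
    # residual connection: output = transform(x) + x, so gradients have
    # a direct identity path back to the input (mitigates vanishing)
    return conv_like(x, w) + x

def decoder(h, w_dec):
    # auxiliary auto-encoder head: reconstructs the input from the
    # intermediate features, forcing them to retain input information
    return h @ w_dec

x = rng.normal(size=(4, 8))            # toy batch: 4 samples, 8 features
w1 = 0.1 * rng.normal(size=(8, 8))
w_dec = 0.1 * rng.normal(size=(8, 8))

h = residual_block(x, w1)              # features with a residual path
x_hat = decoder(h, w_dec)              # interlaced AE reconstruction

cls_loss = 1.234                       # hypothetical classification loss
rec_loss = np.mean((x_hat - x) ** 2)   # reconstruction (AE) loss

lam = 0.5                              # assumed weight of the auxiliary task
total_loss = cls_loss + lam * rec_loss # multi-task objective
```

In the full SIRe strategy one such reconstruction term would be attached at several depths of the network, so the combined objective sums the classification loss with one weighted reconstruction loss per interlaced auto-encoder.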

