IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):3980-3990. doi: 10.1109/TPAMI.2020.2990339. Epub 2021 Oct 1.
Augmenting neural networks with skip connections, as introduced in the so-called ResNet architecture, surprised the community by enabling the training of networks of more than 1,000 layers with significant performance gains. This paper deciphers ResNet by analyzing the effect of skip connections, and puts forward new theoretical results on the advantages of identity skip connections in neural networks. We prove that the skip connections in the residual blocks facilitate preserving the norm of the gradient and lead to stable back-propagation, which is desirable from an optimization perspective. We also show that, perhaps surprisingly, as more residual blocks are stacked, the norm-preservation of the network is enhanced. Our theoretical arguments are supported by extensive empirical evidence. Can we push for extra norm-preservation? We answer this question by proposing an efficient method that regularizes the singular values of the convolution operator and makes ResNet's transition layers extra norm-preserving. Our numerical investigations demonstrate that the learning dynamics and the classification performance of ResNet can be improved by making it even more norm-preserving. Our results and the introduced modification of ResNet, referred to as Procrustes ResNets, can be used as a guide for training deeper networks and can also inspire new deeper architectures.
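The norm-preservation claim can be seen directly from the back-propagation rule: for a residual block y = x + F(x), the gradient reaching the input is (I + ∂F/∂x)ᵀ ∂L/∂y, so when the Jacobian of the residual branch is small, the norm of the back-propagated gradient is approximately preserved. The following sketch is not from the paper; block width, depth, and initialization are illustrative assumptions. It checks the effect numerically by back-propagating a unit-norm gradient through a stack of residual blocks versus a plain stack.

```python
# Minimal sketch (illustrative, not the paper's code): compare gradient-norm
# propagation through residual blocks y = x + F(x) versus plain blocks y = F(x).
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM, DEPTH = 64, 50  # assumed width and depth, chosen only for illustration

def make_block() -> nn.Module:
    # A simple two-layer residual branch F(x); any small-Jacobian branch works.
    return nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))

def grad_norm_at_input(residual: bool) -> float:
    blocks = [make_block() for _ in range(DEPTH)]
    x = torch.randn(8, DIM, requires_grad=True)
    h = x
    for f in blocks:
        h = h + f(h) if residual else f(h)
    # Back-propagate a unit-norm gradient from the output and measure its
    # norm at the input; norm-preservation means this stays close to 1.
    g = torch.randn_like(h)
    g = g / g.norm()
    (h * g).sum().backward()
    return x.grad.norm().item()

print("residual:", grad_norm_at_input(True))   # typically within a small factor of 1
print("plain:   ", grad_norm_at_input(False))  # typically vanishes toward 0
```

In a typical run with default initialization, the residual stack keeps the back-propagated gradient norm within a small factor of its value at the output, while the plain stack lets it vanish, consistent with the stable back-propagation argued above.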
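The proposed modification regularizes the singular values of the convolution operator in ResNet's transition layers. The abstract does not spell out the procedure, but the underlying idea can be illustrated with an orthogonal Procrustes projection, which replaces a weight matrix W = U S Vᵀ by U Vᵀ so that every singular value becomes 1 and the operator is norm-preserving. The sketch below is an assumed illustration of that idea, treating a 1x1 transition convolution as a plain matrix; it is not the paper's efficient method.

```python
# Minimal sketch of pushing singular values toward 1 via an orthogonal
# Procrustes projection; an illustrative assumption, not the paper's procedure.
import torch

def project_to_nearest_orthogonal(weight: torch.Tensor) -> torch.Tensor:
    # Orthogonal Procrustes / polar factor: replace W = U S V^T with U V^T,
    # i.e., set all singular values to 1, making the operator norm-preserving.
    u, _, vh = torch.linalg.svd(weight, full_matrices=False)
    return u @ vh

# Example: a 1x1 transition convolution with a C_out x C_in kernel, viewed as a matrix.
w = torch.randn(128, 64)
w_orth = project_to_nearest_orthogonal(w)
print(torch.linalg.svdvals(w_orth)[:5])  # all singular values are (numerically) 1
```

A full-sized convolution would require either reshaping its kernel or working with the singular values of the induced linear operator; this sketch only conveys the norm-preservation goal behind the Procrustes ResNet modification.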