School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; Mine Digitization Engineering Research Centre of Ministry of Education of the People's Republic of China, Xuzhou 221116, China.
Neural Netw. 2022 Apr;148:155-165. doi: 10.1016/j.neunet.2022.01.012. Epub 2022 Jan 24.
To explain the working mechanism of ResNet and its variants, this paper proposes a novel argument: shallow subnetwork first (SSF), essentially low-degree term first (LDTF), which also applies to the whole neural network family. A neural network with shortcut connections behaves as an ensemble of subnetworks of differing depths. Among these subnetworks, the shallow ones are trained first and have a great effect on the performance of the whole network. The shallow subnetworks roughly correspond to low-degree polynomial terms, while the deep subnetworks correspond to high-degree terms. By Taylor expansion, SSF is therefore consistent with LDTF. ResNet is in line with Taylor expansion: the shallow subnetworks are trained first to capture the low-degree terms, avoiding overfitting; the deep subnetworks maintain the high-degree terms, ensuring high descriptive capacity. Experiments on ResNets and DenseNets show that the shallow subnetworks are trained first and play important roles in the training of the networks. The experiments also reveal why DenseNets outperform ResNets: the subnetworks that play vital roles in training DenseNets are shallower than those that play vital roles in training ResNets. Furthermore, LDTF also explains the working mechanism of other ResNet variants (SE-ResNets and SK-ResNets), as well as common phenomena observed in many neural networks.
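To make the subnetwork-ensemble view concrete, consider a toy residual network whose branches are linear, F_i(x) = W_i x. Unrolling the shortcut connections gives, for two blocks, (I + W_2)(I + W_1)x = x + W_1 x + W_2 x + W_2 W_1 x: a depth-k subnetwork contributes a degree-k product of weight matrices, mirroring the low-degree and high-degree terms of a Taylor expansion. The following sketch is not from the paper; the linearity assumption (which makes the path decomposition exact) and all names in it are ours for illustration. It verifies numerically that the full network equals the sum over its 2^n implicit subnetworks and reports the output magnitude contributed at each depth.

# A minimal sketch, assuming linear residual branches F_i(x) = W_i x so
# that the path decomposition is exact (with nonlinear branches it is
# only an approximation). Not the authors' code.
import itertools
import numpy as np

rng = np.random.default_rng(0)
dim, n_blocks = 4, 3
# Small weights keep every branch a small perturbation of the identity.
Ws = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_blocks)]

def resnet(x):
    # Standard residual forward pass: x <- x + W_i x at each block.
    for W in Ws:
        x = x + W @ x
    return x

def subnetwork(x, mask):
    # One of the 2^n implicit paths: at block i, take the residual
    # branch W_i if mask[i] == 1, otherwise take the identity shortcut.
    for W, m in zip(Ws, mask):
        x = W @ x if m else x
    return x

x = rng.normal(size=dim)
total = np.zeros(dim)
magnitude_by_depth = {}
for mask in itertools.product([0, 1], repeat=n_blocks):
    out = subnetwork(x, mask)
    total += out
    depth = sum(mask)  # number of residual branches on this path
    magnitude_by_depth[depth] = magnitude_by_depth.get(depth, 0.0) + np.linalg.norm(out)

# The full network equals the sum over all of its subnetworks (paths).
assert np.allclose(total, resnet(x))
for depth in sorted(magnitude_by_depth):
    print(f"depth {depth}: summed path magnitude {magnitude_by_depth[depth]:.4f}")

On this toy example the depth-0 and depth-1 paths carry most of the output magnitude, which is only the static counterpart of the paper's claim; the evidence that shallow subnetworks are also trained first comes from the paper's experiments on ResNets and DenseNets.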