IEEE Trans Neural Netw Learn Syst. 2020 Sep;31(9):3594-3605. doi: 10.1109/TNNLS.2019.2945113. Epub 2019 Nov 5.
Despite their prevalence in neural networks, we still lack a thorough theoretical characterization of rectified linear unit (ReLU) layers. This article aims to further our understanding of ReLU layers by studying how the ReLU activation function interacts with the linear component of the layer and what role this interaction plays in the success of the neural network at its intended task. To this end, we introduce two new tools: ReLU singular values of operators and the Gaussian mean width of operators. By presenting, on the one hand, theoretical justifications, results, and interpretations of these two concepts and, on the other hand, numerical experiments in which the ReLU singular values and the Gaussian mean width are applied to trained neural networks, we hope to give a comprehensive, singular-value-centric view of ReLU layers. We find that ReLU singular values and the Gaussian mean width not only enable theoretical insights but also provide metrics that seem promising for practical applications. In particular, these measures can be used to distinguish correctly and incorrectly classified data as it traverses the network. We conclude by introducing two tools based on our findings: double layers and harmonic pruning.
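As a rough illustration of the two quantities named in the abstract, the following minimal sketch (not taken from the paper) numerically estimates, for a toy weight matrix W, (i) the largest "ReLU singular value", assumed here to be the supremum of ||ReLU(Wx)|| over unit vectors x, and (ii) the Gaussian mean width w(K) = E[sup_{y in K} <g, y>] of the set K obtained by mapping the unit sphere through x -> ReLU(Wx). The Monte Carlo sampling scheme, sample sizes, and the matrix W are illustrative assumptions, not the authors' procedure.

```python
# Hypothetical sketch: crude Monte Carlo estimates of the largest ReLU singular
# value and of the Gaussian mean width of a single ReLU layer x -> ReLU(W x).
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def largest_relu_singular_value(W, n_samples=10_000):
    """Estimate sup over unit vectors x of ||ReLU(W x)||_2 by random sampling."""
    d_in = W.shape[1]
    x = rng.standard_normal((n_samples, d_in))
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # points on the unit sphere
    return np.linalg.norm(relu(x @ W.T), axis=1).max()

def gaussian_mean_width(W, n_points=2_000, n_gaussians=2_000):
    """Estimate w(K) = E[sup_{y in K} <g, y>] for K = ReLU(W * unit sphere)."""
    d_in, d_out = W.shape[1], W.shape[0]
    x = rng.standard_normal((n_points, d_in))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    K = relu(x @ W.T)                                # sampled points of the image set K
    g = rng.standard_normal((n_gaussians, d_out))    # random Gaussian directions
    return (K @ g.T).max(axis=0).mean()              # average of per-direction suprema

W = rng.standard_normal((64, 32)) / np.sqrt(32)      # toy ReLU-layer weights (assumed)
print(largest_relu_singular_value(W), gaussian_mean_width(W))
```

Both estimates are lower bounds that improve with the number of samples; in a trained network one would apply the same procedure layer by layer to the learned weight matrices.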