具有修正线性单元激活函数的多层神经网络解的几何性质和容量。

Properties of the Geometry of Solutions and Capacity of Multilayer Neural Networks with Rectified Linear Unit Activations.

机构信息

Artificial Intelligence Lab, Institute for Data Science and Analytics, Bocconi University, Milano 20135, Italy.

出版信息

Phys Rev Lett. 2019 Oct 25;123(17):170602. doi: 10.1103/PhysRevLett.123.170602.

DOI:10.1103/PhysRevLett.123.170602

Abstract

Rectified linear units (ReLUs) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: While the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.

摘要

修正线性单元（ReLU）已成为当前深度学习系统中神经单元的主要模型。这种选择最初是作为一种补偿所谓的梯度消失问题的方法提出的，该问题可能会削弱由多层组成的网络中的随机梯度下降学习。在这里，我们提供了关于 ReLU 对具有二进制或实值权重的两层神经网络的容量和解决方案空间几何景观的影响的分析结果。我们研究了存储大量随机模式的问题，并且出乎意料地发现，随着隐藏层中神经元数量的增加，网络的容量仍然是有限的，这与阈值单元的情况不同，在阈值单元中，容量发散。可能更重要的是，大偏差方法使我们能够发现解决方案空间的几何景观具有奇特的结构：虽然大多数解决方案在距离上接近但仍然是孤立的，但存在很少的解决方案区域，这些区域比阈值单元的类似区域密集得多。这些解决方案对权重的扰动具有鲁棒性，并且可以容忍输入的大扰动。分析结果得到了数值结果的证实。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

具有修正线性单元激活函数的多层神经网络解的几何性质和容量。

Properties of the Geometry of Solutions and Capacity of Multilayer Neural Networks with Rectified Linear Unit Activations.

机构信息

出版信息

相似文献

引用本文的文献

具有修正线性单元激活函数的多层神经网络解的几何性质和容量。

Properties of the Geometry of Solutions and Capacity of Multilayer Neural Networks with Rectified Linear Unit Activations.

机构信息

出版信息

相似文献

引用本文的文献