Baldassi Carlo, Borgs Christian, Chayes Jennifer T, Ingrosso Alessandro, Lucibello Carlo, Saglietti Luca, Zecchina Riccardo
Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy;
Human Genetics Foundation-Torino, I-10126 Torino, Italy.
Proc Natl Acad Sci U S A. 2016 Nov 29;113(48):E7655-E7662. doi: 10.1073/pnas.1608103113. Epub 2016 Nov 15.
In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare, but extremely dense and accessible, regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov chains to message passing to gradient-descent processes, in which the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
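The following is a minimal Python/NumPy sketch of the replicated-cost idea summarized in the abstract: a finite number of replicas of a generic cost function E, each coupled to a center configuration that plays the role of the driving assignment. The quadratic coupling, the choice of the replica barycenter as the center, and all parameter names (gamma, lr, n_replicas) are illustrative assumptions made here for concreteness; they are not the paper's exact algorithms, which also include Markov-chain and message-passing variants and discrete weights.

```python
import numpy as np

def replicated_descent(E, grad_E, w0, n_replicas=5, gamma=0.5,
                       lr=0.02, steps=2000, seed=0):
    """Gradient descent on a replicated cost (illustrative sketch).

    Total cost: sum_a E(w_a) + (gamma / 2) * sum_a ||w_a - w_center||^2,
    where w_center is taken to be the mean of the replicas at each step
    (an assumed stand-in for the 'driving assignment' of the abstract).
    """
    rng = np.random.default_rng(seed)
    # Initialize replicas as small perturbations of a common starting point.
    replicas = [w0 + 0.01 * rng.standard_normal(w0.shape) for _ in range(n_replicas)]
    for _ in range(steps):
        center = np.mean(replicas, axis=0)  # current driving configuration
        for a in range(n_replicas):
            # Gradient of the replica's own cost plus the coupling toward the center.
            g = grad_E(replicas[a]) + gamma * (replicas[a] - center)
            replicas[a] = replicas[a] - lr * g
    return np.mean(replicas, axis=0)

if __name__ == "__main__":
    # Toy rough landscape: a convex bowl with sinusoidal ripples (purely illustrative).
    E = lambda w: 0.5 * np.sum(w ** 2) + np.sum(np.cos(8.0 * w))
    grad_E = lambda w: w - 8.0 * np.sin(8.0 * w)
    w_star = replicated_descent(E, grad_E, w0=np.full(10, 2.0))
    print("final cost:", E(w_star))
```

The coupling term pulls the replicas toward configurations whose neighborhoods are also low-cost, which is the mechanism the abstract describes for favoring dense, accessible regions over isolated minima.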