Baldi Pierre, Sadowski Peter, Lu Zhiqin
Department of Computer Science, University of California, Irvine.
Department of Mathematics, University of California, Irvine.
Artif Intell. 2018 Jul;260:1-35. doi: 10.1016/j.artint.2018.03.003. Epub 2018 Apr 3.
Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, in which the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and learning channels. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g. ASRBP), and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 benchmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we also study the low end of the number of bits required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. Finally, we prove several mathematical results, including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.
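To make the abstract's core idea concrete, the following is a minimal NumPy sketch (not from the paper) of the RBP weight update for a one-hidden-layer network on a toy regression task: the error is propagated backwards through a fixed random matrix `B` instead of the transpose of the forward matrix `W2`, while the multiplication by the derivative of the activation function is kept, as the abstract says is important. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn a random linear map with a one-hidden-layer net.
n_in, n_hid, n_out, n_samples = 8, 16, 4, 200
X = rng.standard_normal((n_in, n_samples))
T = rng.standard_normal((n_out, n_in)) @ X          # targets

W1 = rng.standard_normal((n_hid, n_in)) * 0.1       # forward weights, layer 1
W2 = rng.standard_normal((n_out, n_hid)) * 0.1      # forward weights, layer 2
B  = rng.standard_normal((n_hid, n_out)) * 0.1      # fixed random feedback matrix (replaces W2.T)

tanh_prime = lambda a: 1.0 - np.tanh(a) ** 2
lr = 0.01

def loss():
    return 0.5 * np.mean((W2 @ np.tanh(W1 @ X) - T) ** 2)

loss_before = loss()

for step in range(500):
    A1 = W1 @ X                 # hidden pre-activations
    H  = np.tanh(A1)            # hidden activations
    Y  = W2 @ H                 # linear output layer
    E  = Y - T                  # output error

    # RBP: error travels backwards through the fixed random matrix B,
    # but is still multiplied by the derivative of the activation function.
    # (Standard backpropagation would use W2.T here instead of B.)
    delta1 = (B @ E) * tanh_prime(A1)

    W2 -= lr * (E @ H.T) / n_samples
    W1 -= lr * (delta1 @ X.T) / n_samples

loss_after = loss()
```

Despite the feedback matrix carrying no information about the forward weights, the training loss still decreases, which is the "remarkable effectiveness" the abstract refers to; swapping `B @ E` for `W2.T @ E` recovers ordinary backpropagation.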