Department of Electrical and Computer Engineering, Signal and Image Processing Institute, University of Southern California, Los Angeles, CA 90089-2564, USA.
Neural Netw. 2019 Dec;120:9-31. doi: 10.1016/j.neunet.2019.09.016. Epub 2019 Oct 17.
Bidirectional backpropagation trains a neural network with backpropagation in both the backward and forward directions using the same synaptic weights. Specially injected noise can then reduce the algorithm's training time and improve its accuracy because backpropagation has a likelihood structure. Training in each direction is a form of generalized expectation-maximization because backpropagation itself is a form of generalized expectation-maximization. This requires backpropagation invariance in each direction: the gradient of the log-likelihood in each direction must give back the original update equations of the backpropagation algorithm. The special noise makes the current training signal more probable as bidirectional backpropagation climbs the nearest hill of joint probability or log-likelihood. The injected noise differs for classification and regression even in the same network because of the constraint of backpropagation invariance. The backward pass in a bidirectionally trained classifier estimates the centroid of the input pattern class. So the feedback signal that arrives back at the input layer of a classifier tends to estimate the local pattern-class centroid. Simulations show that noise sped convergence and improved the accuracy of bidirectional backpropagation on both the MNIST test set of hand-written digits and the CIFAR-10 test set of images. The noise boost further applies to regular and Wasserstein bidirectionally trained adversarial networks. Bidirectionality also greatly reduced the problem of mode collapse in regular adversarial networks.
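The core idea of training in both directions with the same synaptic weights can be sketched as follows. This is a minimal illustrative NumPy example, not the paper's implementation: a one-hidden-layer network whose weight matrices W1 and W2 map inputs to outputs in the forward pass and, transposed, map targets back toward inputs in the backward pass. Both directions contribute gradients to the same weights. The network sizes, learning rate, and toy regression data are all assumptions for illustration; the paper's noise-injection step is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only).
X = rng.normal(size=(64, 4))
T = np.tanh(X @ rng.normal(size=(4, 3)))  # targets

# Shared weights used by BOTH directions.
W1 = rng.normal(scale=0.1, size=(4, 8))
W2 = rng.normal(scale=0.1, size=(8, 3))
lr = 0.05

def forward(x):
    """Forward direction: x -> hidden -> y."""
    h = np.tanh(x @ W1)
    return h, h @ W2  # linear output layer for regression

def backward_pass(y):
    """Backward direction: y -> hidden -> x, reusing the SAME weights transposed."""
    h = np.tanh(y @ W2.T)
    return h, h @ W1.T

_, y0 = forward(X)
mse_init = float(np.mean((y0 - T) ** 2))

for epoch in range(200):
    # Forward-direction squared error on y.
    h, y_hat = forward(X)
    ey = y_hat - T
    gW2 = h.T @ ey                      # dL_fwd / dW2
    gh = (ey @ W2.T) * (1 - h ** 2)     # backprop through tanh
    gW1 = X.T @ gh                      # dL_fwd / dW1

    # Backward-direction squared error on x, gradients w.r.t. the same weights.
    hb, x_hat = backward_pass(T)
    ex = x_hat - X
    gW1 += (hb.T @ ex).T                # dL_bwd / dW1 (via W1.T)
    ghb = (ex @ W1) * (1 - hb ** 2)
    gW2 += (T.T @ ghb).T                # dL_bwd / dW2 (via W2.T)

    W1 -= lr * gW1 / len(X)
    W2 -= lr * gW2 / len(X)

_, y_hat = forward(X)
mse_fwd = float(np.mean((y_hat - T) ** 2))
```

Because the two directions share weights, each update jointly improves the forward map and the backward map, which is why the backward pass of a trained classifier can return a centroid-like estimate of the input pattern class.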