John Sum, Chi-Sing Leung, Kevin Ho
IEEE Trans Neural Netw Learn Syst. 2020 Jun;31(6):2227-2232. doi: 10.1109/TNNLS.2019.2927689. Epub 2019 Aug 6.
Over the decades, gradient descent has been applied to develop learning algorithms for training neural networks (NNs). In this brief, a limitation of applying such algorithms to train an NN with persistent weight noise is revealed. Let V(w) be the performance measure of an ideal NN; the gradient descent learning (GDL) algorithm is developed from V(w). With weight noise, the desired performance measure (denoted J(w)) is E[V(w̃) | w], where w̃ is the noisy weight vector. When GDL is applied to train an NN with weight noise, the actual learning objective is not V(w) but another scalar function L(w). For decades, there has been a misconception that L(w) = J(w) and, hence, that the model attained by GDL is the desired model. However, we show that this need not be the case: 1) with persistent additive weight noise, the attained model is the desired model, since L(w) = J(w); but 2) with persistent multiplicative weight noise, the attained model is unlikely to be the desired model, since L(w) ≠ J(w). Accordingly, the properties of the attained models are analyzed in comparison with the desired models, and the learning curves are sketched. Simulation results on 1) a simple regression problem and 2) the MNIST handwritten digit recognition are presented to support our claims.
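The gap between L(w) and J(w) can be seen in a short calculation. The sketch below is illustrative, not the paper's full derivation: it assumes one common model of GDL under persistent weight noise, in which the descent direction is the gradient of V evaluated at the noisy weights, E[∇V(w̃) | w], with zero-mean noise b whose distribution does not depend on w.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Additive noise, \tilde{w} = w + b: gradient and expectation commute
% because the law of b does not depend on w, so the mean GDL direction
% is exactly the gradient of J(w), i.e. L(w) = J(w).
\[
  \mathbb{E}\bigl[\nabla V(w+b)\,\big|\,w\bigr]
  = \nabla_w\,\mathbb{E}\bigl[V(w+b)\,\big|\,w\bigr]
  = \nabla J(w).
\]
% Multiplicative noise, \tilde{w} = w \odot (1+b): the chain rule
% introduces an extra elementwise factor (1+b), so
\[
  \nabla J(w)
  = \mathbb{E}\bigl[(1+b)\odot\nabla V(\tilde{w})\,\big|\,w\bigr],
\]
% which in general differs from the mean GDL direction
% \mathbb{E}[\nabla V(\tilde{w}) \mid w]. GDL therefore descends some
% L(w) \neq J(w), and the attained model need not be the desired one.
\end{document}
```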
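To echo the abstract's simple regression experiment, the following is a minimal, self-contained sketch in Python with NumPy. The model, noise level sigma, step size mu, and the rule that the gradient is measured at the noisy weights are all illustrative assumptions, not the paper's exact setup. For this 1-D quadratic V(w), the mean GDL fixed point under multiplicative noise stays at the noiseless optimum w_true, whereas the desired minimizer of J(w) = E[V(w̃) | w] shifts to w_true/(1+σ²), so the attained and desired models separate; under additive noise they coincide.

```python
# Minimal sketch, not the paper's exact experimental setup: 1-D linear
# regression y = w_true * x with V(w) = 0.5 * mean((w*x - y)^2).
# Persistent weight noise is modelled by evaluating the gradient at the
# noisy weight w_tilde at every step; w_true, sigma, mu, and the step
# count are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
w_true = 2.0
y = w_true * x

def grad_V(w):
    """Gradient of the noiseless objective V(w) = 0.5*mean((w*x - y)^2)."""
    return np.mean((w * x - y) * x)

def run_gdl(noise, sigma, mu=0.05, steps=20000, w0=0.0):
    """GDL with persistent weight noise injected at every iteration."""
    w, tail = w0, []
    for t in range(steps):
        b = rng.normal(scale=sigma)
        w_tilde = w + b if noise == "additive" else w * (1.0 + b)
        w -= mu * grad_V(w_tilde)    # gradient measured at the noisy weights
        if t >= steps // 2:
            tail.append(w)           # average the tail to smooth out jitter
    return float(np.mean(tail))

sigma = 0.5
# Desired model = argmin J(w), with J(w) = E[V(w_tilde) | w]:
#   additive:       J(w) = V(w) + const                 -> argmin = w_true
#   multiplicative: J(w) = 0.5*S*((w - w_true)^2 + sigma^2 * w^2),
#                   S = mean(x^2)                       -> argmin = w_true/(1+sigma^2)
print("additive:       attained ~ %.3f, desired = %.3f"
      % (run_gdl("additive", sigma), w_true))
print("multiplicative: attained ~ %.3f, desired = %.3f"
      % (run_gdl("multiplicative", sigma), w_true / (1 + sigma ** 2)))
```

Under these assumptions the script should print an attained multiplicative-noise weight near w_true = 2.0 against a desired weight of 1.6, illustrating L(w) ≠ J(w), while the additive case lands on the desired 2.0.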