IEEE Trans Neural Netw Learn Syst. 2012 Nov;23(11):1827-40. doi: 10.1109/TNNLS.2012.2210243.
Injecting weight noise during training is a simple technique that was proposed almost two decades ago. However, little is known about its convergence behavior. This paper studies the convergence of two weight-noise-injection-based training algorithms: multiplicative weight noise injection with weight decay and additive weight noise injection with weight decay. We consider their application to multilayer perceptrons with either linear or sigmoid output nodes. Let w(t) be the weight vector, V(w) the corresponding objective function of the training algorithm, α > 0 the weight decay constant, and μ(t) the step size. We show that if μ(t) → 0, then with probability one E[||w(t)||₂²] is bounded and lim_{t→∞} ||w(t)||₂ exists. Based on these two properties, we show that if μ(t) → 0, Σₜ μ(t) = ∞, and Σₜ μ(t)² < ∞, then these algorithms converge with probability one. Moreover, w(t) converges with probability one to a point at which ∇_w V(w) = 0.
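The abstract states the convergence conditions but not the update rule itself. Below is a minimal sketch of what online multiplicative weight noise injection with weight decay could look like under those conditions. The single-hidden-layer architecture, the squared-error loss, the Gaussian noise model with scale sigma, the step-size schedule μ(t) = μ₀/(1+t) (chosen because it satisfies Σₜ μ(t) = ∞ and Σₜ μ(t)² < ∞), and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(w, x, n_hidden):
    """Single-hidden-layer MLP with a linear output node (one of the two
    output types the paper considers); the flat vector w holds all weights."""
    d = x.shape[0]
    W1 = w[:n_hidden * d].reshape(n_hidden, d)
    b1 = w[n_hidden * d : n_hidden * (d + 1)]
    w2 = w[n_hidden * (d + 1) : n_hidden * (d + 2)]
    b2 = w[-1]
    h = np.tanh(W1 @ x + b1)
    return w2 @ h + b2

def grad_loss(w, x, y, n_hidden, eps=1e-6):
    """Central-difference gradient of the squared error 0.5*(f(x; w) - y)^2;
    numerical differentiation keeps the sketch short and self-contained."""
    def loss(v):
        e = mlp(v, x, n_hidden) - y
        return 0.5 * e * e
    g = np.empty_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        g[i] = (loss(wp) - loss(wm)) / (2 * eps)
    return g

def train(samples, n_hidden=5, alpha=1e-3, mu0=0.5, sigma=0.05, T=5000):
    """Online training with multiplicative weight noise plus weight decay.
    The exact update rule is an assumption made for this sketch."""
    d = samples[0][0].shape[0]
    w = 0.1 * rng.standard_normal(n_hidden * (d + 2) + 1)   # w(0)
    for t in range(T):
        x, y = samples[rng.integers(len(samples))]          # online sample
        mu = mu0 / (1 + t)       # step size: sum mu(t) = inf, sum mu(t)^2 < inf
        b = sigma * rng.standard_normal(w.size)
        w_noisy = w * (1.0 + b)  # multiplicative noise: w_i -> w_i * (1 + b_i)
        g = grad_loss(w_noisy, x, y, n_hidden)
        w = w - mu * (g + alpha * w)  # gradient at noisy weights + decay alpha*w
    return w

# Toy usage: regress y = sin(u) on a handful of samples (hypothetical data).
data = [(np.array([u]), np.sin(u)) for u in np.linspace(-2.0, 2.0, 20)]
w_hat = train(data)
```

Replacing the line `w_noisy = w * (1.0 + b)` with `w_noisy = w + b` would give the additive-noise variant, the second of the two algorithms whose convergence the paper analyzes.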