Molecular Foundry, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA.
Department of Physics, University of Ottawa, Ottawa, ON, K1N 6N5, Canada.
Nat Commun. 2021 Nov 2;12(1):6317. doi: 10.1038/s41467-021-26568-2.
We show analytically that training a neural network by conditioned stochastic mutation, or neuroevolution, of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations and for both shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.
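To make the "conditioned stochastic mutation" procedure concrete, the sketch below implements one common greedy variant on a toy regression problem: every weight receives an independent Gaussian perturbation of scale sigma, and the mutation is kept only if the loss does not increase. The network size, toy data, mutation scale, and acceptance rule are illustrative assumptions, not necessarily the authors' exact protocol; the correspondence with noisy gradient descent is derived in the limit of small sigma.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact setup) of
# neuroevolution by conditioned stochastic mutation: propose a Gaussian
# perturbation of all weights and keep it only if the loss does not increase.

import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer regression network: fit y = sin(x) on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 64)[:, None]
y = np.sin(x)

def init_params(hidden=16):
    return {
        "W1": rng.normal(0, 0.5, (1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.5, (hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def loss(p):
    return float(np.mean((forward(p, x) - y) ** 2))

def mutate(p, sigma):
    # Gaussian white-noise mutation of every weight and bias.
    return {k: v + rng.normal(0, sigma, v.shape) for k, v in p.items()}

params = init_params()
sigma = 0.02            # mutation scale; the small-sigma limit is where the
                        # correspondence with noisy gradient descent holds
current = loss(params)

for step in range(20000):
    trial = mutate(params, sigma)
    trial_loss = loss(trial)
    if trial_loss <= current:               # "conditioned" acceptance: keep only
        params, current = trial, trial_loss # mutations that do not raise the loss
    if step % 5000 == 0:
        print(f"step {step:6d}  loss {current:.5f}")

print(f"final loss {current:.5f}")
```

Averaging many independent runs of a loop like this, with small sigma, is the regime in which the abstract's claimed equivalence to gradient descent with Gaussian white noise applies.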