Ludwig Winkler, César Ojeda, Manfred Opper
Machine Learning Group, Technische Universität Berlin, 10623 Berlin, Germany.
Artificial Intelligence Group, Technische Universität Berlin, 10623 Berlin, Germany.
Entropy (Basel). 2022 Aug 9;24(8):1097. doi: 10.3390/e24081097.
In this paper, we propose to leverage the Bayesian uncertainty information encoded in parameter distributions to inform the learning procedure for Bayesian models. We derive a first-principles stochastic differential equation for the training dynamics of the mean and uncertainty parameters of the variational distributions. Building on this Bayesian stochastic differential equation, we apply stochastic optimal control to the variational parameters to obtain individually controlled learning rates. We show that the resulting optimizer, StochControlSGD, is significantly more robust to large learning rates and can adaptively and individually control the learning rates of the variational parameters. The evolution of the control suggests separate and distinct dynamical behaviours in the training regimes for the mean and uncertainty parameters in Bayesian neural networks.
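To make the setting concrete, the sketch below fits a one-dimensional mean-field Gaussian variational distribution with plain stochastic gradient descent, using one fixed learning rate for the mean parameter and another for the uncertainty parameter. It is not StochControlSGD: the abstract does not specify the control law, so none is attempted here. The toy regression model, the softplus parameterisation of the standard deviation, the standard normal prior, and all names (`fit_mean_field_gaussian`, `lr_mu`, `lr_rho`, `noise_std`) are assumptions made for illustration only.

```python
import math
import random


def softplus(v):
    """Map an unconstrained value to a positive standard deviation."""
    return math.log1p(math.exp(v))


def sigmoid(v):
    """Derivative of softplus."""
    return 1.0 / (1.0 + math.exp(-v))


def fit_mean_field_gaussian(x, y, lr_mu=1e-3, lr_rho=2e-3, steps=5000,
                            noise_std=0.5, seed=0):
    """Fit q(w) = N(mu, softplus(rho)^2) for the toy model y ~ N(w * x, noise_std^2).

    Assumed prior: w ~ N(0, 1). The negative ELBO is minimised with
    single-sample reparameterisation gradients and two fixed learning rates.
    """
    rng = random.Random(seed)
    mu, rho = 0.0, -1.0  # hypothetical initial values
    for _ in range(steps):
        sigma = softplus(rho)
        eps = rng.gauss(0.0, 1.0)
        w = mu + sigma * eps  # reparameterised sample from q(w)
        # Gradient of the Gaussian negative log-likelihood with respect to w.
        g_w = sum((w * xi - yi) * xi for xi, yi in zip(x, y)) / noise_std ** 2
        # Add the gradient of KL(q || N(0, 1)) = 0.5 * (sigma^2 + mu^2 - 1) - log(sigma).
        g_mu = g_w + mu
        g_rho = (g_w * eps + sigma - 1.0 / sigma) * sigmoid(rho)
        # Separate fixed step sizes for the mean and the uncertainty parameter.
        mu -= lr_mu * g_mu
        rho -= lr_rho * g_rho
    return mu, softplus(rho)


# Noise-free toy data generated with a true weight of 2.0.
xs = [-1.0 + 2.0 * i / 49 for i in range(50)]
ys = [2.0 * xi for xi in xs]
mu, sigma = fit_mean_field_gaussian(xs, ys)
print(f"variational mean ~ {mu:.3f}, variational std ~ {sigma:.3f}")
```

In this baseline the two step sizes are hand-picked constants. The abstract's contribution is to replace such constants with learning rates that are controlled adaptively and individually for each variational parameter.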