Posch Konstantin, Pilz Juergen
IEEE Trans Neural Netw Learn Syst. 2021 Mar;32(3):1037-1051. doi: 10.1109/TNNLS.2020.2980004. Epub 2021 Mar 1.
In this article, a novel approach for training deep neural networks using Bayesian techniques is presented. The Bayesian methodology allows for an easy evaluation of model uncertainty and, in addition, is robust to overfitting; these are commonly the two main problems that classical, i.e., non-Bayesian, architectures struggle with. The proposed approach applies variational inference to approximate the intractable posterior distribution. In particular, the variational distribution is defined as a product of multivariate normal distributions with tridiagonal covariance matrices, where each normal distribution covers either the weights or the biases of one network layer. The layerwise a posteriori variances are defined in terms of the corresponding expected values, and the correlations are assumed to be identical; therefore, only a few additional parameters need to be optimized compared with non-Bayesian settings. The performance of the new approach is evaluated and compared with other recently developed Bayesian methods on the popular benchmark data sets MNIST and CIFAR-10. Among the considered approaches, the proposed one achieves the best predictive accuracy. Moreover, extensive evaluations of the provided prediction uncertainty information indicate that the new approach often yields more useful uncertainty estimates than the comparison methods.
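The variational family described above can be illustrated with a minimal sketch. The code below samples one layer's weight vector from a multivariate normal with a tridiagonal covariance matrix, where the standard deviations are tied to the mean parameters through a single scale factor and all adjacent-pair correlations share one value. The specific tying rule (`sd_i = exp(log_alpha) * |mu_i|`) and the parameter names are assumptions for illustration, not the paper's exact parameterization; the point is that only two extra scalars per layer are optimized beyond the means.

```python
import numpy as np

def sample_layer_weights(mu, log_alpha, rho, rng):
    """Draw one sample from N(mu, Sigma), where Sigma is tridiagonal:
    variances are tied to the means and a single correlation rho is
    shared by all adjacent weight pairs (illustrative parameterization)."""
    sd = np.exp(log_alpha) * np.abs(mu)    # mean-tied standard deviations
    n = mu.size
    Sigma = np.diag(sd ** 2)               # diagonal: variances
    off = rho * sd[:-1] * sd[1:]           # first off-diagonal: Cov(w_i, w_{i+1})
    idx = np.arange(n - 1)
    Sigma[idx, idx + 1] = off
    Sigma[idx + 1, idx] = off
    # The tridiagonal correlation matrix is positive definite for |rho| < 0.5,
    # so the Cholesky factorization below is well defined.
    L = np.linalg.cholesky(Sigma)
    return mu + L @ rng.standard_normal(n)

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.2, 0.8, 2.0])       # layer weight means (toy values)
w = sample_layer_weights(mu, log_alpha=np.log(0.1), rho=0.3, rng=rng)
print(w.shape)
```

In a full implementation, such samples would feed the reparameterization trick so that the means, `log_alpha`, and `rho` of every layer can be trained by stochastic gradient descent on the variational objective.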