A mathematical framework for improved weight initialization of neural networks using Lagrange multipliers.

Author information

Ingeborg de Pater, Mihaela Mitici

Affiliations

Faculty of Aerospace Engineering, Delft University of Technology, 2926 HS Delft, The Netherlands.

Faculty of Science, Utrecht University, Heidelberglaan 8, 3584 CS Utrecht, The Netherlands.

Publication information

Neural Netw. 2023 Sep;166:579-594. doi: 10.1016/j.neunet.2023.07.035. Epub 2023 Aug 3.

Abstract

A good weight initialization is crucial to accelerate the convergence of the weights in a neural network. However, training a neural network is still time-consuming, despite recent advances in weight initialization approaches. In this paper, we propose a mathematical framework for the weight initialization of the last layer of a neural network. We first derive analytically a tight constraint on the weights that accelerates their convergence during the back-propagation algorithm. We then use linear regression and Lagrange multipliers to analytically derive the optimal initial weights and initial bias of the last layer that minimize the initial training loss under the derived tight constraint. We also show that the restrictive assumption of traditional weight initialization algorithms, namely that the expected value of the weights is zero, is redundant for our approach. We first apply the proposed weight initialization approach to a Convolutional Neural Network that predicts the Remaining Useful Life of aircraft engines. The initial training and validation losses are relatively small, the weights do not get stuck in a local optimum, and the convergence of the weights is accelerated. We compare our approach with several benchmark strategies. Compared to the best-performing state-of-the-art initialization strategy (Kaiming initialization), our approach needs 34% fewer epochs to reach the same validation loss. We also apply our approach to ResNets for the CIFAR-100 dataset, combined with transfer learning. Here, the initial accuracy is already at least 53%. This yields faster weight convergence and a higher test accuracy than the benchmark strategies.
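The abstract does not reproduce the paper's derivation, but the mechanics of its second step (fitting the last layer by linear regression under an equality constraint, solved with Lagrange multipliers) can be illustrated in closed form. The sketch below is a generic instance, not the authors' method: it assumes a linear last layer trained with a mean-squared-error loss, takes the penultimate-layer activations A and targets y as given, and substitutes an illustrative constraint C w = d for the tight constraint the paper derives analytically; the function name and shapes are hypothetical. Setting the gradient of the Lagrangian ||X w - y||^2 + lambda^T (C w - d) to zero, with X = [A, 1] absorbing the bias, yields a single linear KKT system:

```python
import numpy as np

def init_last_layer(A, y, C, d):
    """Equality-constrained least squares via Lagrange multipliers.

    Solves  min_w ||X w - y||^2  s.t.  C w = d,  where X = [A, 1]
    stacks the penultimate-layer activations with a bias column.
    The KKT conditions reduce to one linear system:
        [2 X^T X  C^T] [ w ]   [2 X^T y]
        [C        0  ] [lam] = [   d   ]
    """
    n = A.shape[0]
    X = np.hstack([A, np.ones((n, 1))])   # bias absorbed as last column
    p, k = X.shape[1], C.shape[0]
    K = np.zeros((p + k, p + k))
    K[:p, :p] = 2.0 * X.T @ X
    K[:p, p:] = C.T
    K[p:, :p] = C
    rhs = np.concatenate([2.0 * X.T @ y, d])
    sol = np.linalg.solve(K, rhs)
    return sol[:p - 1], sol[p - 1]        # initial weights, initial bias

# Hypothetical shapes: 256 penultimate activations, a scalar target
# (e.g., Remaining Useful Life). The constraint below (weights sum to 1,
# bias unconstrained) is purely illustrative, not the paper's constraint.
rng = np.random.default_rng(0)
A = rng.standard_normal((1024, 256))
y = rng.standard_normal(1024)
C = np.hstack([np.ones((1, 256)), np.zeros((1, 1))])
d = np.array([1.0])
w0, b0 = init_last_layer(A, y, C, d)
```

Because the constrained minimizer is obtained in one linear solve rather than by gradient descent, the last layer starts at the smallest training loss compatible with the constraint, which is consistent with the effect the abstract reports: a small initial loss and accelerated weight convergence.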

