Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran.
Neural Netw. 2020 Aug;128:33-46. doi: 10.1016/j.neunet.2020.04.021. Epub 2020 Apr 25.
Deep networks can learn complex problems, but they suffer from overfitting. Regularization methods have been proposed to address this problem, but they do not adapt to dynamic changes during training. Taking a different approach, this paper presents a regularization method based on the Singular Value Decomposition (SVD) that adjusts the learning model adaptively. Overfitting is evaluated through the condition numbers of the synaptic matrices; when overfitting is high, the matrices are replaced with their SVD approximations. Theoretical results are derived to characterize the performance of this regularization method. It is proved that SVD approximation cannot solve overfitting after several iterations. A new Tikhonov term is therefore added to the loss function so that the synaptic weights converge to the SVD approximation of the best-found results. Following this approach, Adaptive SVD Regularization (ASR) is proposed to adjust the learning model with respect to the dynamic training characteristics. ASR results are visualized to show how ASR overcomes overfitting. Different Convolutional Neural Network (CNN) configurations are implemented with different augmentation schemes to compare ASR with state-of-the-art regularization methods. On MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100, ASR achieves accuracies of 99.4%, 95.7%, 97.1%, 93.2% and 55.6%, respectively. Although ASR reduces overfitting and validation loss, its elapsed time is not significantly greater than that of learning without regularization.
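The three ingredients named in the abstract (a condition-number check on a weight matrix, replacement by an SVD approximation, and a Tikhonov term pulling the weights toward a target matrix) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the fixed truncation rank, the threshold `cond_threshold`, the coefficient `lam`, and the squared Frobenius form of the penalty are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def condition_number(W):
    """Ratio of the largest to the smallest singular value of W."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    if s[-1] == 0.0:
        return np.inf
    return s[0] / s[-1]


def svd_approximation(W, rank):
    """Rank-`rank` truncated SVD approximation of W (rank is an assumed knob)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


def tikhonov_penalty(W, W_target, lam):
    """Assumed penalty form: lam * ||W - W_target||_F^2."""
    return lam * np.sum((W - W_target) ** 2)


def maybe_regularize(W, cond_threshold, rank):
    """Replace W by its SVD approximation only when it is ill-conditioned.

    Using a fixed threshold as the trigger is a hypothetical choice.
    """
    if condition_number(W) > cond_threshold:
        return svd_approximation(W, rank)
    return W


if __name__ == "__main__":
    W = np.diag([10.0, 1.0, 0.1])          # toy "synaptic matrix"
    W_low = maybe_regularize(W, cond_threshold=50.0, rank=2)
    extra_loss = tikhonov_penalty(W, W_low, lam=0.5)
    print(condition_number(W), extra_loss)
```

In a training loop, a term like `tikhonov_penalty` would be added to the task loss, with `W_target` taken as the SVD approximation of the best weights found so far, as the abstract describes; how often that target is refreshed is left open here.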