Department of Computer Science and Software Engineering, Laval University, Pavillon Adrien-Pouliot, 1065 av. de la Médecine, Québec, QC G1V 0A6, Canada.
Neural Netw. 2023 Jul;164:382-394. doi: 10.1016/j.neunet.2023.04.028. Epub 2023 Apr 25.
We prove new generalization bounds for stochastic gradient descent when training classifiers with invariances. Our analysis is based on the stability framework and covers both the convex case of linear classifiers and the non-convex case of homogeneous neural networks. We analyze stability with respect to the normalized version of the loss function used for training, which leads us to investigate a form of angle-wise stability instead of Euclidean stability in the weights. For neural networks, the distance measure we consider is invariant to rescaling the weights of each layer. Furthermore, we exploit the notion of on-average stability to obtain a data-dependent quantity in the bound. In our numerical experiments, this data-dependent quantity is more favorable when training with larger learning rates. This may help shed light on why larger learning rates can lead to better generalization in some practical scenarios.
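As an illustrative sketch only (the notation below is ours, not the paper's exact definitions): for a linear classifier with weights w, normalizing the loss makes it depend only on the direction w/||w||, so stability is naturally measured through the angle between the weight vectors produced by SGD on two training sets differing in one example; for a homogeneous network, a layer-wise normalized distance plays the analogous role.

% Illustrative sketch; the notation is an assumption, not taken from the paper.
% A normalized margin loss for a linear classifier depends only on w/||w||:
\[
  \ell_{\mathrm{norm}}(w; x, y) \;=\; \ell\!\left(\frac{y\,\langle w, x\rangle}{\lVert w\rVert}\right),
  \qquad
  \ell_{\mathrm{norm}}(cw; x, y) \;=\; \ell_{\mathrm{norm}}(w; x, y) \quad \text{for all } c > 0 .
\]
% Angle-wise stability then compares the directions of the weights w and w'
% obtained by running SGD on two datasets differing in a single example:
\[
  d_{\angle}(w, w') \;=\; \left\lVert \frac{w}{\lVert w\rVert} \;-\; \frac{w'}{\lVert w'\rVert} \right\rVert .
\]
% For an L-layer homogeneous network f(W_1, \dots, W_L; x), whose output satisfies
% f(c_1 W_1, \dots, c_L W_L; x) = (c_1 \cdots c_L)\, f(W_1, \dots, W_L; x) for c_i > 0,
% a distance built from the layer-normalized weights W_i / \lVert W_i \rVert is
% invariant to rescaling the weights of each layer, in the spirit of the measure
% discussed in the abstract.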