Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities, Minneapolis, 55455, MN, USA.
Neural Netw. 2024 Jan;169:242-256. doi: 10.1016/j.neunet.2023.10.014. Epub 2023 Oct 16.
We analyze the generalization performance of over-parameterized learning methods for classification within the VC-theoretical framework. Recently, practitioners in Deep Learning discovered the 'double descent' phenomenon, in which large networks can perfectly fit the available training data while also achieving good generalization on future (test) data. The current consensus view is that VC-theoretical results cannot account for the good generalization performance of Deep Learning networks. In contrast, this paper shows that double descent can be explained by VC-theoretical concepts, such as VC-dimension and Structural Risk Minimization. We also present empirical results showing that double descent generalization curves can be accurately modeled using classical VC-generalization bounds. The proposed VC-theoretical analysis enables a better understanding of generalization curves for data sets with different statistical characteristics, such as low- vs. high-dimensional data and noisy data. In addition, we analyze the generalization performance of transfer learning using pre-trained Deep Learning networks.
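For context, a standard textbook form of the classical VC generalization bound referenced above (following Vapnik; the exact bound fitted in the paper may differ in its constants and form) states that, with probability at least 1 - \eta,

R(f) \le R_{emp}(f) + \sqrt{ \frac{h \left( \ln(2n/h) + 1 \right) - \ln(\eta/4)}{n} }

where R(f) is the expected (test) risk, R_{emp}(f) is the empirical (training) risk, h is the VC-dimension of the model class, and n is the number of training samples. On the paper's view, generalization curves of over-parameterized models can be modeled by tracking how this bound behaves as capacity h varies under Structural Risk Minimization.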