Chen Liangming, Jin Long, Shang Mingsheng
IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12535-12549. doi: 10.1109/TNNLS.2024.3462516.
Theoretical and empirical evidence highlights a positive correlation between the flatness of the loss landscape around a minimum and generalization. However, most current approaches that seek flat minima either incur high computational costs or struggle to balance generalization, training stability, and convergence. This work proposes reshaping the loss landscape to guide the optimizer toward flat regions, an approach with negligible computational cost that does not compromise training stability, convergence, or efficiency. We focus on nonlinear, loss-dependent reshaping functions, underpinned by theoretical insights, to reshape the loss landscape. To design these functions, we first identify where and how they should be applied. With the aid of recently developed tools in stochastic optimization, theoretical analysis shows that steepening the low-loss region of the landscape increases the rate of escape from sharp minima, while flattening the high- and ultralow-loss regions enhances training stability and optimization performance, respectively. Simulations and experiments reveal that the carefully designed reshaping functions not only guide optimizers to flat minima and improve generalization but also stabilize training, promote optimization, and preserve efficiency. Our approach is evaluated on image classification, adversarial robustness, and natural language processing (NLP) tasks and achieves significant improvements in generalization performance with negligible computational cost. We believe that the new perspective introduced in this work will broadly impact the field of deep neural network training. The code is available at https://github.com/LongJin-lab/LLR.
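As a rough illustration of the idea only (the abstract does not specify the paper's actual reshaping functions; those are given in the paper and the LLR repository), the sketch below applies a hypothetical loss-dependent, piecewise-linear reshaping g to the scalar training loss before backpropagation. Because the reshaped gradient is g'(L) times the original gradient, a slope greater than 1 on an intermediate low-loss band steepens that part of the landscape, while a slope below 1 on the high-loss and ultralow-loss bands flattens them; the function name, thresholds, and slopes here are all assumptions for illustration.

```python
# Hypothetical sketch of loss-landscape reshaping (not the paper's functions).
import torch

def reshape_loss(loss: torch.Tensor,
                 low: float = 0.05,    # assumed ultralow-loss threshold
                 high: float = 2.0,    # assumed high-loss threshold
                 steepen: float = 2.0,  # slope > 1 on the low-loss band
                 flatten: float = 0.5   # slope < 1 on the other bands
                 ) -> torch.Tensor:
    """Apply a piecewise-linear reshaping g(L) to a scalar loss.

    Only the slope g'(L) affects the gradient; continuity offsets of g
    are omitted for brevity.
    """
    value = loss.item()
    if value > high:        # high-loss region: flatten
        slope = flatten
    elif value > low:       # low-loss region: steepen
        slope = steepen
    else:                   # ultralow-loss region: flatten
        slope = flatten
    return slope * loss

# Usage inside an otherwise standard training step:
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
reshape_loss(loss).backward()   # gradients are scaled by g'(L)
opt.step()
```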