
Efficient Loss Landscape Reshaping for Convolutional Neural Networks

Authors

Chen Liangming, Jin Long, Shang Mingsheng

Publication Information

IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12535-12549. doi: 10.1109/TNNLS.2024.3462516.

Abstract

Theoretical and empirical evidence highlights a positive correlation between the flatness of loss landscapes around minima and generalization. However, most current approaches that seek flat minima either incur high computational costs or struggle to balance generalization, training stability, and convergence. This work proposes reshaping the loss landscape to guide the optimizer toward flat regions, an approach with negligible computational cost that does not compromise training stability, convergence, or efficiency. We focus on nonlinear, loss-dependent reshaping functions, underpinned by theoretical insights, to reshape the loss landscape. To design these functions, we first identify where and how they should be applied. With the aid of recently developed tools in stochastic optimization, theoretical analysis shows that steepening the low-loss landscape improves the rate of escape from sharp minima, while flattening the high- and ultralow-loss landscapes enhances training stability and optimization performance, respectively. Simulations and experiments reveal that the carefully designed reshaping functions not only guide optimizers to flat minima and improve generalization but also stabilize training, promote optimization, and maintain efficiency. Our approach is evaluated on image classification, adversarial robustness, and natural language processing (NLP) tasks and achieves significant improvements in generalization performance at negligible computational cost. We believe that the new perspective introduced in this work will broadly impact the field of deep neural network training. The code is available at https://github.com/LongJin-lab/LLR.
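To make the core idea concrete, the following is a minimal PyTorch-style sketch of loss-dependent reshaping, assuming a hypothetical piecewise-linear function g(L) applied to the per-batch loss: flat slope in the high-loss region (stability), steep slope in the low-loss region (faster escape from sharp minima), and flat slope again in the ultralow-loss region (final optimization). The function name reshape_loss, the thresholds low and high, and the slopes are illustrative assumptions, not the authors' exact design; see the repository linked above for the official implementation.

import torch

def reshape_loss(loss, low=0.05, high=2.0):
    # Hypothetical piecewise-linear reshaping g(L); thresholds and slopes
    # are made-up illustrative values, not the paper's actual functions.
    # Because d g(L)/d theta = g'(L) * dL/d theta, the slope of each piece
    # scales the gradient magnitude in that loss regime.
    if loss.item() > high:
        # High-loss region: flatten (slope 0.5) to damp gradients and stabilize training.
        return 0.5 * low + 2.0 * (high - low) + 0.5 * (loss - high)
    elif loss.item() > low:
        # Low-loss region: steepen (slope 2.0) to speed escape from sharp minima.
        return 0.5 * low + 2.0 * (loss - low)
    else:
        # Ultralow-loss region: flatten again (slope 0.5) to aid final optimization.
        return 0.5 * loss

# Usage inside an otherwise standard training step (model, optimizer, loss_fn assumed):
#   raw_loss = loss_fn(model(x), y)
#   reshape_loss(raw_loss).backward()
#   optimizer.step()

Because only the loss scalar is transformed, the reshaping adds a constant number of scalar operations per step, which is consistent with the negligible overhead claimed in the abstract.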

