Sharpness-Aware Lookahead for Accelerating Convergence and Improving Generalization.

Author information

Tan Chengli, Zhang Jiangshe, Liu Junmin, Gong Yihong

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10375-10388. doi: 10.1109/TPAMI.2024.3444002. Epub 2024 Nov 6.

Abstract

Lookahead is a popular stochastic optimizer that can accelerate the training process of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides the training process into two stages. In the first stage, the direction towards flat regions is determined by leveraging a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, however, it is determined by Sharpness-Aware Minimization (SAM), which is particularly effective in improving generalization at the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying superior generalization performance compared to the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrate the advantages of SALA over Lookahead. It is noteworthy that with approximately 25% more computational overhead than the base optimizer, SALA can achieve the same generalization performance as SAM, which requires twice the training budget of the base optimizer.
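To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a Lookahead loop whose inner optimizer switches from plain gradient steps to SAM-style steps late in training. It is not the authors' SALA: the abstract does not give the update rules, the first stage here uses ordinary gradient descent rather than the paper's quadratic approximation of the trajectory, and the toy quadratic loss, the function names (`loss_grad`, `sam_grad`, `lookahead`), and every hyperparameter (`k`, `lr`, `alpha`, `rho`, `sam_from`) are assumptions made for illustration. The sketch also counts gradient evaluations, since a SAM step needs two per update while a plain step needs one.

```python
import numpy as np

def loss_grad(w, A):
    """Gradient of the toy quadratic loss 0.5 * w^T A w (hypothetical test problem)."""
    return A @ w

def sam_grad(w, A, rho=0.05):
    """SAM-style gradient: the gradient taken at w perturbed along its own gradient."""
    g = loss_grad(w, A)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step of radius rho
    return loss_grad(w + eps, A)

def lookahead(w0, A, outer_steps=20, k=5, lr=0.1, alpha=0.5, sam_from=None):
    """Lookahead with gradient-descent inner steps.

    If sam_from is set, outer steps t >= sam_from use SAM-style gradients.
    Returns the final slow weights and the number of gradient evaluations.
    """
    slow = np.asarray(w0, dtype=float).copy()
    grad_evals = 0
    for t in range(outer_steps):
        fast = slow.copy()
        use_sam = sam_from is not None and t >= sam_from
        for _ in range(k):  # k fast steps from the current slow weights
            if use_sam:
                g = sam_grad(fast, A)
                grad_evals += 2  # one gradient for eps, one at the perturbed point
            else:
                g = loss_grad(fast, A)
                grad_evals += 1
            fast -= lr * g
        slow += alpha * (fast - slow)  # slow weights move toward the fast weights
    return slow, grad_evals

A = np.diag([1.0, 10.0])
w0 = np.array([1.0, 1.0])
_, cost_plain = lookahead(w0, A)
_, cost_two_stage = lookahead(w0, A, sam_from=15)
print(cost_two_stage / cost_plain)
```

In this sketch, switching to SAM for the last quarter of the outer steps gives a cost of 0.75 × 1 + 0.25 × 2 = 1.25 times that of the plain run. The one-quarter split was chosen only because it reproduces that arithmetic; the abstract reports the roughly 25% overhead but does not state how SALA divides training between its two stages.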

