Huang Bin, Xie Ying, Xu Chaoyang
School of Business, Putian University, Putian, 351100, China.
School of Mechanical, Electrical, and Information Engineering, Putian University, Putian, 351100, China.
Sci Rep. 2025 Jan 8;15(1):1350. doi: 10.1038/s41598-025-85679-8.
Noisy-label learning has attracted considerable attention owing to its ability to leverage large amounts of inexpensive and imprecise data. Sharpness-aware minimization (SAM) has been shown to improve generalization performance in the presence of noisy labels by introducing adversarial weight perturbations in the model parameter space. However, our experimental observations show that the generalization bottleneck of SAM primarily stems from the difficulty of finding the correct adversarial perturbation amid noisy data. To address this problem, we conducted a theoretical analysis of the mismatch in the direction of the parameter perturbation between noisy and clean samples during training. Based on this analysis, we propose a clean-aware sharpness-aware minimization algorithm, CA-SAM. CA-SAM dynamically divides the training data into likely clean and likely noisy subsets based on the historical model output and uses the likely clean samples to determine the direction of the parameter perturbation. By searching for flat minima in the loss landscape, CA-SAM aims to restrict the gradient perturbation direction of the noisy samples so that it aligns with that of the clean samples, while preserving the contribution of the clean samples. Comprehensive experiments on benchmark datasets containing diverse noise patterns and levels demonstrate that CA-SAM outperforms several recent approaches by a substantial margin.
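The idea described in the abstract, computing the SAM perturbation from likely clean samples only, can be illustrated with a minimal pure-Python sketch on a logistic-regression model. This is not the authors' implementation. The abstract does not specify the selection rule or the update rule, so the following are assumptions made for illustration: the function names (`split_likely_clean`, `clean_perturbation`, `ca_sam_step`), the rule "a sample is likely clean if its mean historical probability for its given label is at least `threshold`", the values of `rho` and `lr`, and the choice to take the descent gradient over the full batch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, xs, ys):
    """Mean gradient of the logistic loss of a linear model w over (xs, ys)."""
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / len(xs)
    return g

def split_likely_clean(history, threshold=0.5):
    """ASSUMED rule: history[i] holds the probabilities the model assigned to
    sample i's given label in past epochs; a sample counts as likely clean
    when their mean is at least `threshold`."""
    return [i for i, h in enumerate(history) if sum(h) / len(h) >= threshold]

def clean_perturbation(w, xs, ys, clean_idx, rho=0.05):
    """SAM-style perturbation rho * g / ||g||, with g computed on the
    likely clean samples only (falls back to all samples if none qualify)."""
    idx = clean_idx or list(range(len(xs)))
    g = grad(w, [xs[i] for i in idx], [ys[i] for i in idx])
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
    return [rho * gi / norm for gi in g]

def ca_sam_step(w, xs, ys, clean_idx, rho=0.05, lr=0.1):
    """One update: perturb along the clean-sample direction, then descend
    using the gradient at the perturbed weights (ASSUMED: over the full batch)."""
    eps = clean_perturbation(w, xs, ys, clean_idx, rho)
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    g_adv = grad(w_adv, xs, ys)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy usage: sample 1 has a low historical confidence and is treated as noisy.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1, 0, 1]
history = [[0.9, 0.8], [0.2, 0.3], [0.6, 0.7]]
clean_idx = split_likely_clean(history)
w_new = ca_sam_step([0.0, 0.0], xs, ys, clean_idx)
```

The only difference from a plain SAM step in this sketch is the sample set used for the ascent direction: plain SAM would compute the perturbation from the whole batch, noisy labels included, whereas here the likely noisy samples are excluded from that computation.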