Hou Senyu, Xu Maolong, Jiang Gaoxia, Guo Yaqing, Wang Wenjian
School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi, 030006, China.
Key Laboratory of Data Intelligence and Cognitive Computing, Shanxi University, Taiyuan, Shanxi, 030006, China.
Neural Netw. 2025 Nov;191:107827. doi: 10.1016/j.neunet.2025.107827. Epub 2025 Jul 5.
Noisy labels are ubiquitous in real-world datasets, posing substantial risks of model overfitting, especially for Deep Neural Networks (DNNs) with high parameter complexity. Sample selection, a popular method for Learning with Noisy Labels (LNL), often boosts DNN performance by identifying small-loss and large-loss data as clean and noisy samples, respectively. However, the instability of loss values during iterative optimization often leads to selection errors, including both the erroneous exclusion of clean samples and the retention of noisy instances. To address these issues, we propose a novel loss function called Outlier-Trimmed Dual-Interval Smoothing (OTDIS) loss, designed to improve sample selection robustness while mitigating overfitting to label noise. OTDIS addresses loss instability through dual-interval estimation that integrates temporal dynamics and sample distributions to redefine more accurate noise levels. Specifically, we investigate how outlier losses in early training stages affect sample selection reliability. Building on this insight, we first perform temporal smoothing using outlier-trimmed confidence interval lower bounds, thereby improving temporal robustness in sample selection. Next, we implement sample-space smoothing through clustering-based regrouping to achieve distributionally stable loss estimates. Furthermore, we develop a dual-polarity training objective by incorporating negative loss as a penalty, and we establish two learning frameworks based on OTDIS loss, a common framework and a semi-supervised framework, for scenarios with different resource constraints. Experimental results demonstrate that our method significantly improves sample selection accuracy and achieves superior classification performance on MNIST and CIFAR datasets with synthetic noise, as well as on real-world noisy datasets such as CIFAR-N, ANIMAL-10N, and WebVision. Code is available at https://github.com/SenyuHou/OTDIS.
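The abstract outlines a two-stage selection idea: smooth each sample's loss trajectory over epochs with an outlier-trimmed confidence-interval lower bound, then split samples in loss space to separate clean from noisy data. The following is a minimal sketch of that general scheme, not the authors' implementation (see the linked repository for that); the function names, the trim_quantile and confidence parameters, and the use of a two-component Gaussian mixture for the sample-space split are illustrative assumptions.

```python
# Minimal sketch of confidence-interval-based temporal smoothing plus a
# loss-space split for clean/noisy sample selection. Assumptions: per-sample
# loss histories are available, outliers are the largest recorded losses,
# and a 2-component GMM separates clean (low-loss) from noisy samples.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture


def smoothed_loss(history, trim_quantile=0.9, confidence=0.95):
    """Temporal smoothing of one sample's loss history.

    Drops the largest losses (outliers from unstable early epochs), then
    returns the lower bound of a normal-approximation confidence interval
    for the trimmed mean.
    """
    h = np.sort(np.asarray(history, dtype=np.float64))
    kept = h[: max(1, int(np.ceil(trim_quantile * len(h))))]  # trim top losses
    mean = kept.mean()
    std = kept.std(ddof=1) if len(kept) > 1 else 0.0
    z = norm.ppf(confidence)                                   # one-sided z value
    return mean - z * std / np.sqrt(len(kept))                 # CI lower bound


def select_clean(loss_histories):
    """Sample-space step: cluster smoothed losses with a 2-component GMM and
    treat the low-mean component as the clean set (a common LNL heuristic)."""
    scores = np.array([smoothed_loss(h) for h in loss_histories]).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
    clean_component = int(np.argmin(gmm.means_.ravel()))       # lower mean = clean
    probs = gmm.predict_proba(scores)[:, clean_component]
    return probs > 0.5                                          # boolean clean mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated loss trajectories: 80 "clean" (low loss) and 20 "noisy" samples.
    clean = rng.normal(0.3, 0.1, size=(80, 10)).clip(min=0.01)
    noisy = rng.normal(1.5, 0.4, size=(20, 10)).clip(min=0.01)
    mask = select_clean(np.vstack([clean, noisy]))
    print(f"selected {mask.sum()} of {len(mask)} samples as clean")
```

Using the confidence-interval lower bound rather than the raw mean makes the selection conservative toward samples whose losses fluctuate heavily across epochs, which is the instability the abstract identifies as a source of selection errors.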