Zhang Jingfeng, Song Bo, Wang Haohan, Han Bo, Liu Tongliang, Liu Lei, Sugiyama Masashi
IEEE Trans Pattern Anal Mach Intell. 2024 Jun;46(6):4398-4409. doi: 10.1109/TPAMI.2024.3355425. Epub 2024 May 7.
Label-noise learning (LNL) aims to improve a model's generalization when the training data contain noisy labels. To facilitate practical LNL algorithms, researchers have proposed different label noise types, ranging from class-conditional to instance-dependent noise. In this paper, we introduce a novel label noise type called BadLabel, which substantially degrades the performance of existing LNL algorithms. BadLabel is crafted based on the label-flipping attack against standard classification, where specific samples are selected and their labels are flipped to other labels so that the loss values of clean and noisy labels become indistinguishable. To address the challenge posed by BadLabel, we further propose a robust LNL method that perturbs the labels in an adversarial manner at each epoch to make the loss values of clean and noisy labels distinguishable again. Once we select a small set of (mostly) clean labeled data, we can apply semi-supervised learning techniques to train the model accurately. Our experiments show that existing LNL algorithms are vulnerable to the newly introduced BadLabel noise type, while our proposed robust LNL method effectively improves the generalization performance of the model under various types of label noise. The new dataset of noisy labels and the source code of the robust LNL algorithms are available at https://github.com/zjfheart/BadLabels.
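The abstract relies on the idea that clean and noisy labels can often be told apart by their loss values. The following is a minimal sketch of that generic small-loss selection idea on synthetic data, not the authors' BadLabel construction or their robust method. All function names (`per_sample_cross_entropy`, `flip_labels`, `select_small_loss`), the synthetic probabilities, and the 20% flip rate are assumptions made for illustration only.

```python
import numpy as np

def per_sample_cross_entropy(probs, labels):
    """Cross-entropy of each sample's given label under predicted class probabilities."""
    eps = 1e-12  # avoids log(0)
    return -np.log(probs[np.arange(len(labels)), labels] + eps)

def flip_labels(labels, flip_idx, num_classes, rng):
    """Flip the labels at flip_idx to a different class chosen uniformly at random."""
    noisy = labels.copy()
    # An offset in [1, num_classes - 1] guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=len(flip_idx))
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy

def select_small_loss(losses, keep_fraction):
    """Return the indices of the keep_fraction samples with the smallest loss."""
    n_keep = max(1, int(round(keep_fraction * len(losses))))
    return np.argsort(losses)[:n_keep]

# Synthetic setup (hypothetical): a model that is confident on the true class.
rng = np.random.default_rng(0)
n, num_classes = 100, 4
clean_labels = rng.integers(0, num_classes, size=n)
probs = np.full((n, num_classes), 0.05)
probs[np.arange(n), clean_labels] = 0.85  # each row sums to 0.85 + 3 * 0.05 = 1.0

# Randomly flip 20% of the labels, then score every sample against its (noisy) label.
flip_idx = rng.choice(n, size=20, replace=False)
noisy_labels = flip_labels(clean_labels, flip_idx, num_classes, rng)
losses = per_sample_cross_entropy(probs, noisy_labels)

# Keep the 80% of samples with the smallest loss as the presumed-clean set.
selected = select_small_loss(losses, keep_fraction=0.8)
```

In this toy setting the flipped samples all receive a much larger loss than the untouched ones, so the small-loss set contains only clean labels. According to the abstract, BadLabel is designed to remove exactly this gap, which is why selection of this kind is reported to fail against it.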