Jiang Shenwang, Li Jianan, Zhang Jizhou, Wang Ying, Xu Tingfa
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14420-14434. doi: 10.1109/TPAMI.2023.3311636. Epub 2023 Nov 3.
Label noise and class imbalance are common challenges in real-world datasets. Existing approaches to robust learning typically address either label noise or class imbalance in isolation, which leads to suboptimal performance when both biases are present. To bridge this gap, this work introduces a novel meta-learning-based dynamic loss that adapts the objective function during training in order to learn a classifier from long-tailed noisy data. Specifically, the dynamic loss consists of two components: a label corrector and a margin generator. The label corrector corrects noisy labels, while the margin generator produces per-class classification margins by capturing the underlying data distribution and the learning state of the classifier. In addition, we employ a hierarchical sampling strategy that enriches a small amount of unbiased metadata with diverse and challenging samples. This enables the two components of the dynamic loss to be optimized jointly through meta-learning, allowing the classifier to adapt to clean and balanced test data. Extensive experiments on multiple real-world and synthetic datasets with various types of data bias, including CIFAR-10/100, Animal-10N, ImageNet-LT, and WebVision, demonstrate that our method achieves state-of-the-art accuracy.
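To make the two components described in the abstract more concrete, the following is a minimal sketch of a loss that combines per-class margins with corrected (soft) labels. It is not the authors' implementation: in the paper both the label corrector and the margin generator are learned through meta-learning, whereas here the margins are a fixed input and the corrector is a simple convex blend. The function names `margin_soft_label_loss` and `blend_labels`, the blending weight `alpha`, and the use of log class priors as margins are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over the class axis."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def margin_soft_label_loss(logits, targets, margins):
    """Cross-entropy on margin-shifted logits against soft targets.

    logits:  (N, C) raw classifier outputs
    targets: (N, C) soft labels, each row summing to 1
    margins: (C,)   additive per-class margins (hypothetical stand-in
                    for the output of a learned margin generator)
    """
    adjusted = logits + margins  # broadcast the margins over the batch
    return float(-(targets * log_softmax(adjusted)).sum(axis=1).mean())


def blend_labels(noisy_onehot, probs, alpha):
    """Hypothetical label corrector: convex mix of the given (possibly
    noisy) one-hot label and the classifier's predicted distribution."""
    return (1.0 - alpha) * noisy_onehot + alpha * probs


if __name__ == "__main__":
    # Assumed long-tailed class counts; log priors serve as fixed margins.
    counts = np.array([700.0, 200.0, 100.0])
    margins = np.log(counts / counts.sum())

    logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 1.5]])
    noisy = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    probs = np.exp(log_softmax(logits))

    corrected = blend_labels(noisy, probs, alpha=0.3)
    print(margin_soft_label_loss(logits, corrected, margins))
```

Adding log class priors to the logits during training is one common fixed choice of margin for long-tailed data; it is used here only as a placeholder for margins that the method described above would generate adaptively.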