IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):2835-2848. doi: 10.1109/TPAMI.2022.3178690. Epub 2023 Feb 3.
Label noise is ubiquitous in many real-world scenarios; it often misleads the training algorithm and degrades classification performance. Therefore, many approaches have been proposed to correct the loss function computed on corrupted labels in order to combat such noise. Among them, a line of works achieves this goal by estimating the data centroid without bias, which plays an important role in constructing an unbiased risk estimator for minimization. However, these methods usually handle the noisy labels of all classes at once, so the local information inherent to each class is ignored, which often leads to unsatisfactory performance. To address this defect, this paper presents a novel robust learning algorithm dubbed "Class-Wise Denoising" (CWD), which tackles the noisy labels in a class-wise manner to ease the overall noise-correction task. Specifically, two virtual auxiliary sets are constructed by presuming, respectively, that the positive and the negative labels in the training set are clean, so the original false-negative labels and false-positive labels are tackled separately. As a result, an improved centroid estimator can be designed, which in turn yields a more accurate risk estimator. Theoretically, we prove that: 1) the variance of the centroid estimate is often reduced by CWD compared with existing methods based on unbiased centroid estimators; and 2) the performance of a CWD classifier trained on the noisy set converges to that of the optimal classifier trained on the clean set at a rate of [Formula: see text], where n is the number of training examples. These theoretical properties enable CWD to deliver improved classification performance under label noise, which is also demonstrated by comparisons with ten representative state-of-the-art methods on a variety of benchmark datasets.
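The centroid-correction idea the abstract builds on can be sketched as follows. This is a minimal illustration of unbiased centroid estimation under known class-conditional noise rates (a standard loss-correction construction), not the paper's exact class-wise CWD estimator; the feature distribution, noise rates, and sample size are all assumptions for the demo.

```python
import random

def corrected_label(y_tilde, rho_pos, rho_neg):
    """Noise-corrected label whose expectation equals the clean label y.

    rho_pos: probability that a clean +1 label was flipped to -1
    rho_neg: probability that a clean -1 label was flipped to +1
    (The rates are assumed known, as in loss-correction approaches.)
    """
    d = 1.0 - rho_pos - rho_neg               # assumed > 0
    rho_same = rho_pos if y_tilde == 1 else rho_neg
    rho_opp = rho_neg if y_tilde == 1 else rho_pos
    return y_tilde * (1.0 - rho_opp + rho_same) / d

def centroid(xs, ys):
    """Empirical centroid (1/n) * sum(y_i * x_i) for scalar features."""
    return sum(y * x for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
rho_pos, rho_neg = 0.3, 0.1
n = 200_000

# Synthetic clean data: class +1 centered at +1, class -1 at -1,
# so the true centroid E[y * x] equals 1.
ys_clean = [random.choice((1, -1)) for _ in range(n)]
xs = [y + random.gauss(0.0, 1.0) for y in ys_clean]

# Inject class-conditional label noise with the known rates.
def flip(y):
    rho = rho_pos if y == 1 else rho_neg
    return -y if random.random() < rho else y

ys_noisy = [flip(y) for y in ys_clean]

mu_clean = centroid(xs, ys_clean)                 # ground truth
mu_naive = centroid(xs, ys_noisy)                 # biased under noise
mu_corr = centroid(
    xs, [corrected_label(y, rho_pos, rho_neg) for y in ys_noisy]
)

print(f"clean {mu_clean:.3f}  naive {mu_naive:.3f}  corrected {mu_corr:.3f}")
```

The naive estimate shrinks toward zero because flipped labels cancel part of the sum, while the corrected estimate recovers the clean centroid in expectation. CWD refines this by treating the false-positive and false-negative corrections separately per class, which is what reduces the estimator's variance.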