Wang Haobo, Xiao Ruixuan, Li Yixuan, Feng Lei, Niu Gang, Chen Gang, Zhao Junbo
IEEE Trans Pattern Anal Mach Intell. 2024 May;46(5):3183-3198. doi: 10.1109/TPAMI.2023.3342650. Epub 2024 Apr 3.
Partial label learning (PLL) is an important problem in which each training example is labeled with a coarse candidate set that includes the ground-truth label. In a more practical but challenging scenario, however, the annotator may miss the ground-truth label and provide a wrong candidate set; this is known as the noisy PLL problem. To address it, we propose the PiCO+ framework, which simultaneously disambiguates the candidate sets and mitigates label noise. At the core of PiCO+ is PiCO, a novel label disambiguation algorithm that combines a contrastive learning module with a novel class prototype-based disambiguation method. Theoretically, we show that these two components are mutually beneficial and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. To handle label noise, we extend PiCO to PiCO+, which additionally performs distance-based clean sample selection and learns robust classifiers with a semi-supervised contrastive learning algorithm. We further investigate the robustness of PiCO+ to out-of-distribution noise and incorporate a novel energy-based rejection method to improve it. Extensive experiments demonstrate that the proposed methods significantly outperform current state-of-the-art approaches on standard and noisy PLL tasks, and even achieve results comparable to fully supervised learning.
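The class prototype-based disambiguation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rules, so everything below is an assumption made for illustration: the function name `disambiguate_step`, the two momentum parameters, the moving-average form of both updates, and the choice to update the prototype of the selected candidate class are all hypothetical and should not be read as the authors' implementation. The sketch only shows the general idea that a soft label is sharpened toward the candidate class whose prototype lies closest to the example's embedding.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (returned as a copy if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def disambiguate_step(embedding, candidates, pseudo_target, prototypes,
                      target_momentum=0.9, proto_momentum=0.9):
    """One hypothetical prototype-based update for a single example.

    embedding     : feature vector of the example
    candidates    : set of class indices in the candidate label set
    pseudo_target : current soft label over all classes (list of floats)
    prototypes    : list of unit-norm class prototypes (updated in place)
    """
    q = normalize(embedding)
    # Pick the candidate class whose prototype is most similar to the embedding.
    best = max(sorted(candidates), key=lambda c: dot(q, prototypes[c]))
    # Move the soft label toward the one-hot vector of that class (assumed rule).
    new_target = [
        target_momentum * s + (1.0 - target_momentum) * (1.0 if c == best else 0.0)
        for c, s in enumerate(pseudo_target)
    ]
    # Pull the chosen prototype toward the embedding, then renormalize (assumed rule).
    prototypes[best] = normalize([
        proto_momentum * p + (1.0 - proto_momentum) * x
        for p, x in zip(prototypes[best], q)
    ])
    return best, new_target


# Toy usage: three classes in a 2-D embedding space, candidate set {0, 1}.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target = [0.5, 0.5, 0.0]  # uniform over the candidates, zero elsewhere
best, target = disambiguate_step([0.2, 0.9], {0, 1}, target, prototypes)
```

In this toy run the embedding is closest to the prototype of class 1, so the soft label shifts some mass from class 0 to class 1 while class 2, which is outside the candidate set, stays at zero. Repeating the step over many examples is what would gradually turn ambiguous candidate sets into confident labels under these assumptions.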