IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):8036-8048. doi: 10.1109/TPAMI.2022.3228755. Epub 2023 Jun 5.
Partially labeled data learning (PLDL), including partial label learning (PLL) and partial multi-label learning (PML), is widely used in modern data science. Researchers typically construct separate, task-specific models for the PLL and PML classification scenarios. The main challenge in training classifiers for PLL and PML is handling the ambiguity caused by noisy false-positive labels in the candidate label set. The state-of-the-art strategy for both scenarios is disambiguation, i.e., identifying the ground-truth label(s) directly from the candidate label set; existing approaches fall into two categories: 'the identifying method' and 'the embedding method'. However, both kinds of methods rely on hand-designed heuristic modeling based on considerations such as feature/label correlations, with no theoretical interpretation. Instead of adopting heuristic or task-specific modeling, we propose a novel framework, A Unifying Probabilistic Framework for Partially Labeled Data Learning (UPF-PLDL), which is derived from a clear probabilistic formulation and brings existing research on PLL and PML under one information-theoretic interpretation. Furthermore, UPF-PLDL unifies 'the identifying method' and 'the embedding method' into one integrated framework that naturally incorporates feature and label correlations. Comprehensive experiments on synthetic and real-world datasets for both PLL and PML scenarios clearly demonstrate the superiority of the derived framework.
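To make the PLL setting concrete, the following is a minimal sketch of candidate label sets and a generic identification-style disambiguation. It is not the UPF-PLDL method, which the abstract does not specify in detail. The function names (`squared_distance`, `disambiguate`), the confidence-weighted centroid scoring, and the toy data are all assumptions chosen for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def disambiguate(features, candidates, num_labels, iterations=10):
    """Pick one label per instance from its non-empty candidate label set.

    Hypothetical baseline: alternate between confidence-weighted class
    centroids and re-scoring each candidate label by centroid proximity.
    """
    n, dim = len(features), len(features[0])
    # Start with uniform confidence over each instance's candidate labels.
    conf = [[1.0 / len(c) if k in c else 0.0 for k in range(num_labels)]
            for c in candidates]
    for _ in range(iterations):
        # Confidence-weighted centroid for every label that has support.
        centroids = {}
        for k in range(num_labels):
            weight = sum(conf[i][k] for i in range(n))
            if weight > 0:
                centroids[k] = [
                    sum(conf[i][k] * features[i][d] for i in range(n)) / weight
                    for d in range(dim)
                ]
        # Re-score only the candidate labels; non-candidates stay at zero.
        for i, cand in enumerate(candidates):
            scores = {k: math.exp(-squared_distance(features[i], centroids[k]))
                      for k in cand if k in centroids}
            total = sum(scores.values())
            if total > 0:
                conf[i] = [scores.get(k, 0.0) / total
                           for k in range(num_labels)]
    # Final decision: the most confident label within each candidate set.
    return [max(sorted(cand), key=lambda k: conf[i][k])
            for i, cand in enumerate(candidates)]


# Toy data: two clusters; some candidate sets contain a false-positive label.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
candidates = [{0}, {0, 1}, {0, 1}, {1}, {0, 1}, {1}]
labels = disambiguate(features, candidates, num_labels=2)
print(labels)
```

In this sketch the unambiguous instances anchor the centroids, and the ambiguous ones are resolved toward the nearer class. In the PML scenario more than one candidate label may be correct, so a single arg-max decision like this one would no longer apply.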