Lu Yunan, Li Weiwei, Li Huaxiong, Jia Xiuyi
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15364-15379. doi: 10.1109/TPAMI.2023.3300310. Epub 2023 Nov 3.
A label distribution conveys more information about label polysemy than logical labels do. There are currently two approaches to obtaining label distributions: label distribution learning (LDL) and label enhancement (LE). In LDL, experts annotate training instances with label distributions, and a predictive function is trained on this set to output label distributions. In LE, experts annotate instances with logical labels, and label distributions are recovered from them. However, LDL is limited by expensive annotation, and LE has no performance guarantee. We therefore investigate how to predict label distributions from tie-allowed multi-label ranking (TMLR), which is a compromise on annotation cost yet retains good performance guarantees. On the one hand, we theoretically analyze the relationship between TMLR and label distributions: we define the expected approximation error (EAE) to quantify the quality of an annotation, provide EAE bounds for TMLR, and derive the optimal range of label distributions corresponding to a given TMLR annotation. On the other hand, we propose a framework for predicting label distributions from TMLR via conditional Dirichlet mixtures. This framework integrates the recovery and learning of label distributions end-to-end and allows prior knowledge to be encoded easily through a semi-adaptive scoring function. Extensive experiments validate our proposal.
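To make the three annotation types concrete, the following minimal Python sketch derives a logical label and a tie-allowed ranking from one label distribution. It is illustrative only: the abstract does not give formal definitions, so the function names, the relevance threshold `EPS`, and the tie tolerance `tie_tol` are assumptions made for this example and are not taken from the paper.

```python
from typing import List

EPS = 1e-9  # assumed threshold: degrees at or below this count as "irrelevant"


def to_logical(dist: List[float]) -> List[int]:
    """Logical label: 1 if the label describes the instance at all, else 0."""
    return [1 if d > EPS else 0 for d in dist]


def to_tie_allowed_ranking(dist: List[float], tie_tol: float = 0.05) -> List[List[int]]:
    """Rank the relevant labels into ordered buckets; labels in one bucket are tied.

    Hypothetical tie rule (an assumption, not the paper's definition): walking
    down the sorted description degrees, a label joins the current bucket when
    its degree is within `tie_tol` of the first label in that bucket.
    """
    relevant = [i for i, d in enumerate(dist) if d > EPS]
    relevant.sort(key=lambda i: -dist[i])  # highest description degree first
    buckets: List[List[int]] = []
    for i in relevant:
        if buckets and dist[buckets[-1][0]] - dist[i] <= tie_tol:
            buckets[-1].append(i)  # close enough to the bucket leader: tie
        else:
            buckets.append([i])  # clearly lower degree: start a new rank
    return buckets


# A label distribution over four labels (description degrees sum to 1).
dist = [0.40, 0.38, 0.22, 0.0]
logical = to_logical(dist)
ranking = to_tie_allowed_ranking(dist)
print(logical)  # which labels are relevant
print(ranking)  # relevant labels, grouped by rank, ties share a bucket
```

In this sketch the logical label records only which labels apply, while the tie-allowed ranking additionally records their relative order, which is cheaper to annotate than exact description degrees but more informative than relevance alone.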