Robust and Discriminative Labeling for Multi-Label Active Learning Based on Maximum Correntropy Criterion.

Publication information

IEEE Trans Image Process. 2017 Apr;26(4):1694-1707. doi: 10.1109/TIP.2017.2651372. Epub 2017 Jan 10.

Abstract

Multi-label learning attracts great interest in many real-world applications. Having an oracle assign many labels to a single instance is highly costly. Meanwhile, it is also hard to build a good model without identifying discriminative labels. Can we reduce the labeling cost and improve the ability to train a good model for multi-label learning at the same time? Active learning addresses the shortage of training samples by querying the most valuable samples, achieving better performance at little cost. In multi-label active learning, existing studies either query the relevant labels with fewer training samples or query all labels without identifying the discriminative information. None of them can effectively handle outlier labels when measuring uncertainty. Since the maximum correntropy criterion (MCC) provides a robust analysis for outliers in many machine learning and data mining algorithms, in this paper we derive a robust multi-label active learning algorithm based on MCC by merging uncertainty and representativeness, and propose an efficient alternating optimization method to solve it. With MCC, our method can eliminate the influence of outlier labels that are not discriminative for measuring uncertainty. To further improve the ability to measure information, we merge uncertainty and representativeness with the predicted labels of unknown data. This not only enhances the uncertainty measure but also improves the similarity measurement of multi-label data by using label information. Experiments on benchmark multi-label data sets show performance superior to that of state-of-the-art methods.
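To give a rough feel for why a correntropy-based criterion is considered robust to outliers, the sketch below compares the empirical correntropy (with a Gaussian kernel) against the mean squared error on a toy label vector. This is not the algorithm from the paper; the function names, the kernel width `sigma`, and the toy data are all illustrative assumptions.

```python
import math


def gaussian_kernel(error, sigma=1.0):
    """Gaussian kernel of an error term; equals 1 at zero error, decays toward 0."""
    return math.exp(-(error ** 2) / (2.0 * sigma ** 2))


def empirical_correntropy(predictions, targets, sigma=1.0):
    """Mean Gaussian-kernel similarity between predictions and targets.

    Each sample contributes a value in (0, 1], so a single outlier can
    lower the result by less than 1/n.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    total = sum(gaussian_kernel(p - t, sigma) for p, t in zip(predictions, targets))
    return total / len(predictions)


def mean_squared_error(predictions, targets):
    """Ordinary mean squared error, shown for contrast; unbounded in the error."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# Hypothetical toy data: four label scores, then the same with one outlier label.
predictions = [0.9, 0.1, 0.8, 1.0]
clean_targets = [1.0, 0.0, 1.0, 1.0]
outlier_targets = [1.0, 0.0, 1.0, -10.0]  # last label is an assumed outlier

if __name__ == "__main__":
    for name, targets in (("clean", clean_targets), ("outlier", outlier_targets)):
        print(name,
              "correntropy:", empirical_correntropy(predictions, targets),
              "mse:", mean_squared_error(predictions, targets))
```

Maximizing correntropy rewards predictions that agree with most labels, while a grossly wrong label contributes almost nothing to the objective. Under squared error, the same label would dominate the loss.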
