Large-Margin Label-Calibrated Support Vector Machines for Positive and Unlabeled Learning.

Author Information

Chen Gong, Tongliang Liu, Jian Yang, Dacheng Tao

Publication Information

IEEE Trans Neural Netw Learn Syst. 2019 Nov;30(11):3471-3483. doi: 10.1109/TNNLS.2019.2892403. Epub 2019 Feb 6.

Abstract

Positive and unlabeled learning (PU learning) aims to train a binary classifier using only positive and unlabeled data. Existing methods usually cast PU learning as a label-noise learning problem or a cost-sensitive learning problem. However, none of them fully takes the data distribution into account when designing the model, which limits the performance they can achieve. In this paper, we argue that the clusters formed by positive examples and potential negative examples in the feature space should be exploited when building the PU learning model, especially when negative data are not explicitly available. To this end, we introduce a hat loss to discover the margin between data clusters and a label calibration regularizer to correct the biased decision boundary toward the true one, and we propose a novel discriminative PU classifier termed "Large-margin Label-calibrated Support Vector Machines" (LLSVM). The LLSVM classifier works properly in the absence of negative training examples and effectively achieves a max-margin separation between the positive and negative classes. Theoretically, we derive the generalization error bound of LLSVM, which reveals that the use of PU data does help to enhance the algorithm's performance. Empirically, we compare LLSVM with state-of-the-art PU methods on various synthetic and real-world data sets, and the results confirm that LLSVM handles PU learning tasks more effectively than the competing methods.
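The abstract names the key ingredients of LLSVM (a margin loss on the labeled positives, a hat loss on the unlabeled points, and a label calibration regularizer) but does not state the objective itself. As a rough, non-authoritative sketch of how such pieces are typically combined in a large-margin formulation, one could write the following; the trade-off weights \lambda_1 and \lambda_2, the calibration term \Omega_{\mathrm{cal}}, and the exact form of each loss are assumptions for illustration, not the paper's formulation.

% Illustrative sketch only (not the paper's exact objective): a large-margin PU
% objective assembled from the components named in the abstract. P and U index the
% positive and unlabeled training sets; lambda_1, lambda_2, Omega_cal are assumed.
\begin{aligned}
\min_{f}\;\;
& \sum_{i \in \mathcal{P}} \max\bigl(0,\, 1 - f(x_i)\bigr)
  && \text{hinge loss on labeled positives} \\
&\; + \lambda_1 \sum_{j \in \mathcal{U}} \max\bigl(0,\, 1 - \lvert f(x_j) \rvert\bigr)
  && \text{hat loss: places the boundary in low-density gaps between clusters} \\
&\; + \lambda_2\, \Omega_{\mathrm{cal}}\bigl(f;\, \mathcal{U}\bigr) + \tfrac{1}{2}\lVert f \rVert^{2}
  && \text{label calibration regularizer and norm penalty}
\end{aligned}

In such a sketch, the hat loss \max(0, 1 - \lvert f(x) \rvert) vanishes only when an unlabeled point lies outside the margin band, which is the mechanism that lets a classifier realize a max-margin effect between the positive cluster and the potential negative clusters without any labeled negatives.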
