Large-Margin Label-Calibrated Support Vector Machines for Positive and Unlabeled Learning.

Author Information

Gong Chen, Liu Tongliang, Yang Jian, Tao Dacheng

Publication Information

IEEE Trans Neural Netw Learn Syst. 2019 Nov;30(11):3471-3483. doi: 10.1109/TNNLS.2019.2892403. Epub 2019 Feb 6.

DOI: 10.1109/TNNLS.2019.2892403
PMID: 30736009
Abstract

Positive and unlabeled learning (PU learning) aims to train a binary classifier from PU data alone. Existing methods usually cast PU learning as a label-noise learning problem or a cost-sensitive learning problem. However, none of them fully takes the data distribution into account when designing the model, which keeps them from achieving stronger performance. In this paper, we argue that the clusters formed by positive examples and potential negative examples in the feature space play a critical role in establishing a PU learning model, especially when negative data are not explicitly available. To this end, we introduce a hat loss to discover the margin between data clusters and a label calibration regularizer to amend the biased decision boundary toward the potentially correct one, and we propose a novel discriminative PU classifier termed "Large-margin Label-calibrated Support Vector Machines" (LLSVM). Our LLSVM classifier works properly in the absence of negative training examples and effectively achieves the max-margin effect between the positive and negative classes. Theoretically, we derive the generalization error bound of LLSVM, which reveals that the introduction of PU data does help to enhance algorithm performance. Empirically, we compare LLSVM with state-of-the-art PU methods on various synthetic and practical data sets, and the results confirm that the proposed LLSVM handles PU learning tasks more effectively than the compared methods.
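
To make the ideas above concrete, here is a minimal sketch in Python (NumPy only) of margin-based learning from positive and unlabeled data: a linear classifier trained by subgradient descent with a hinge loss on the labeled positives and a symmetric hat-style loss max(0, 1 - |f(x)|) on the unlabeled points, so that unlabeled examples are pushed away from the decision boundary on either side. This is an illustrative toy under stated assumptions, not the paper's LLSVM: it omits the label calibration regularizer and any kernelization, and the synthetic data, loss weights, and step sizes are made up for the example.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic PU data: two Gaussian clusters in 2-D; only a subset of the
# positives carries labels, the rest are mixed into the unlabeled set.
n_pos, n_neg = 100, 100
X_pos = rng.normal(loc=2.0, scale=1.0, size=(n_pos, 2))
X_neg = rng.normal(loc=-2.0, scale=1.0, size=(n_neg, 2))
labeled_idx = rng.choice(n_pos, size=30, replace=False)          # labeled positives (P)
X_p = X_pos[labeled_idx]
X_u = np.vstack([np.delete(X_pos, labeled_idx, axis=0), X_neg])  # unlabeled (U)

def decision(w, b, X):
    return X @ w + b

def subgradient(w, b, X_p, X_u, lam=1e-2, c_u=0.5):
    """Subgradient of an illustrative objective:
    mean hinge loss on positives + c_u * mean hat loss on unlabeled + (lam / 2) * ||w||^2."""
    gw, gb = lam * w, 0.0
    # Hinge loss on positives, max(0, 1 - f(x)): pushes positives to f(x) >= 1.
    fp = decision(w, b, X_p)
    active = fp < 1.0
    gw = gw - X_p[active].sum(axis=0) / len(X_p)
    gb = gb - active.sum() / len(X_p)
    # Hat loss on unlabeled, max(0, 1 - |f(x)|): pushes unlabeled points away
    # from the decision boundary on either side (cluster/margin assumption).
    fu = decision(w, b, X_u)
    act = np.abs(fu) < 1.0
    sign = np.sign(fu[act])
    gw = gw - c_u * (sign[:, None] * X_u[act]).sum(axis=0) / len(X_u)
    gb = gb - c_u * sign.sum() / len(X_u)
    return gw, gb

w, b = np.zeros(2), 0.0
for _ in range(500):
    gw, gb = subgradient(w, b, X_p, X_u)
    w -= 0.1 * gw
    b -= 0.1 * gb

# Evaluate against the ground-truth labels that were hidden during training.
X_all = np.vstack([X_pos, X_neg])
y_all = np.hstack([np.ones(n_pos), -np.ones(n_neg)])
accuracy = np.mean(np.sign(decision(w, b, X_all)) == y_all)
print(f"accuracy on the fully labeled data: {accuracy:.3f}")

Without something like the paper's label calibration regularizer, a surrogate of this kind can leave the decision boundary biased, for example toward labeling too many unlabeled points as positive; correcting that bias is exactly the role the abstract assigns to label calibration, which this sketch deliberately leaves out.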

Similar Articles

1. Large-Margin Label-Calibrated Support Vector Machines for Positive and Unlabeled Learning.
   IEEE Trans Neural Netw Learn Syst. 2019 Nov;30(11):3471-3483. doi: 10.1109/TNNLS.2019.2892403. Epub 2019 Feb 6.
2. Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning.
   IEEE Trans Pattern Anal Mach Intell. 2021 Mar;43(3):918-932. doi: 10.1109/TPAMI.2019.2941684. Epub 2021 Feb 4.
3. Positive-Unlabeled Learning With Label Distribution Alignment.
   IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15345-15363. doi: 10.1109/TPAMI.2023.3319431. Epub 2023 Nov 3.
4. Efficient Training for Positive Unlabeled Learning.
   IEEE Trans Pattern Anal Mach Intell. 2019 Nov;41(11):2584-2598. doi: 10.1109/TPAMI.2018.2860995. Epub 2018 Jul 30.
5. Effectively Identifying Compound-Protein Interactions by Learning from Positive and Unlabeled Examples.
   IEEE/ACM Trans Comput Biol Bioinform. 2018 Nov-Dec;15(6):1832-1843. doi: 10.1109/TCBB.2016.2570211. Epub 2016 May 18.
6. Positive-unlabeled learning in bioinformatics and computational biology: a brief review.
   Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab461.
7. Positive-unlabeled learning for disease gene identification.
   Bioinformatics. 2012 Oct 15;28(20):2640-7. doi: 10.1093/bioinformatics/bts504. Epub 2012 Aug 24.
8. Bridging the Gap Between Few-Shot and Many-Shot Learning via Distribution Calibration.
   IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9830-9843. doi: 10.1109/TPAMI.2021.3132021. Epub 2022 Nov 7.
9. Instance-Dependent Positive and Unlabeled Learning with Labeling Bias Estimation.
   IEEE Trans Pattern Anal Mach Intell. 2021 Feb 23;PP. doi: 10.1109/TPAMI.2021.3061456.
10. Structured max-margin learning for inter-related classifier training and multilabel image annotation.
    IEEE Trans Image Process. 2011 Mar;20(3):837-54. doi: 10.1109/TIP.2010.2073476. Epub 2010 Sep 7.

Articles Citing This Paper

1. A recent survey on instance-dependent positive and unlabeled learning.
   Fundam Res. 2022 Oct 12;5(2):796-803. doi: 10.1016/j.fmre.2022.09.019. eCollection 2025 Mar.
2. Fractional Dynamics Identification via Intelligent Unpacking of the Sample Autocovariance Function by Neural Networks.
   Entropy (Basel). 2020 Nov 20;22(11):1322. doi: 10.3390/e22111322.