Confidence-Based PU Learning With Instance-Dependent Label Noise.

Authors

Tang Xijia, Xu Chao, Tao Hong, Ma Xiaoyu, Hou Chenping

Publication

IEEE Trans Neural Netw Learn Syst. 2025 Aug;36(8):14283-14297. doi: 10.1109/TNNLS.2025.3549510.

Abstract

Positive and unlabeled (PU) learning, which trains binary classifiers using only positive and unlabeled data, has attracted considerable attention in recent years. Traditional PU learning usually assumes that all positive samples are labeled accurately. In practice, however, factors such as sample ambiguity and inadequate algorithms make label noise almost unavoidable. Current PU algorithms neglect label noise in the positive set, which in practical applications is often biased toward certain instances rather than uniformly distributed. We define this important but understudied problem as PU learning with instance-dependent label noise (PUIDN). To eliminate the adverse impact of instance-dependent noise (IDN), we leverage a confidence score for each instance in the positive set; these scores establish the connection between samples and labels without any assumption on the noise distribution. We then propose an unbiased estimator of the classification risk that considers both label and confidence information and can be computed directly from PUIDN data and their confidence scores. Moreover, our classification framework integrates an alternating-iteration optimization strategy based on the correlation between different confidence information, thereby alleviating the additional requirement for training data. Theoretically, we derive a generalization error bound for the proposed method. Experimentally, various numerical results demonstrate the effectiveness of our approach.
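
For readers unfamiliar with risk estimation in the PU setting, the following minimal sketch computes the classical unbiased PU risk estimator from the wider PU literature, which assumes a clean positive set. It is not the confidence-based estimator proposed in this paper, whose exact form the abstract does not give. The function names, the choice of sigmoid loss, and the assumption that the class prior is known are all illustrative.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid loss 1 / (1 + exp(label * score)), written with tanh for stability."""
    return 0.5 * (1.0 - math.tanh(0.5 * label * score))


def mean(values):
    return sum(values) / len(values)


def upu_risk(scores_pos, scores_unl, class_prior):
    """Classical unbiased PU risk (illustrative, assumes noise-free positives).

    scores_pos  : classifier outputs g(x) on labeled positive samples
    scores_unl  : classifier outputs g(x) on unlabeled samples
    class_prior : assumed known proportion of positives in the unlabeled data
    """
    # Risk of positives being classified as positive.
    pos_as_pos = mean([sigmoid_loss(s, +1) for s in scores_pos])
    # Loss of treating positives as negative, used to correct the unlabeled term.
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in scores_pos])
    # Loss of treating every unlabeled sample as negative.
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in scores_unl])
    return class_prior * pos_as_pos + unl_as_neg - class_prior * pos_as_neg


if __name__ == "__main__":
    # Hypothetical scores: two labeled positives, four unlabeled samples.
    print(upu_risk([2.0, 1.5], [1.8, -2.1, -1.7, -0.9], class_prior=0.25))
```

When some labeled positives are in fact mislabeled, both positive-set terms above are computed on contaminated data, which is the failure mode the PUIDN setting described in the abstract is meant to address.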
