BPT-PLR: A Balanced Partitioning and Training Framework with Pseudo-Label Relaxed Contrastive Loss for Noisy Label Learning.

Authors

Zhang Qian, Jin Ge, Zhu Yi, Wei Hongjian, Chen Qiu

Affiliations

School of Information Technology, Jiangsu Open University, Nanjing 210036, China.

School of Communication & Information Engineering, Shanghai University, Shanghai 200444, China.

Publication

Entropy (Basel). 2024 Jul 10;26(7):589. doi: 10.3390/e26070589.

Abstract

When collecting training data, completely eliminating incorrect annotations (noisy labels) is difficult and expensive, even with manual verification by experts on crowdsourcing platforms. When trained on datasets containing noisy labels, over-parameterized deep neural networks (DNNs) tend to overfit, leading to poor generalization and classification performance. As a result, noisy label learning (NLL) has received significant attention in recent years. Existing research shows that although DNNs eventually fit all training data, they prioritize fitting clean samples first and only gradually overfit the noisy ones. Mainstream methods exploit this behavior to partition the training data, but they face two issues: class imbalance in the resulting subsets, and an optimization conflict between unsupervised contrastive representation learning and supervised learning. To address these issues, we propose a Balanced Partitioning and Training framework with a Pseudo-Label Relaxed contrastive loss, called BPT-PLR, which comprises two key processes: a balanced partitioning process with a two-dimensional Gaussian mixture model (BP-GMM) and a semi-supervised oversampling training process with a pseudo-label relaxed contrastive loss (SSO-PLR). The former uses both semantic feature information and model predictions to identify noisy labels, and introduces a balancing strategy to keep the divided subsets as class-balanced as possible. The latter replaces the unsupervised contrastive loss with a recent pseudo-label relaxed contrastive loss, reducing the optimization conflict between the semi-supervised and contrastive objectives and thereby improving performance. We validate the effectiveness of BPT-PLR on four benchmark datasets in the NLL field: CIFAR-10/100, Animal-10N, and Clothing1M. Extensive comparisons with state-of-the-art methods demonstrate that BPT-PLR achieves optimal or near-optimal performance.
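
The abstract outlines two algorithmic steps that map onto familiar NLL building blocks. The sketch below is a hypothetical illustration, not the authors' implementation: `balanced_gmm_partition` fits a two-component Gaussian mixture over a per-sample loss and a semantic-feature score (a stand-in for the two-dimensional signal BP-GMM uses) and caps the clean subset per class; `pseudo_label_contrastive_loss` is a supervised-contrastive-style loss driven by pseudo-labels, shown only to illustrate how treating same-pseudo-label pairs as positives avoids the negative-pair conflict the abstract describes. All function names, the 0.5 threshold, and the per-class quota are assumptions.

```python
# Illustrative sketch only; the paper's exact BP-GMM and PLR formulations differ.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture


def balanced_gmm_partition(losses, feat_scores, labels, num_classes, clean_frac=0.5):
    """Split sample indices into (clean_idx, noisy_idx) with a 2-D GMM.

    losses      -- per-sample cross-entropy loss, shape (N,)
    feat_scores -- per-sample semantic score, e.g. distance to a class prototype
    labels      -- observed (possibly noisy) labels, shape (N,)
    """
    X = np.stack([losses, feat_scores], axis=1).astype(np.float64)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)   # normalize both signals
    gmm = GaussianMixture(n_components=2, covariance_type="full").fit(X)
    clean_comp = int(np.argmin(gmm.means_[:, 0]))       # low-loss component = clean
    p_clean = gmm.predict_proba(X)[:, clean_comp]

    # Balancing heuristic (assumption): admit the most confident samples first,
    # but cap each class at an equal quota so the clean subset stays balanced.
    quota = int(clean_frac * len(losses) / num_classes)
    taken = np.zeros(num_classes, dtype=int)
    clean_idx = []
    for i in np.argsort(-p_clean):
        c = int(labels[i])
        if p_clean[i] > 0.5 and taken[c] < quota:
            clean_idx.append(i)
            taken[c] += 1
    clean_idx = np.asarray(clean_idx, dtype=int)
    noisy_idx = np.setdiff1d(np.arange(len(losses)), clean_idx)
    return clean_idx, noisy_idx


def pseudo_label_contrastive_loss(z, pseudo_labels, temperature=0.3):
    """SupCon-style loss on pseudo-labels: pairs sharing a pseudo-label are
    treated as positives, so they are never pushed apart as negatives."""
    z = F.normalize(z, dim=1)                            # project to unit sphere
    n = z.size(0)
    sim = z @ z.t() / temperature                        # pairwise similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                               # anchors with >=1 positive
    if not valid.any():
        return z.sum() * 0.0                             # no positive pairs in batch
    return -((log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]).mean()
```

In a DivideMix-style training loop, one would recompute the losses and feature scores each epoch, call `balanced_gmm_partition` to refresh the labeled/unlabeled split, and apply the contrastive term alongside the semi-supervised loss; consult the paper for the exact BP-GMM balancing rule and PLR loss.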

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93c3/11275369/29171cbeb572/entropy-26-00589-g001.jpg
