Zhang Qian, Zhu Yi, Yang Ming, Jin Ge, Zhu Yingwen, Lu Yanjun, Zou Yu, Chen Qiu
School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China.
School of Computer and Electronic Information, Nanjing Normal University, Nanjing, Jiangsu, China.
PLoS One. 2024 Dec 5;19(12):e0309841. doi: 10.1371/journal.pone.0309841. eCollection 2024.
Deep neural networks have powerful memorization capabilities, yet they frequently overfit noisy labels, degrading classification and generalization performance. To address this issue, sample selection methods have been proposed that filter out a subset of potentially clean labels. However, there is a significant size gap between this filtered, possibly clean subset and the remaining unlabeled subset, and the gap becomes particularly pronounced at high noise rates. As a result, sample selection methods underutilize the label-free samples, leaving room for performance improvement. This study introduces an enhanced sample selection framework with an oversampling strategy (SOS) to overcome this limitation. By combining the oversampling strategy with state-of-the-art sample selection methods, the framework exploits the valuable information contained in label-free instances to improve model performance. We validate the effectiveness of SOS through extensive experiments on both synthetic noisy datasets and real-world datasets, including CIFAR, WebVision, and Clothing1M. The source code for SOS will be made available at https://github.com/LanXiaoPang613/SOS.
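The core imbalance the abstract describes can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's actual implementation: it oversamples the small, possibly clean subset with replacement so its size matches the unlabeled subset before the two pools are fed to downstream training; the function name and interface are assumptions for illustration.

```python
import random

def oversample_clean(clean_indices, unlabeled_indices, seed=0):
    """Illustrative sketch: oversample the (smaller) clean subset with
    replacement until its size matches the unlabeled subset, so the two
    pools are balanced. Not the paper's actual algorithm."""
    rng = random.Random(seed)
    deficit = len(unlabeled_indices) - len(clean_indices)
    if deficit <= 0:
        # Clean subset is already at least as large; nothing to add.
        return list(clean_indices)
    # Draw the missing samples uniformly from the clean subset.
    extra = [rng.choice(clean_indices) for _ in range(deficit)]
    return list(clean_indices) + extra

# Example: at a high noise rate, only 100 of 1000 samples pass the filter.
clean = list(range(100))
unlabeled = list(range(100, 1000))
balanced = oversample_clean(clean, unlabeled)
print(len(balanced))  # 900
```

In this toy setting the rebalanced clean pool grows from 100 to 900 entries, matching the unlabeled pool, which is the size-gap problem the oversampling strategy targets.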