Department of Automation, Tsinghua University, Beijing, China.
Neural Netw. 2020 May;125:205-213. doi: 10.1016/j.neunet.2020.02.010. Epub 2020 Feb 26.
Deep neural networks (DNNs) have been very successful for supervised learning. However, their high generalization performance often comes at the high cost of manually annotating data. Collecting low-quality labeled datasets is relatively cheap, e.g., using web search engines, but DNNs tend to overfit corrupted labels easily. In this paper, we propose a collaborative learning (co-learning) approach to improve the robustness and generalization performance of DNNs on datasets with corrupted labels. This is achieved by designing a deep network with two separate branches, coupled with a relabeling mechanism. Co-learning can safely recover the true labels of most mislabeled samples, not only preventing the model from overfitting the noise but also exploiting useful information from all the samples. Despite its simplicity, the proposed algorithm achieves high generalization performance even when a large portion of the labels are corrupted. Experiments show that co-learning consistently outperforms existing state-of-the-art methods on three widely used benchmark datasets.
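The abstract does not specify the exact relabeling rule, but the idea of two branches jointly correcting noisy labels can be sketched as follows. This is a minimal illustration under assumed conventions: each branch produces a predicted class and a confidence score for every sample, and a sample is relabeled only when both branches confidently agree on a label that differs from the given (possibly corrupted) one. The agreement rule and the `threshold` parameter are assumptions for illustration, not the paper's actual mechanism.

```python
def relabel(noisy_labels, preds_a, conf_a, preds_b, conf_b, threshold=0.9):
    """Hypothetical consensus-based relabeling: if both branches agree on a
    class different from the given label, and both are confident, replace
    the label with their consensus; otherwise keep the original label."""
    new_labels = []
    for y, pa, ca, pb, cb in zip(noisy_labels, preds_a, conf_a, preds_b, conf_b):
        if pa == pb and pa != y and min(ca, cb) >= threshold:
            new_labels.append(pa)   # confident consensus overrides the noisy label
        else:
            new_labels.append(y)    # keep the original (possibly noisy) label
    return new_labels
```

Gating the correction on inter-branch agreement is what makes the recovery "safe" in spirit: a single overconfident branch cannot flip a label on its own, so clean labels are rarely overwritten while many mislabeled samples are corrected and kept in training.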