Duan Yue, Zhao Zhen, Qi Lei, Wang Lei, Zhou Luping, Shi Yinghuan, Gao Yang
IEEE Trans Neural Netw Learn Syst. 2024 Jun;35(6):8441-8455. doi: 10.1109/TNNLS.2022.3228380. Epub 2024 Jun 3.
The core issue in semi-supervised learning (SSL) is how to leverage unlabeled data effectively. Most existing methods emphasize high-confidence samples and seldom explore how low-confidence samples can be used. In this article, we aim to utilize low-confidence samples in a novel way with our proposed mutex-based consistency regularization, namely MutexMatch. Specifically, high-confidence samples are required to predict exactly "what it is" with the conventional true-positive classifier (TPC), while low-confidence samples are employed to achieve a simpler goal: predicting with ease "what it is not" with the true-negative classifier (TNC). In this sense, we not only mitigate pseudo-labeling errors but also make full use of the low-confidence unlabeled data through the consistency of the dissimilarity degree. MutexMatch achieves superior performance on multiple benchmark datasets, namely Canadian Institute for Advanced Research (CIFAR)-10, CIFAR-100, street view house numbers (SVHN), self-taught learning 10 (STL-10), and mini-ImageNet. More importantly, our method shows further superiority when labeled data are scarce, e.g., 92.23% accuracy with only 20 labeled samples on CIFAR-10. Code has been released at https://github.com/NJUyued/MutexMatch4SSL.
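As a rough illustration of the split described in the abstract, the following Python sketch routes one unlabeled sample either to a positive ("what it is") target or to a negative ("what it is not") target. The function name `mutex_targets`, the 0.95 threshold, and the choice of the least-probable class as the negative target are assumptions made for illustration. They are not stated in the abstract and may differ from the released code.

```python
from typing import List, Optional, Tuple


def mutex_targets(
    probs: List[float], threshold: float = 0.95
) -> Tuple[Optional[int], Optional[int]]:
    """Return (positive_label, negative_label) for one unlabeled sample.

    `probs` is a hypothetical class-probability vector from a classifier.
    Exactly one of the two returned labels is set:
      - high confidence: a pseudo-label saying "what it is";
      - low confidence:  a complementary label saying "what it is not".
    The threshold value and the argmin rule are illustrative assumptions.
    """
    confidence = max(probs)
    if confidence >= threshold:
        # Confident prediction: use the most probable class as a pseudo-label.
        return probs.index(confidence), None
    # Unconfident prediction: only commit to the class that looks least likely.
    return None, probs.index(min(probs))


# A confident sample gets a positive target, an uncertain one a negative target.
confident = mutex_targets([0.97, 0.02, 0.01])
uncertain = mutex_targets([0.50, 0.40, 0.10])
```

In this sketch the low-confidence sample is not discarded. It still contributes a weaker training signal, which is the intuition the abstract gives for using such samples.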