Department of Computer Science and Technology, School of Information, Xiamen University, Xiamen 361005, China.
Shenzhen Research Institute, Xiamen University, Shenzhen 518000, China.
Sensors (Basel). 2023 Apr 13;23(8):3944. doi: 10.3390/s23083944.
Semi-supervised learning is a learning paradigm that can use both labeled and unlabeled data to train deep neural networks. Among semi-supervised learning methods, self-training-based methods do not depend on a data augmentation strategy and have better generalization ability. However, their performance is limited by the accuracy of the predicted pseudo-labels. In this paper, we propose to reduce the noise in the pseudo-labels from two aspects: the accuracy of the predictions and the confidence of the predictions. For the first aspect, we propose a similarity graph structure learning (SGSL) model that considers the correlation between unlabeled and labeled samples, which facilitates the learning of more discriminative features and thus yields more accurate predictions. For the second aspect, we propose an uncertainty-based graph convolutional network (UGCN), which aggregates similar features based on the learned graph structure during training, making the features more discriminative. It can also output the uncertainty of its predictions in the pseudo-label generation phase, so that pseudo-labels are generated only for unlabeled samples with low uncertainty, which reduces the noise in the pseudo-labels. Furthermore, a positive and negative self-training framework is proposed, which combines the proposed SGSL model and UGCN into the self-training framework for end-to-end training. In addition, to introduce more supervised signals into the self-training process, negative pseudo-labels are generated for unlabeled samples with low prediction confidence, and the positive and negative pseudo-labeled samples are then trained together with a small number of labeled samples to improve the performance of semi-supervised learning. The code is available upon request.
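The two ideas in the abstract (a similarity graph over labeled and unlabeled features, and confidence-gated positive/negative pseudo-labels) can be illustrated with a minimal sketch. The code below is not the authors' released implementation; the k-NN cosine-similarity graph, the linear classifier head, the use of maximum softmax probability as a stand-in for the UGCN's uncertainty estimate, and the 0.80/0.40 thresholds are all illustrative assumptions.

```python
# Minimal NumPy sketch: (1) build a similarity graph over labeled + unlabeled
# features and smooth features over it, (2) gate positive/negative pseudo-labels
# by prediction confidence. All hyperparameters here are assumed, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def knn_similarity_graph(feats, k=5):
    """Cosine-similarity k-NN adjacency with symmetric normalization."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, 0.0)
    adj = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]            # keep the k most similar neighbors
    rows = np.repeat(np.arange(len(f)), k)
    adj[rows, idx.ravel()] = sim[rows, idx.ravel()]
    adj = np.maximum(adj, adj.T) + np.eye(len(f))    # symmetrize, add self-loops
    d = adj.sum(1)
    return adj / np.sqrt(np.outer(d, d))             # D^{-1/2} A D^{-1/2}

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy features and a linear head standing in for the backbone + UGCN classifier.
feats = rng.normal(size=(8, 16))                     # labeled + unlabeled samples
W = rng.normal(size=(16, 3))                         # 3-class head
A_hat = knn_similarity_graph(feats, k=3)
smoothed = A_hat @ feats                             # one graph-aggregation step
probs = softmax(smoothed @ W)

# Pseudo-label gating: high-confidence samples receive a positive pseudo-label
# ("this class"); low-confidence samples receive a negative pseudo-label
# ("not this class") for their least-likely class, adding extra supervision.
pos_mask = probs.max(axis=1) > 0.80
neg_mask = probs.max(axis=1) < 0.40
positive_labels = probs.argmax(axis=1)
negative_labels = probs.argmin(axis=1)
print("positive pseudo-labels:", np.where(pos_mask)[0], positive_labels[pos_mask])
print("negative pseudo-labels:", np.where(neg_mask)[0], negative_labels[neg_mask])
```

In the framework described above, the selected positive and negative pseudo-labeled samples would then be trained jointly with the small labeled set in each self-training round; in the paper the gating is driven by the UGCN's uncertainty output rather than the simple softmax confidence used in this sketch.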