Dong Xingping, Ouyang Tianran, Liao Shengcai, Du Bo, Shao Ling
IEEE Trans Image Process. 2024;33:5663-5675. doi: 10.1109/TIP.2024.3461472. Epub 2024 Oct 9.
Most existing few-shot learning (FSL) methods require a large amount of labeled data during meta-training, which is a major limitation. To reduce the labeling requirement, a semi-supervised meta-training (SSMT) setting has been proposed for FSL, which includes only a few labeled samples and a large number of unlabeled samples in the base classes. However, existing methods under this setting require class-aware sample selection from the unlabeled set, which violates the assumption that the set is truly unlabeled. In this paper, we propose a practical semi-supervised meta-training setting with truly unlabeled data to facilitate the application of FSL in realistic scenarios. To better utilize both the labeled and the truly unlabeled data, we propose a simple and effective meta-training framework, called pseudo-labeling based meta-learning (PLML). First, we train a classifier via common semi-supervised learning (SSL) and use it to obtain pseudo-labels for the unlabeled data. Then we build few-shot tasks from the labeled and pseudo-labeled data and design a novel finetuning method with feature smoothing and noise suppression to better learn the FSL model from noisy labels. Surprisingly, through extensive experiments across two FSL datasets, we find that this simple meta-training framework effectively prevents the performance degradation of various FSL models under limited labeled data, and also significantly outperforms representative SSMT models. Moreover, benefiting from meta-training, our method also improves several representative SSL algorithms. We provide the training code and usage examples at https://github.com/ouyangtianran/PLML.
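The first two stages of the framework described above — pseudo-labeling unlabeled data with an SSL-trained classifier, then sampling N-way K-shot episodes from the resulting pool — can be sketched as follows. This is a minimal illustrative sketch with NumPy, not the authors' implementation: the function names, the confidence threshold, and the episode-sampling details are assumptions, and the feature-smoothing and noise-suppression components of PLML are omitted.

```python
import numpy as np


def pseudo_label(classifier, unlabeled_x, threshold=0.9):
    """Assign pseudo-labels to unlabeled samples, keeping only
    predictions whose confidence exceeds a threshold (hypothetical
    filtering rule; the paper's exact criterion may differ)."""
    probs = classifier(unlabeled_x)        # (n, num_classes) probabilities
    conf = probs.max(axis=1)               # confidence of top prediction
    labels = probs.argmax(axis=1)          # predicted class = pseudo-label
    keep = conf >= threshold
    return unlabeled_x[keep], labels[keep]


def sample_few_shot_task(x, y, n_way=5, k_shot=1, q_query=5, rng=None):
    """Build one N-way K-shot episode (support/query index arrays)
    from (pseudo-)labeled data, as in standard episodic meta-training."""
    if rng is None:
        rng = np.random.default_rng()
    classes = rng.choice(np.unique(y), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(y == c))
        support.append(idx[:k_shot])                     # K support samples
        query.append(idx[k_shot:k_shot + q_query])       # Q query samples
    return np.concatenate(support), np.concatenate(query)
```

An FSL model would then be meta-trained on episodes drawn from the union of the labeled set and the confident pseudo-labeled set; since pseudo-labels are noisy, PLML's finetuning stage additionally applies feature smoothing and noise suppression, which this sketch does not cover.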