
How to Trust Unlabeled Data? Instance Credibility Inference for Few-Shot Learning.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6240-6253. doi: 10.1109/TPAMI.2021.3086140. Epub 2022 Sep 14.

Abstract

Deep learning based models have excelled in many computer vision tasks and appear to surpass human performance. However, these models require an avalanche of expensive, human-labeled training data and many iterations to train their large number of parameters. This severely limits their scalability to real-world long-tail distributed categories, some of which have a large number of instances but only a few manual annotations. Learning from such extremely limited labeled examples is known as Few-Shot Learning (FSL). Different from prior art that leverages meta-learning or data augmentation strategies to alleviate this extremely data-scarce problem, this paper presents a statistical approach, dubbed Instance Credibility Inference (ICI), to exploit the support of unlabeled instances for few-shot visual recognition. Specifically, we repurpose the self-taught learning paradigm: we predict pseudo-labels for unlabeled instances with an initial classifier trained from the few labeled shots, and then select the most confident ones to augment the training set and re-train the classifier. This is achieved by constructing a (Generalized) Linear Model (LM/GLM) with incidental parameters to model the mapping from (un-)labeled features to their (pseudo-)labels, in which the sparsity of the incidental parameters indicates the credibility of the corresponding pseudo-labeled instance. We rank the credibility of pseudo-labeled instances along the regularization path of their corresponding incidental parameters, and the most trustworthy pseudo-labeled examples are preserved as augmented labeled instances. This process is repeated until all the unlabeled samples are included in the expanded training set. Theoretically, under the conditions of restricted eigenvalue, irrepresentability, and large error, our approach is guaranteed to collect all the correctly predicted pseudo-labeled instances from the noisy pseudo-labeled set.
Extensive experiments under two few-shot settings demonstrate the effectiveness of our approach on four widely used few-shot visual recognition benchmarks: miniImageNet, tieredImageNet, CIFAR-FS, and CUB. Code and models are released at https://github.com/Yikai-Wang/ICI-FSL.
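The self-taught loop described in the abstract can be sketched in a few lines. This is a minimal, simplified illustration, not the paper's implementation: the function and parameter names (`ici_rank`, `ridge_fit`, `keep` ordering) are hypothetical, and the credibility score uses the residual magnitude of a joint ridge fit as a proxy for where each instance's incidental parameter would enter the regularization path (an instance with a large residual enters the lasso path at a larger penalty, so a smaller residual suggests higher credibility). The actual ICI method ranks instances along the full regularization path of the sparse incidental parameters.

```python
import numpy as np

def ridge_fit(X, Y, reg=1e-3):
    """Closed-form ridge regression: B = (X^T X + reg*I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)

def ici_rank(X_l, y_l, X_u, n_classes, reg=1e-3):
    """Pseudo-label unlabeled instances and rank them by a credibility proxy.

    Returns the pseudo-labels and an index ordering of the unlabeled
    instances from most to least credible (smallest residual first).
    """
    Y_l = np.eye(n_classes)[y_l]               # one-hot labels
    # 1) initial classifier trained on the few labeled shots
    B = ridge_fit(X_l, Y_l, reg)
    pseudo = np.argmax(X_u @ B, axis=1)        # pseudo-labels
    # 2) joint linear model on labeled + pseudo-labeled data
    X_all = np.vstack([X_l, X_u])
    Y_all = np.vstack([Y_l, np.eye(n_classes)[pseudo]])
    B_all = ridge_fit(X_all, Y_all, reg)
    # 3) credibility proxy: residual norm of each pseudo-labeled instance;
    #    a sparse incidental parameter (small residual) = more trustworthy
    resid = np.linalg.norm(X_u @ B_all - np.eye(n_classes)[pseudo], axis=1)
    order = np.argsort(resid)                  # most credible first
    return pseudo, order
```

In the full procedure, one would keep only the top-ranked pseudo-labeled instances, add them to the training set, re-train, and repeat until every unlabeled sample has been absorbed.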

