Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2022 Apr;44(4):2168-2187. doi: 10.1109/TPAMI.2020.3031898. Epub 2022 Mar 4.

Abstract

Representation learning with small labeled data has emerged as a challenge in many problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address this, many efforts have been made to train sophisticated models with few labeled data in an unsupervised or semi-supervised fashion. In this paper, we review the recent progress in these two major categories of methods. A wide spectrum of models is organized into a big picture, where we show how they interplay with each other to motivate the exploration of new ideas. We review the principles of learning transformation-equivariant, disentangled, self-supervised, and semi-supervised representations, all of which underpin the foundation of recent progress. Many implementations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs), and other deep networks by exploiting the distribution of unlabeled data for more powerful representations. We discuss emerging topics by revealing the intrinsic connections between unsupervised and semi-supervised learning, and propose future directions to bridge the algorithmic and theoretical gap between transformation equivariance for unsupervised learning and supervised invariance for supervised learning, and to unify unsupervised pretraining and supervised finetuning. We also provide a broader outlook on future directions: unifying transformation and instance equivariance for representation learning, connecting unsupervised and semi-supervised augmentations, and exploring the role of self-supervised regularization in many learning problems.
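To make the self-supervised idea in the abstract concrete, the following is a minimal sketch of a rotation-prediction pretext task, one common way to learn representations from unlabeled data: each image is rotated by a known multiple of 90 degrees, and that rotation index serves as a free supervisory label. The function name and the NumPy-array image format are illustrative assumptions, not code from the surveyed paper.

```python
import numpy as np

def make_rotation_task(images):
    """Build a self-supervised pretext dataset from unlabeled images.

    Each input image (a 2-D array) is rotated by k * 90 degrees for
    k in {0, 1, 2, 3}; the rotation index k becomes the target label,
    so no human annotation is required. A network trained to predict k
    must learn features sensitive to image content and orientation.
    (Illustrative sketch, not the paper's implementation.)
    """
    inputs, targets = [], []
    for img in images:
        for k in range(4):            # k quarter-turns counter-clockwise
            inputs.append(np.rot90(img, k))
            targets.append(k)
    return np.stack(inputs), np.array(targets)

# Usage: 2 unlabeled 4x4 "images" yield 8 labeled pretext examples.
imgs = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
x, y = make_rotation_task(imgs)
print(x.shape, y.tolist())  # (8, 4, 4) [0, 1, 2, 3, 0, 1, 2, 3]
```

The representation learned on this auxiliary task can then be reused for a downstream task with few labels, which is the small-data regime the survey addresses.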
