A Manifold Learning Perspective on Representation Learning: Learning Decoder and Representations without an Encoder.

Author Information

Schuster Viktoria, Krogh Anders

Affiliations

Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark.

Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark.

Publication Information

Entropy (Basel). 2021 Oct 25;23(11):1403. doi: 10.3390/e23111403.

Abstract

Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we showed that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derived expressions for the number of samples needed to specify the encoder and decoder and showed that the decoder generally requires far fewer training samples than the encoder to be well specified. We discuss the training of autoencoders from this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrated that the decoder is much better suited to learning a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further showed that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
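
The central procedure described in the abstract, training only the decoder by optimizing a free latent representation for each training sample jointly with the decoder weights under a sum-of-squares loss, can be sketched in a few lines. The sketch below is not the authors' code: the network architecture, data shapes, optimizer, and hyperparameters are illustrative assumptions, with PyTorch used for convenience.

```python
# Minimal sketch of encoder-free decoder training: each sample's latent
# code z_i is a learnable parameter, optimized together with the decoder
# weights by gradient descent under an MSE (sum-of-squares) objective.
import torch
import torch.nn as nn

n_samples, input_dim, latent_dim = 1000, 784, 2  # assumed sizes

# Decoder: maps m-dimensional representations back to n-dimensional inputs.
decoder = nn.Sequential(
    nn.Linear(latent_dim, 128),
    nn.ReLU(),
    nn.Linear(128, input_dim),
)

# One learnable representation per training sample; no encoder is used.
z = nn.Parameter(0.1 * torch.randn(n_samples, latent_dim))

x = torch.rand(n_samples, input_dim)  # placeholder for real training data

# Both the decoder weights and the representations receive gradients.
opt = torch.optim.Adam(
    [{"params": decoder.parameters()}, {"params": [z]}], lr=1e-3
)

for step in range(5000):
    opt.zero_grad()
    # Sum-of-squares loss: moves the decoder manifold toward the samples
    # in Euclidean distance while each z_i settles at a nearby point.
    loss = ((decoder(z) - x) ** 2).sum(dim=1).mean()
    loss.backward()
    opt.step()
```

Under this loss, each optimized z_i approximates the point on the decoder-defined manifold closest to x_i in Euclidean distance, which is the manifold-fitting interpretation the abstract gives for the sum-of-squares objective.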

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfce/8625121/af5e89c1631a/entropy-23-01403-g001.jpg
