Sparse-Coding Variational Autoencoders.

Author information

Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08544, U.S.A.

Institute of Neuroscience, University of Oregon, Eugene, OR 97403, U.S.A.

Publication information

Neural Comput. 2024 Nov 19;36(12):2571-2601. doi: 10.1162/neco_a_01715.

DOI:10.1162/neco_a_01715

PMID:39383030

Abstract

The sparse coding model posits that the visual system has evolved to efficiently code natural stimuli using a sparse set of features from an overcomplete dictionary. The original sparse coding model suffered from two key limitations, however: (1) computing the neural response to an image patch required minimizing a nonlinear objective function via recurrent dynamics, and (2) fitting relied on approximate inference methods that ignored uncertainty. Although subsequent work has developed several methods to overcome these obstacles, we propose a novel solution inspired by the variational autoencoder (VAE) framework. We introduce the sparse coding variational autoencoder (SVAE), which augments the sparse coding model with a probabilistic recognition model parameterized by a deep neural network. This recognition model provides a neurally plausible feedforward implementation for the mapping from image patches to neural activities and enables a principled method for fitting the sparse coding model to data via maximization of the evidence lower bound (ELBO). The SVAE differs from standard VAEs in three key respects: the latent representation is overcomplete (there are more latent dimensions than image pixels), the prior is sparse or heavy-tailed instead of Gaussian, and the decoder network is a linear projection instead of a deep network. We fit the SVAE to natural image data under different assumed prior distributions and show that it obtains higher test performance than previous fitting methods. Finally, we examine the response properties of the recognition network and show that it captures important nonlinear properties of neurons in the early visual pathway.
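
The ingredients named in the abstract (an overcomplete latent code, a sparse prior, a linear decoder, and fitting by ELBO maximization) can be illustrated with a small Monte Carlo estimate of the ELBO for one image patch. This is a minimal sketch and not the authors' implementation. It assumes a Laplace prior, Gaussian pixel noise, and a diagonal Gaussian approximate posterior, none of which the abstract specifies. The values `mu` and `log_sigma` stand in for the output of a recognition network, and all function and parameter names are hypothetical.

```python
import math
import random


def linear_decode(Phi, z):
    """Linear decoder: reconstruct the patch as Phi @ z (Phi is pixels x latents)."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in Phi]


def log_laplace_prior(z, scale):
    """Log density of a factorized Laplace prior (one assumed sparse prior)."""
    return sum(-abs(zj) / scale - math.log(2.0 * scale) for zj in z)


def log_gaussian_likelihood(x, x_hat, noise_std):
    """Log density of x under isotropic Gaussian noise around x_hat (assumed)."""
    n = len(x)
    sq_err = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return (-0.5 * sq_err / noise_std ** 2
            - n * math.log(noise_std)
            - 0.5 * n * math.log(2.0 * math.pi))


def gaussian_entropy(log_sigma):
    """Entropy of a diagonal Gaussian with the given log standard deviations."""
    d = len(log_sigma)
    return sum(log_sigma) + 0.5 * d * (1.0 + math.log(2.0 * math.pi))


def elbo_estimate(x, Phi, mu, log_sigma, noise_std=1.0, prior_scale=1.0,
                  n_samples=100, rng=None):
    """Monte Carlo ELBO: E_q[log p(x|z) + log p(z)] + H[q(z|x)]."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        # Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1).
        z = [m + math.exp(ls) * rng.gauss(0.0, 1.0)
             for m, ls in zip(mu, log_sigma)]
        total += log_gaussian_likelihood(x, linear_decode(Phi, z), noise_std)
        total += log_laplace_prior(z, prior_scale)
    return total / n_samples + gaussian_entropy(log_sigma)


# Toy overcomplete setting: 2 pixels, 3 latent dimensions.
x = [0.5, -1.0]
Phi = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]
mu = [0.4, -0.9, 0.0]            # placeholder for recognition-network output
log_sigma = [-1.0, -1.0, -1.0]   # placeholder for recognition-network output
print(elbo_estimate(x, Phi, mu, log_sigma, noise_std=0.5))
```

In a full model, the dictionary `Phi` and the recognition network's parameters would be updated by gradient ascent on this quantity averaged over many patches. The sketch only evaluates the objective for fixed values.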