Qian Dong, Cheung William K
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1949-1962. doi: 10.1109/TPAMI.2022.3160509. Epub 2023 Jan 6.
Variational autoencoders (VAEs) are a class of effective deep generative models whose objective is to approximate the true but unknown data distribution. VAEs rely on latent variables to capture high-level semantics, so that informative latent representations help reconstruct the data well. Yet, training VAEs tends to suffer from posterior collapse when the decoder is parameterized by an autoregressive model for sequence generation. VAEs can be further enhanced by introducing multiple layers of latent variables, but posterior collapse hinders the adoption of such hierarchical VAEs in real-world applications. In this paper, we introduce InfoMaxHVAE, which integrates neural estimates of mutual information into hierarchical VAEs to alleviate posterior collapse when powerful autoregressive models are used to model sequences. Experimental results on several text and image datasets show that InfoMaxHVAE outperforms state-of-the-art baselines and exhibits less posterior collapse. We further show that InfoMaxHVAE shapes a coarse-to-fine hierarchical organization of the latent space.
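The abstract does not state the training objective, so the following is a minimal illustrative sketch rather than the paper's actual formulation; the weight λ and the estimator parameters ψ are assumptions introduced here for illustration. For a single-layer VAE, the standard evidence lower bound is

\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x)\,\|\,p(z)\right),

and posterior collapse corresponds to q_\phi(z \mid x) \approx p(z) for all x, i.e. the mutual information I_q(x;z) \approx 0, so an expressive autoregressive decoder can ignore z. A mutual-information-regularized objective of the general kind described in the abstract might then take the form

\mathcal{L}_{\mathrm{InfoMax}}(\theta,\phi,\psi) = \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi) + \lambda\,\hat{I}_\psi(x;z),

where \hat{I}_\psi(x;z) is a mutual information estimate parameterized by a neural network (for example, a MINE- or InfoNCE-style lower bound), and in the hierarchical case one such term could be added per latent layer z_1, \dots, z_L.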