Smieja Marek, Wolczyk Maciej, Tabor Jacek, Geiger Bernhard C
IEEE Trans Neural Netw Learn Syst. 2021 Sep;32(9):3930-3941. doi: 10.1109/TNNLS.2020.3016221. Epub 2021 Aug 31.
We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and is implemented in a typical Wasserstein autoencoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian components with correct classes, we use a small amount of labeled data and a Gaussian classifier induced by the target distribution. SeGMA is optimized efficiently due to the use of the Cramer-Wold distance as a maximum mean discrepancy penalty, which yields a closed-form expression for a mixture of spherical Gaussian components and, thus, obviates the need for sampling. While SeGMA preserves all properties of its semi-supervised predecessors and achieves at least as good generative performance on standard benchmark data sets, it offers additional features: 1) interpolation between any pair of points in the latent space produces realistic-looking samples; 2) by combining the interpolation property with the disentanglement of class and style information, SeGMA is able to perform continuous style transfer from one class to another; and 3) it is possible to change the intensity of class characteristics in a data point by moving the latent representation of the data point away from specific Gaussian components.
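The latent-space mechanics described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the shared unit variance, and the equal mixing weights are assumptions made for illustration. With spherical components of equal weight, the Gaussian classifier induced by the target distribution reduces to assigning a latent code to its nearest component mean, and interpolation between two codes is a straight line in latent space.

```python
import numpy as np

def gaussian_classify(z, mus):
    """Assign latent point z to the most likely Gaussian component.

    With equal mixing weights and shared spherical covariances, the
    maximum-posterior component is simply the nearest component mean.
    """
    dists = np.linalg.norm(mus - z, axis=1)  # (K,) distances to the K means
    return int(np.argmin(dists))

def interpolate(z1, z2, steps=5):
    """Linear interpolation between two latent codes (steps points)."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - t) * z1 + t * z2 for t in ts])

# Toy example: two components in a 2-D latent space.
mus = np.array([[0.0, 0.0], [4.0, 0.0]])   # component means (one per class)
z1 = np.array([0.1, 0.2])                   # latent code near class 0
z2 = np.array([3.9, -0.1])                  # latent code near class 1
path = interpolate(z1, z2, steps=5)
labels = [gaussian_classify(z, mus) for z in path]
```

Decoding the points along `path` would yield the continuous class-to-class transition the abstract describes; moving a code further from (or closer to) a given mean in `mus` weakens or strengthens that class's characteristics in the decoded sample.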