Yang Lin, Fan Wentao, Bouguila Nizar
IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):340-350. doi: 10.1109/TNNLS.2020.3027761. Epub 2022 Jan 5.
Clustering is a fundamental problem that frequently arises in many fields, such as pattern recognition, data mining, and machine learning. Although various clustering algorithms have been developed in the past, traditional clustering algorithms with shallow structures cannot capture the interdependence among complex data features in a latent space. Recently, deep generative models, such as the autoencoder (AE), the variational AE (VAE), and the generative adversarial network (GAN), have achieved remarkable success in many unsupervised applications thanks to their ability to learn useful latent representations from the original data. In this work, we first propose a novel clustering approach based on both the Wasserstein GAN with gradient penalty (WGAN-GP) and a VAE with a Gaussian mixture prior. In the combined model, the generator of the WGAN-GP is formulated by drawing samples from the probabilistic decoder of the VAE. Moreover, to provide more robust clustering and generation performance when outliers are present in the data, a variant of the proposed deep generative model is developed based on a Student's-t mixture prior. The effectiveness of our deep generative models is validated through experiments on both clustering analysis and sample generation. Compared with other state-of-the-art clustering approaches based on deep generative models, the proposed approach provides more stable training, improves clustering accuracy, and generates realistic samples.
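As a rough illustration of why a mixture prior over the latent space yields cluster assignments, and why a Student's-t mixture may tolerate outliers better than a Gaussian one, the sketch below computes component responsibilities for a latent point under both priors. It is not the authors' implementation: the isotropic components, the function names, the two hypothetical cluster centers, and the choice of nu = 3 degrees of freedom are all assumptions made for the example, and the encoder, decoder, and WGAN-GP critic are omitted entirely.

```python
import math

def log_gauss(z, mu, sigma):
    """Log density of an isotropic Gaussian N(z; mu, sigma^2 I)."""
    d = len(z)
    sq = sum((zi - mi) ** 2 for zi, mi in zip(z, mu))
    return -0.5 * d * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

def log_student_t(z, mu, sigma, nu):
    """Log density of an isotropic multivariate Student's t (nu degrees of freedom)."""
    d = len(z)
    sq = sum((zi - mi) ** 2 for zi, mi in zip(z, mu)) / sigma ** 2
    return (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
            - 0.5 * d * math.log(nu * math.pi * sigma ** 2)
            - 0.5 * (nu + d) * math.log1p(sq / nu))

def mixture_log_density(weights, log_pdfs):
    """log sum_c pi_c p_c(z), computed with the log-sum-exp trick."""
    logs = [math.log(w) + lp for w, lp in zip(weights, log_pdfs)]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

def responsibilities(weights, log_pdfs):
    """Posterior p(c | z) proportional to pi_c p_c(z)."""
    total = mixture_log_density(weights, log_pdfs)
    return [math.exp(math.log(w) + lp - total) for w, lp in zip(weights, log_pdfs)]

# Hypothetical two-cluster prior in a 2-D latent space (assumed values).
weights = [0.5, 0.5]
means = [(-2.0, 0.0), (2.0, 0.0)]
sigma, nu = 1.0, 3.0

inlier = (1.5, 0.2)    # close to the second cluster center
outlier = (0.5, 8.0)   # far from both cluster centers

def gauss_logs(z):
    return [log_gauss(z, mu, sigma) for mu in means]

def t_logs(z):
    return [log_student_t(z, mu, sigma, nu) for mu in means]

inlier_resp = responsibilities(weights, gauss_logs(inlier))
outlier_resp_gauss = responsibilities(weights, gauss_logs(outlier))
outlier_resp_t = responsibilities(weights, t_logs(outlier))
outlier_logp_gauss = mixture_log_density(weights, gauss_logs(outlier))
outlier_logp_t = mixture_log_density(weights, t_logs(outlier))

print("inlier responsibilities (Gaussian):", inlier_resp)
print("outlier responsibilities (Gaussian):", outlier_resp_gauss)
print("outlier responsibilities (Student's t):", outlier_resp_t)
print("outlier log density, Gaussian vs t:", outlier_logp_gauss, outlier_logp_t)
```

In this toy setting the heavy-tailed prior assigns the distant point a much higher log density and a less confident cluster assignment than the Gaussian prior does, which is the usual intuition for why a single outlier pulls less strongly on the mixture parameters during training.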