
Regularized Gaussian Mixture Model for High-Dimensional Clustering.

Author information

Zhao Yang, Shrivastava Abhishek K, Tsui Kwok Leung

Publication information

IEEE Trans Cybern. 2019 Oct;49(10):3677-3688. doi: 10.1109/TCYB.2018.2846404. Epub 2018 Jun 27.

Abstract

Finding low-dimensional representations of high-dimensional data sets is an important task in various applications. The fact that data sets often contain clusters embedded in different subspaces poses a barrier to this task. Driven by the need for methods that enable clustering and finding each cluster's intrinsic subspace simultaneously, in this paper, we propose a regularized Gaussian mixture model (GMM) for clustering. Despite the advantages of GMMs, such as their probabilistic interpretation and robustness against observation noise, traditional maximum-likelihood estimation for GMMs shows disappointing performance in high-dimensional settings. The proposed regularization method finds low-dimensional representations of the component covariance matrices, resulting in better estimation of local feature correlations. The regularization problem can be incorporated into the expectation-maximization (EM) algorithm for maximizing the likelihood function of a GMM, with the M-step modified to incorporate the regularization. The M-step involves a determinant maximization problem, which can be solved efficiently. The performance of the proposed method is demonstrated using several simulated data sets. We also illustrate the potential value of the proposed method in applications using four real data sets.
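The abstract describes the method only at the level of an EM loop with a regularized M-step, so the Python/NumPy sketch below is not the authors' algorithm. It only illustrates where such a regularization plugs into EM, using a stand-in penalty chosen for this example: (lam/2) * sum_k ||inv(Sigma_k)||_F^2 on the component precision matrices. With that penalty, each component's M-step becomes a log-determinant maximization, max over Theta of log det(Theta) - tr(S_k Theta) - (rho/2) ||Theta||_F^2, which has a closed-form eigenvalue solution. The penalty keeps every covariance invertible even when a cluster has fewer points than dimensions, but unlike the regularizer in the paper it does not explicitly seek a low-dimensional representation. The function names, `lam`, `rho`, and the farthest-first initialization are all hypothetical.

```python
import numpy as np


def penalized_cov(S, rho):
    """Return Sigma = inv(Theta), where Theta maximizes
    log det(Theta) - tr(S @ Theta) - (rho / 2) * ||Theta||_F^2.

    The gradient inv(Theta) - S - rho * Theta vanishes when Theta shares the
    eigenvectors of S, so each eigenvalue of Sigma has a closed form.
    """
    s, U = np.linalg.eigh(S)
    s = np.clip(s, 0.0, None)                       # drop tiny negative round-off
    sigma = (s + np.sqrt(s**2 + 4.0 * rho)) / 2.0   # always >= sqrt(rho) > 0
    return (U * sigma) @ U.T                        # U @ diag(sigma) @ U.T


def log_gauss(X, mu, Sigma):
    """Log density log N(x | mu, Sigma) for every row x of X."""
    p = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    Z = np.linalg.solve(L, (X - mu).T)              # columns are L^{-1} (x - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + np.sum(Z**2, axis=0))


def penalized_gmm_em(X, K, lam=1.0, n_iter=200, tol=1e-8, seed=0):
    """EM that maximizes the (hypothetical) penalized log-likelihood
    sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k) - (lam / 2) sum_k ||inv(Sigma_k)||_F^2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Farthest-first initialization of the means (an illustrative choice).
    idx = [int(rng.integers(n))]
    for _ in range(K - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    mu = X[idx].copy()
    pi = np.full(K, 1.0 / K)
    Sigma = np.tile(np.eye(p), (K, 1, 1))
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities, normalized with log-sum-exp for stability.
        logp = np.column_stack(
            [np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k]) for k in range(K)])
        m = logp.max(axis=1, keepdims=True)
        ll_i = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
        R = np.exp(logp - ll_i[:, None])
        penalty = 0.5 * lam * sum(np.sum(np.linalg.inv(Sk) ** 2) for Sk in Sigma)
        history.append(ll_i.sum() - penalty)        # objective at the current parameters
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
        # M-step: exact maximizers of the expected penalized log-likelihood.
        Nk = np.maximum(R.sum(axis=0), 1e-10)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        for k in range(K):
            D = X - mu[k]
            S_k = (R[:, k, None] * D).T @ D / Nk[k]   # weighted sample covariance
            # (Nk/2)[log det Theta - tr(S_k Theta)] - (lam/2)||Theta||_F^2 is the
            # penalized_cov problem after rescaling, with rho = 2 * lam / Nk.
            Sigma[k] = penalized_cov(S_k, 2.0 * lam / Nk[k])
    return pi, mu, Sigma, R, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p = 50                                          # more dimensions than points per cluster
    X = np.vstack([rng.normal(0.0, 0.5, size=(30, p)),
                   rng.normal(3.0, 0.5, size=(30, p))])
    pi, mu, Sigma, R, history = penalized_gmm_em(X, K=2, lam=1.0)
    print("labels:", R.argmax(axis=1))
    print("iterations:", len(history))
```

Because every M-step update is an exact maximizer, the recorded penalized objective should never decrease, which is a useful sanity check. Setting `lam` to zero gives `rho = 0` and recovers the usual maximum-likelihood covariance `S_k`, which only works when every `S_k` is nonsingular; in the demo's p > n_k regime, the penalty is what keeps the Cholesky factorization in `log_gauss` from failing.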
