Robust clustering using exponential power mixtures.

Authors

Zhang Jian, Liang Faming

Affiliation

Department of Mathematics, University of York, Heslington, York, UK.

Publication

Biometrics. 2010 Dec;66(4):1078-86. doi: 10.1111/j.1541-0420.2010.01389.x.

DOI: 10.1111/j.1541-0420.2010.01389.x

PMID: 20163406

Abstract

Clustering is a widely used method for extracting useful information from gene expression data, where unknown correlation structures among genes are believed to persist even after normalization. Such correlation structures pose a great challenge to conventional clustering methods, such as the Gaussian mixture (GM) model, k-means (KM), and partitioning around medoids (PAM), which are not robust against general dependence within the data. Here we use the exponential power mixture model to increase the robustness of clustering against general dependence and nonnormality of the data. An expectation-conditional maximization (ECM) algorithm is developed to calculate the maximum likelihood estimators (MLEs) of the unknown parameters in these mixtures. The Bayesian information criterion (BIC) is then employed to determine the number of components in the mixture. The MLEs are shown to be consistent under sparse dependence. Our numerical results indicate that the proposed procedure outperforms GM, KM, and PAM when there are strong correlations or non-Gaussian components in the data.
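The abstract combines three ingredients: exponential power mixture components, an ECM algorithm for the MLEs, and BIC to choose the number of components. The sketch below illustrates how these pieces can fit together in a simplified univariate setting. It is not the authors' implementation, and the paper's model may differ in dimension and parameterization. Assumptions made here: the generalized-normal form f(x) = p / (2σΓ(1/p)) exp(-(|x - μ|/σ)^p), where p = 2 gives a normal and p = 1 a Laplace law; a hypothetical shape grid `P_GRID` on [1, 4]; location updates by golden-section search; and (σ, p) updates that profile σ in closed form for each grid value of p. The function names (`fit_ep_mixture`, `e_step`, `golden_min`, `bic`) and the BIC parameter count are illustrative choices.

```python
import math
import numpy as np

# Hypothetical shape grid for the CM-step on p. Restricting p >= 1 keeps the
# location update convex; the paper's own treatment of p may differ.
P_GRID = np.linspace(1.0, 4.0, 13)


def ep_logpdf(x, mu, sigma, p):
    """Log-density of a univariate exponential power (generalized normal) law,
    f(x) = p / (2 sigma Gamma(1/p)) * exp(-(|x - mu| / sigma)^p).
    p = 2 gives a normal and p = 1 a Laplace distribution."""
    return (math.log(p) - math.log(2.0 * sigma) - math.lgamma(1.0 / p)
            - (np.abs(x - mu) / sigma) ** p)


def golden_min(f, lo, hi, iters=80):
    """Golden-section search for the minimiser of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:              # minimiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                    # minimiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def e_step(x, pi, mu, sigma, p):
    """Observed-data log-likelihood and n x K responsibilities (log-sum-exp)."""
    lj = np.column_stack([np.log(pi[k]) + ep_logpdf(x, mu[k], sigma[k], p[k])
                          for k in range(len(pi))])
    m = lj.max(axis=1, keepdims=True)
    ll_i = m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))
    return ll_i.sum(), np.exp(lj - ll_i[:, None])


def fit_ep_mixture(x, K, max_iter=200, tol=1e-8):
    """ECM-style fit of a K-component univariate exponential power mixture."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread starts over quantiles
    sigma = np.full(K, x.std())
    p = np.full(K, 2.0)          # 2.0 lies on P_GRID, so the p-step cannot lower Q
    pi = np.full(K, 1.0 / K)
    ll, tau = e_step(x, pi, mu, sigma, p)
    history = [ll]
    for _ in range(max_iter):
        nk = tau.sum(axis=0) + 1e-12
        pi = nk / x.size         # CM-step 1: mixing weights in closed form
        for k in range(K):
            w = tau[:, k]
            def obj(m_, pk=p[k]):
                # CM-step 2 objective: weighted L_p loss in the location
                return np.sum(w * np.abs(x - m_) ** pk)
            cand = golden_min(obj, x.min(), x.max())
            if obj(cand) < obj(mu[k]):
                mu[k] = cand
            # CM-step 3: for each grid p, sigma^p = p * sum(w|x - mu|^p) / n_k
            # maximises Q; keep the (sigma, p) pair with the largest Q.
            best = None
            for pc in P_GRID:
                sc = (pc * np.sum(w * np.abs(x - mu[k]) ** pc) / nk[k]) ** (1.0 / pc)
                sc = max(sc, 1e-8)
                q = np.sum(w * ep_logpdf(x, mu[k], sc, pc))
                if best is None or q > best[0]:
                    best = (q, sc, pc)
            _, sigma[k], p[k] = best
        ll, tau = e_step(x, pi, mu, sigma, p)   # E-step for the next round
        history.append(ll)
        if history[-1] - history[-2] < tol:
            break
    return {"pi": pi, "mu": mu, "sigma": sigma, "p": p,
            "loglik": ll, "history": history}


def bic(fit, n):
    """BIC = -2 log L + d log n, with d = 3K + (K - 1) free parameters."""
    K = len(fit["pi"])
    return -2.0 * fit["loglik"] + (4 * K - 1) * math.log(n)


# Illustrative synthetic data: one normal and one Laplace cluster.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.laplace(5.0, 1.0, 300)])
fits = {K: fit_ep_mixture(x, K) for K in (1, 2, 3)}
best_K = min(fits, key=lambda K: bic(fits[K], x.size))
```

Each conditional maximization step leaves its own block of parameters no worse in the expected complete-data log-likelihood, so each fit's `history` should be non-decreasing (the usual ECM ascent property, up to the small numerical floors used here). `best_K` is the candidate with the smallest BIC. The synthetic data settings are purely illustrative.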
