Robust clustering using exponential power mixtures.

Author information

Zhang Jian, Liang Faming

Affiliation

Department of Mathematics, University of York, Heslington, York, UK.

Publication information

Biometrics. 2010 Dec;66(4):1078-86. doi: 10.1111/j.1541-0420.2010.01389.x.

Abstract

Clustering is a widely used method for extracting useful information from gene expression data, in which unknown correlation structures among genes are believed to persist even after normalization. Such correlation structures pose a great challenge to conventional clustering methods such as the Gaussian mixture (GM) model, k-means (KM), and partitioning around medoids (PAM), which are not robust against general dependence within the data. Here we use the exponential power mixture model to increase the robustness of clustering against general dependence and nonnormality of the data. An expectation-conditional maximization algorithm is developed to calculate the maximum likelihood estimators (MLEs) of the unknown parameters in these mixtures. The Bayesian information criterion is then employed to determine the number of components in the mixture. The MLEs are shown to be consistent under sparse dependence. Our numerical results indicate that the proposed procedure outperforms GM, KM, and PAM when there are strong correlations or non-Gaussian components in the data.
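
The abstract does not spell out the model or the algorithm, so the following is only an illustrative sketch. It fits a univariate exponential power mixture, with component density f(x) = β / (2σΓ(1/β)) · exp{−(|x − μ|/σ)^β} (β = 2 gives a Gaussian, β = 1 a Laplace distribution), using an ECM-style loop: an E-step, then separate conditional maximization steps for the weights, locations, scales and shapes. It then scores candidate numbers of components with BIC. Several choices are assumptions made for this example and do not come from the paper: the univariate setting, the quantile initialization, restricting the shape β to [1, 8], and the golden-section searches used for the location and shape steps. The helper names (`ep_logpdf`, `golden_min`, `e_step`, `fit_ep_mixture`, `bic`) are also hypothetical. This is not the authors' implementation, which is designed for gene expression data.

```python
import math
import numpy as np


def ep_logpdf(x, mu, sigma, beta):
    """Log-density of the univariate exponential power distribution:
    f(x) = beta / (2 sigma Gamma(1/beta)) * exp(-(|x - mu| / sigma) ** beta)."""
    return (math.log(beta) - math.log(2.0 * sigma) - math.lgamma(1.0 / beta)
            - (np.abs(x - mu) / sigma) ** beta)


def golden_min(f, lo, hi, iters=60):
    """Golden-section search for the minimiser of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                      # minimiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                            # minimiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def e_step(x, pi, mu, sigma, beta):
    """Return responsibilities tau (n x k) and the observed-data log-likelihood."""
    logp = np.column_stack([np.log(pi[j]) + ep_logpdf(x, mu[j], sigma[j], beta[j])
                            for j in range(len(pi))])
    m = logp.max(axis=1, keepdims=True)
    lse = m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))  # log-sum-exp
    return np.exp(logp - lse), float(lse.sum())


def fit_ep_mixture(x, k, n_iter=200, tol=1e-8):
    """ECM-style fit of a k-component univariate exponential power mixture (a sketch)."""
    x = np.asarray(x, dtype=float)
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # assumption: quantile initialisation
    sigma = np.full(k, x.std())
    beta = np.full(k, 2.0)                           # start from the Gaussian shape
    prev = -np.inf
    for _ in range(n_iter):
        tau, loglik = e_step(x, pi, mu, sigma, beta)           # E-step
        if abs(loglik - prev) < tol * (1.0 + abs(loglik)):
            break
        prev = loglik
        pi = tau.mean(axis=0)                                  # CM-step 1: weights
        for j in range(k):
            w = tau[:, j]
            nj = max(w.sum(), 1e-12)
            # CM-step 2: location minimises sum_i w_i |x_i - mu|^beta (convex for beta >= 1)
            mu[j] = golden_min(lambda m: float(np.sum(w * np.abs(x - m) ** beta[j])),
                               x.min(), x.max())
            # CM-step 3: closed-form scale, sigma^beta = beta * sum_i w_i |x_i - mu|^beta / n_j
            s = beta[j] * np.sum(w * np.abs(x - mu[j]) ** beta[j]) / nj
            sigma[j] = max(s ** (1.0 / beta[j]), 1e-6)
            # CM-step 4 (assumption: shape searched on [1, 8]); accept only if it improves
            obj = lambda b: -float(np.sum(w * ep_logpdf(x, mu[j], sigma[j], b)))
            b_new = golden_min(obj, 1.0, 8.0)
            if obj(b_new) < obj(beta[j]):
                beta[j] = b_new
    _, loglik = e_step(x, pi, mu, sigma, beta)
    return {"pi": pi, "mu": mu, "sigma": sigma, "beta": beta, "loglik": loglik}


def bic(loglik, k, n):
    """BIC = -2 log L + p log n (smaller is better); p = 3k + (k - 1) free parameters."""
    return -2.0 * loglik + (4 * k - 1) * math.log(n)
```

To choose the number of components, one could fit, say, k = 1, 2, 3 and keep the k with the smallest `bic` value. The location and shape steps use golden-section search only to keep the sketch dependency-free apart from NumPy. The shape update is accepted only when it increases the weighted log-likelihood, so each conditional step does not decrease the objective.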
