Newton Michael A, Chung Lisa M
Department of Statistics, University of Wisconsin, Madison, 1300 University Ave, Madison, Wisconsin 53706-1532, USA.
Ann Stat. 2010 Dec 1;38(6):3217-3244. doi: 10.1214/10-aos805.
Discrete mixture models provide a well-known basis for effective clustering algorithms, although technical challenges have limited their scope. In the context of gene-expression data analysis, a model is presented that mixes over a finite catalog of structures, each one representing equality and inequality constraints among latent expected values. Computations depend on the probability that independent gamma-distributed variables attain each of their possible orderings. Each ordering event is equivalent to an event in independent negative-binomial random variables, and this finding guides a dynamic-programming calculation. The structuring of mixture-model components according to constraints among latent means leads to strict concavity of the mixture log likelihood. In addition to its beneficial numerical properties, the clustering method shows promising results in an empirical study.
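The link between gamma orderings and negative-binomial events can be illustrated in the simplest case of two variables. The sketch below is not the paper's dynamic-programming algorithm. It assumes integer shape parameters and uses the standard Poisson-process argument: if X ~ Gamma(a, rate r_x) and Y ~ Gamma(b, rate r_y) are independent, then X < Y exactly when fewer than b "failures" occur before the a-th "success" in a sequence of trials with success probability r_x / (r_x + r_y). The function names and the Monte Carlo check are illustrative choices, not taken from the source.

```python
import math
import random


def nb_cdf(k, r, p):
    """P(N <= k), where N counts failures before the r-th success
    in independent trials with success probability p."""
    return sum(math.comb(j + r - 1, j) * p**r * (1 - p) ** j for j in range(k + 1))


def prob_gamma_less(a, rate_x, b, rate_y):
    """P(X < Y) for independent X ~ Gamma(shape a, rate rate_x) and
    Y ~ Gamma(shape b, rate rate_y). Assumes a and b are positive integers."""
    p = rate_x / (rate_x + rate_y)  # chance the next merged event belongs to X
    return nb_cdf(b - 1, a, p)      # at most b-1 Y-events before the a-th X-event


def monte_carlo(a, rate_x, b, rate_y, n=50_000, seed=0):
    """Simulation estimate of P(X < Y), used only as a sanity check."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate
    hits = sum(
        rng.gammavariate(a, 1.0 / rate_x) < rng.gammavariate(b, 1.0 / rate_y)
        for _ in range(n)
    )
    return hits / n
```

For example, `prob_gamma_less(3, 1.0, 3, 1.0)` gives 0.5 by symmetry, and `monte_carlo` with the same arguments should agree up to simulation error. Orderings of more than two variables need the multivariate treatment described in the abstract.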