Dortet-Bernadet Jean-Luc, Wicker Nicolas
Institut de Recherche Mathématique Avancée (IRMA), UMR 7501 CNRS, Université Louis Pasteur, Strasbourg, France.
Biostatistics. 2008 Jan;9(1):66-80. doi: 10.1093/biostatistics/kxm012. Epub 2007 Apr 27.
We consider model-based clustering of data that lie on a unit sphere. Such data arise in the analysis of microarray experiments when gene expressions are standardized to have mean 0 and variance 1 across the arrays. We propose modeling the clusters on the sphere with inverse stereographic projections of multivariate normal distributions, and we describe the corresponding model-based clustering algorithm. The algorithm is applied first to simulated data sets, both to assess several criteria for determining the number of clusters and to compare its performance with existing methods, and second to a real reference data set of standardized gene expression profiles.
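The two geometric ingredients mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `standardize_rows` and `inverse_stereographic`, the projection from the pole (0, ..., 0, 1), and the example mean and covariance are all assumptions made for illustration, and the paper's own parametrization of the projection may differ.

```python
import numpy as np


def standardize_rows(expr):
    """Give each row mean 0 and (population) variance 1, then rescale to unit norm.

    A row of length p with mean 0 and variance 1 has Euclidean norm sqrt(p),
    so dividing by sqrt(p) places it on the unit sphere.
    """
    expr = np.asarray(expr, dtype=float)
    p = expr.shape[1]
    centered = expr - expr.mean(axis=1, keepdims=True)
    z = centered / expr.std(axis=1, keepdims=True)  # ddof=0, i.e. variance uses 1/p
    return z / np.sqrt(p)


def inverse_stereographic(y):
    """Map points y in R^d onto the unit sphere in R^(d+1).

    Assumed convention: projection from the pole (0, ..., 0, 1), so the
    origin of R^d is sent to the opposite pole (0, ..., 0, -1).
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    sq = np.sum(y ** 2, axis=1, keepdims=True)
    return np.hstack([2.0 * y / (sq + 1.0), (sq - 1.0) / (sq + 1.0)])


# Illustrative cluster: draw from a bivariate normal and push it onto the sphere.
rng = np.random.default_rng(0)
mean = np.array([0.5, -0.2])                 # hypothetical parameters
cov = np.array([[0.3, 0.1], [0.1, 0.2]])     # hypothetical parameters
samples = rng.multivariate_normal(mean, cov, size=500)
on_sphere = inverse_stereographic(samples)
```

Every row of `on_sphere` has unit norm, so the normal cloud in the plane becomes a cluster on the sphere. Likewise, each row returned by `standardize_rows` sums to zero and has unit norm, which is why standardized expression profiles can be treated as spherical data.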