Qin Li-Xuan, Self Steven G
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.
Biometrics. 2006 Jun;62(2):526-33. doi: 10.1111/j.1541-0420.2005.00498.x.
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to model the observations for each gene independently; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models across the collection of genes analyzed. Motivated by this common structure of per-gene models, we propose a new model-based clustering method, the clustering of regression models method, which groups genes that share a similar relationship to the covariate(s). This method provides a unified approach for a family of clustering procedures and can be applied to data collected under various experimental designs. In addition, when it is combined with per-gene methods for assessing differential expression that employ the same regression modeling structure, an integrated framework for the analysis of microarray data is obtained. The proposed methodology was applied to two microarray data sets, one from a breast cancer study and the other from a yeast cell cycle study.
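The idea of grouping genes by their shared relationship to the covariates can be illustrated with a small sketch. The code below is not the authors' algorithm. It is a simplified finite mixture of linear regressions fitted by EM, under the assumptions that every gene shares one design matrix and that the noise variance is common to all clusters. The function name `cluster_regression_profiles` and all parameter names are hypothetical.

```python
import numpy as np

def cluster_regression_profiles(Y, X, n_clusters, n_iter=50):
    """Illustrative EM for a mixture of linear regressions (a sketch only).

    Y: (genes, samples) expression matrix; X: (samples, p) shared design matrix.
    Returns hard labels, cluster coefficients (K, p), and responsibilities.
    """
    n_genes, n_samples = Y.shape

    # Per-gene ordinary least squares, used only to initialise the clusters.
    per_gene_beta = np.linalg.lstsq(X, Y.T, rcond=None)[0].T  # (genes, p)

    # Farthest-first initialisation: start at gene 0, then repeatedly add the
    # gene whose coefficients are farthest from the centres chosen so far.
    chosen = [0]
    for _ in range(1, n_clusters):
        diff = per_gene_beta[:, None, :] - per_gene_beta[chosen][None, :, :]
        dist = (diff ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(dist.argmax()))
    betas = per_gene_beta[chosen].copy()

    weights = np.full(n_clusters, 1.0 / n_clusters)  # mixing proportions
    sigma2 = max(float(Y.var()), 1e-12)              # common noise variance

    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each gene.
        fitted = betas @ X.T                                          # (K, samples)
        sq = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)  # (genes, K)
        log_r = np.log(weights) - 0.5 * sq / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: because all genes share X, weighted least squares reduces
        # to OLS on each cluster's responsibility-weighted mean profile.
        mass = resp.sum(axis=0) + 1e-12
        mean_profiles = (resp.T @ Y) / mass[:, None]                  # (K, samples)
        betas = np.linalg.lstsq(X, mean_profiles.T, rcond=None)[0].T
        fitted = betas @ X.T
        sq = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
        sigma2 = max((resp * sq).sum() / (n_genes * n_samples), 1e-12)
        weights = mass / n_genes

    return resp.argmax(axis=1), betas, resp
```

Calling `cluster_regression_profiles(Y, X, n_clusters=2)` with a design matrix holding an intercept column and one covariate column returns one label per gene and one coefficient vector per cluster. Genes in the same cluster then share the same fitted intercept and slope, which is the sense in which they "share a similar relationship to the covariate(s)".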