Qu Yi, Xu Shizhong
Department of Botany and Plant Sciences, University of California, Riverside, USA.
Mol Biol Evol. 2006 Aug;23(8):1558-73. doi: 10.1093/molbev/msl019. Epub 2006 May 26.
Selection on phenotypes may cause genetic change. To understand the relationship between phenotype and gene expression from an evolutionary viewpoint, it is important to study the concordance between gene expression profiles and phenotypic profiles. In this study, we use a novel clustering method to identify genes whose expression profiles are related to a quantitative phenotype. Cluster analysis of gene expression data classifies genes into groups based on the similarity of their expression profiles across multiple conditions, in the hope that genes assigned to the same cluster share underlying regulatory elements or belong to the same metabolic pathways. Current methods for examining the association between phenotype and gene expression are limited to linear association, measured by the correlation between individual gene expression values and the phenotype. However, genes may be associated with the phenotype nonlinearly, and groups of genes that share a particular pattern in their relationship to the phenotype may be of evolutionary interest. We therefore develop a method that groups genes based on orthogonal polynomials under a multivariate Gaussian mixture model. The effect of each expressed gene on the phenotype is partitioned into a cluster mean and a random deviation from that mean. Genes can also be clustered based on a time series. Parameters are estimated using the expectation-maximization algorithm, which we implement in SAS. The method is verified with simulated data and demonstrated with experimental data from two studies: one clusters genes with respect to disease severity in Alzheimer's patients, and the other clusters genes over time in a rat fracture healing study. We find significant evidence of nonlinear associations in both studies and successfully describe these patterns with our method.
We give detailed instructions and provide a working program that allows others to directly implement this method in their own analyses.
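The abstract describes clustering genes by how their expression tracks a phenotype: orthogonal polynomials summarize each gene's profile, and a Gaussian mixture fitted by EM groups genes whose profiles share a cluster mean up to a random deviation. The Python sketch below illustrates that idea under simplifying assumptions that are ours, not the authors': Legendre polynomials as the orthogonal basis, a two-stage procedure (per-gene least-squares coefficients first, then a diagonal-covariance mixture on those coefficients) instead of a single joint model, and farthest-point initialization of the cluster means. The function names legendre_coefficients and gaussian_mixture_em and the simulated data are hypothetical; the authors' own program is written in SAS.

```python
import numpy as np

# Sketch under stated assumptions: Legendre basis, two-stage fit, diagonal covariances.


def legendre_coefficients(phenotype, expression, degree=2):
    """Regress each gene's expression on Legendre polynomials of the phenotype.

    phenotype: shape (n_individuals,); expression: shape (n_genes, n_individuals).
    Returns an array of shape (n_genes, degree + 1): one coefficient vector per gene.
    """
    x = np.asarray(phenotype, dtype=float)
    # Rescale to [-1, 1], the interval on which Legendre polynomials are orthogonal.
    t = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    basis = np.polynomial.legendre.legvander(t, degree)  # columns P_0(t) .. P_degree(t)
    y = np.asarray(expression, dtype=float)
    coef, *_ = np.linalg.lstsq(basis, y.T, rcond=None)  # least squares for all genes at once
    return coef.T


def gaussian_mixture_em(data, n_clusters, n_iter=100, seed=0):
    """Cluster the rows of `data` with a diagonal-covariance Gaussian mixture fitted by EM."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    # Farthest-point initialization: a random first mean, then the row farthest from all chosen means.
    idx = [int(rng.integers(n))]
    for _ in range(n_clusters - 1):
        dist = np.min(((data[:, None, :] - data[idx][None]) ** 2).sum(axis=2), axis=1)
        idx.append(int(dist.argmax()))
    means = data[idx].copy()
    variances = np.tile(data.var(axis=0) + 1e-6, (n_clusters, 1))
    weights = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        # E-step: log-density of every row under every component -> posterior memberships.
        diff = data[:, None, :] - means[None]  # shape (n, K, d)
        log_p = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)
                 - 0.5 * (diff ** 2 / variances[None]).sum(axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixing proportions, cluster means (shared pattern), within-cluster variances (deviations).
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = resp.T @ data / nk[:, None]
        diff = data[:, None, :] - means[None]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return resp.argmax(axis=1), means


# Hypothetical simulated data: 20 flat, 20 linear, and 20 U-shaped genes measured on 50 individuals.
rng = np.random.default_rng(1)
phenotype = rng.uniform(0.0, 10.0, size=50)
s = (phenotype - 5.0) / 5.0
patterns = [np.zeros_like(s), 2.0 * s, 3.0 * s ** 2 - 1.0]
true_group = np.repeat([0, 1, 2], 20)
expression = np.array([patterns[g] for g in true_group]) + rng.normal(0.0, 0.3, size=(60, 50))

coefs = legendre_coefficients(phenotype, expression, degree=2)
labels, cluster_means = gaussian_mixture_em(coefs, n_clusters=3)
```

Because each gene is represented by its polynomial coefficients, a cluster whose mean has a large second-order coefficient captures a U-shaped or inverted-U association that a single correlation coefficient would miss. For a time-series design, the same sketch could be applied with time points in place of phenotype values.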