Reverter A, Wang Y H, Byrne K A, Tan S H, Harper G S, Lehnert S A
The Cooperative Research Centre for Cattle and Beef Quality, CSIRO Livestock Industries, Queensland Bioscience Precinct, St. Lucia, Queensland 4067, Australia.
J Anim Sci. 2004 Dec;82(12):3430-9. doi: 10.2527/2004.82123430x.
In functional genomic laboratories, it is common to use the same microarray slide across studies, each investigating a unique biological question, and each analyzed separately due to computational limitations and/or because there is no hybridization of samples from different studies on one slide. However, the question of analyzing data from multiple studies is a major current issue in microarray data analysis because there are gains to be made in the accuracy of estimated effects by exploiting a covariance structure between gene expression data across studies. We propose an approach for combining multiple studies using multivariate mixed models, with the assumption of a nonzero correlation among genes across experiments, while imposing a null residual covariance. We applied this method to jointly analyze three experiments in genetics of cattle with a total of 54 arrays, each with 19,200 spots and 7,638 elements. The resulting seven-variate model contains 752,476 equations and 56 covariances. To identify differentially expressed genes, we applied model-based clustering to a linear combination of the random gene × variety interaction effect. We enhanced the biological interpretation of the results by applying an iterative algorithm to identify the gene ontology classes that significantly changed in each experiment. We found 118 elements with coordinate expression that clustered into distinct biological functions such as adipogenesis and protein turnover. These results contribute to our understanding of the mechanistic processes involved in adipogenesis and nutrient partitioning.