Shojaie Ali, Michailidis George
Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, USA.
J Comput Biol. 2009 Mar;16(3):407-26. doi: 10.1089/cmb.2008.0081.
Networks are often used to represent the interactions among genes and proteins. These interactions are known to play an important role in vital cell functions and should be included in the analysis of genes that are differentially expressed. Methods of gene set analysis take advantage of external biological information and analyze a priori defined sets of genes. These methods can potentially preserve the correlation among genes; however, they do not directly incorporate the information about the gene network. In this paper, we propose a latent variable model that directly incorporates the network information. We then use the theory of mixed linear models to present a general inference framework for the problem of testing the significance of subnetworks. Several possible test procedures are introduced, and a network-based method is presented for testing changes in the expression levels of genes as well as in the structure of the network. The performance of the proposed method is compared with methods of gene set analysis using both simulation studies and real data on genes related to the galactose utilization pathway in yeast.