Zhi Wei, Minturn Jane, Rappaport Eric, Brodeur Garrett, Li Hongzhe
Department of Biostatistics and Epidemiology, New Jersey Institute of Technology, Newark, NJ, USA.
Methods Mol Biol. 2013;972:121-39. doi: 10.1007/978-1-60327-337-4_8.
Multivariate microarray gene expression data are commonly collected to study genomic responses under ordered conditions, such as increasing or decreasing dose levels or successive time points during a biological process, where the expression levels of a given gene are expected to be dependent. One important goal of such multivariate gene expression experiments is to identify genes that show different expression patterns over treatment dosages or over time; these genes can also point to the pathways that are perturbed during a given biological process. Several empirical Bayes approaches have been developed for identifying differentially expressed genes in order to account for the parallel structure of the data and to borrow information across all the genes. However, these methods assume that the genes are independent. In this paper, we introduce an alternative empirical Bayes approach for the analysis of multivariate gene expression data that assumes a discrete Markov random field (MRF) prior, in which the dependency of the differential expression patterns of genes on the networks is modeled by the MRF. Simulation studies indicated that the method is effective in identifying genes and the modified subnetworks and has higher sensitivity than commonly used procedures that do not use the pathway information, with similar observed false discovery rates. We applied the proposed method to the analysis of a microarray time course gene expression study of TrkA- and TrkB-transfected neuroblastoma cell lines and identified genes and subnetworks on the MAPK, focal adhesion, and prion disease pathways that may explain cell differentiation in TrkA-transfected cell lines.
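The idea of a discrete MRF prior over a gene network can be illustrated with a minimal sketch. It assumes a generic auto-logistic form in which each gene has a binary state (1 = differentially expressed, 0 = not) and the prior log-odds of state 1 shift with the number of neighbors in each state. The function names, the parameters `gamma` and `beta`, and the toy network are all hypothetical and are not taken from the paper, whose exact model may differ.

```python
import math

def conditional_de_prob(states, neighbors, gene, gamma=-1.0, beta=0.5):
    """Prior P(gene is DE | neighbor states) under an assumed auto-logistic MRF.

    gamma: baseline log-odds of being DE (hypothetical parameter).
    beta:  strength of agreement with network neighbors (hypothetical parameter).
    """
    n_de = sum(states[n] for n in neighbors[gene])   # neighbors in state 1
    n_not = len(neighbors[gene]) - n_de              # neighbors in state 0
    logit = gamma + beta * (n_de - n_not)
    return 1.0 / (1.0 + math.exp(-logit))

def posterior_de_prob(prior_prob, log_lik_ratio):
    """Combine the network prior with a per-gene log-likelihood ratio
    log f1(x) / f0(x), where f1 and f0 are the data densities under
    the DE and non-DE states (assumed to be supplied by the user)."""
    log_odds = math.log(prior_prob / (1.0 - prior_prob)) + log_lik_ratio
    return 1.0 / (1.0 + math.exp(-log_odds))

# Toy network: gene A has two DE neighbors, gene D is isolated.
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
states = {"A": 0, "B": 1, "C": 1, "D": 0}

p_a = conditional_de_prob(states, neighbors, "A")
p_d = conditional_de_prob(states, neighbors, "D")
```

In this toy setup, gene A receives a higher prior probability of differential expression than the isolated gene D because its neighbors are in the DE state, which is how a network prior of this kind borrows information along pathway edges rather than treating genes as independent.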