Liang Yulan, Kelemen Arpad
Department of Organizational Systems and Adult Health, University of Maryland, 655 W. Lombard Street, Baltimore, MD 21201-1579, USA.
BMC Bioinformatics. 2008 Aug 28;9:354. doi: 10.1186/1471-2105-9-354.
This paper addresses key biological problems and statistical issues in the analysis of large gene expression data sets that describe systemic temporal response cascades to therapeutic doses in multiple tissues, such as liver, skeletal muscle, and kidney, from the same animals. Affymetrix U34A time-course gene expression data were obtained from three tissues: kidney, liver, and muscle. Our goal is not only to assess the concordance of genes across tissues and identify the genes that are commonly differentially expressed over time, but also to examine the reproducibility of the findings by integrating the results from multiple tissues through meta-analysis. This integration is intended to significantly increase the power to detect genes that are differentially expressed over time and to characterize how the three tissues differ in their response to the drug.
A Bayesian categorical model for estimating the 'call' proportions is used to pre-screen genes. A hierarchical Bayesian mixture model is then developed to identify genes that are differentially expressed across time and to identify dynamic clusters. The deviance information criterion is applied to determine the number of mixture components for model comparison and selection. The Bayesian mixture model produces, for each gene, the posterior probability of differential/non-differential expression and the 95% credible interval, which form the basis for our further Bayesian meta-inference. Meta-analysis is performed to identify commonly expressed genes from multiple tissues that may serve as ideal targets for novel treatment strategies and to integrate the results across separate studies. We found genes that are commonly expressed in the three tissues. However, the regulation of these common genes (up, down, or none) differs across time points. Moreover, the most differentially expressed genes were found in the liver, followed by kidney and then muscle.
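The pre-screening step can be illustrated with a minimal sketch. The abstract does not state the prior, the call categories, or the screening rule, so everything below is an assumption made for illustration: the calls are taken to be the Affymetrix detection calls Present/Marginal/Absent (P/M/A), the prior is a symmetric Dirichlet, and the function names and the 0.5 threshold are hypothetical. It is not the authors' implementation.

```python
from collections import Counter

# Assumed call categories: Present, Marginal, Absent detection calls.
CALLS = ("P", "M", "A")


def posterior_call_proportions(calls, prior=(1.0, 1.0, 1.0)):
    """Posterior mean call proportions for one gene.

    Uses a conjugate Dirichlet prior on categorical call probabilities
    (an assumed choice; the abstract does not specify the prior).
    """
    unknown = set(calls) - set(CALLS)
    if unknown:
        raise ValueError(f"unexpected call labels: {sorted(unknown)}")
    counts = Counter(calls)
    total = len(calls) + sum(prior)
    # Dirichlet posterior mean: (prior pseudo-count + observed count) / total
    return {c: (counts.get(c, 0) + a) / total for c, a in zip(CALLS, prior)}


def passes_prescreen(calls, threshold=0.5):
    """Keep a gene if its posterior mean 'Present' proportion exceeds a
    hypothetical threshold (0.5 here is illustrative only)."""
    return posterior_call_proportions(calls)["P"] > threshold


# Example: calls for one gene across six arrays (made-up labels).
example = ["P", "P", "A", "P", "M", "P"]
post = posterior_call_proportions(example)
keep = passes_prescreen(example)
```

With a uniform prior the posterior mean simply shrinks the observed call frequencies toward one third each, which makes the screen slightly more conservative for genes measured on few arrays.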