Sykacek P, Clarkson R, Print C, Furlong R, Micklem G
Department of Biotechnology, BOKU University, Vienna, Austria.
Bioinformatics. 2007 Aug 1;23(15):1936-44. doi: 10.1093/bioinformatics/btm280. Epub 2007 May 31.
Biological assays are often carried out on tissues that contain many cell lineages and active pathways. Microarray data produced using such material therefore reflect superimpositions of biological processes. Analysing such data for shared gene function by means of well-matched assays may help to provide a better focus on specific cell types and processes. The identification of genes that behave similarly in different biological systems also has the potential to reveal new insights into preserved biological mechanisms.
In this article, we propose a hierarchical Bayesian model allowing integrated analysis of several microarray data sets for shared gene function. Each gene is associated with an indicator variable that selects whether binary class labels are predicted from expression values or by a classifier which is common to all genes. Each indicator selects the component models for all involved data sets simultaneously. A quantitative measure of shared gene function is obtained by inferring a probability measure over these indicators. Through experiments on synthetic data, we illustrate potential advantages of this Bayesian approach over a standard method. A shared analysis of matched microarray experiments covering (a) a cycle of mouse mammary gland development and (b) the process of in vitro endothelial cell apoptosis is proposed as a biological gold standard. Several useful sanity checks are introduced during data analysis, and we confirm the prior biological belief that shared apoptosis events occur in both systems. We conclude that a Bayesian analysis for shared gene function has the potential to reveal new biological insights, unobtainable by other means.
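The indicator idea can be illustrated with a deliberately simplified sketch. This is not the authors' hierarchical model or their MatLab code. It assumes a one-dimensional logistic classifier per gene and data set, a Gaussian prior on its weight, a "common" classifier that uses only the class proportion, and numerical integration on a grid. The function names, the prior width and the prior indicator probability are hypothetical choices made for this illustration. Because one indicator is shared by all data sets, the log Bayes factors of the individual data sets are summed before being converted into a posterior probability.

```python
import numpy as np

def log_evidence_ratio(x, y, prior_sd=2.0, grid=np.linspace(-10.0, 10.0, 2001)):
    """Log Bayes factor of a gene-specific logistic classifier versus a
    common classifier that ignores expression (toy assumption, one data set)."""
    x = (x - x.mean()) / x.std()                 # standardise expression values
    pi = (y.sum() + 1.0) / (len(y) + 2.0)        # smoothed class proportion
    b = np.log(pi / (1.0 - pi))                  # intercept shared by both models
    # Logits for every candidate weight on the grid: shape (len(grid), n)
    logits = grid[:, None] * x[None, :] + b
    # Bernoulli log-likelihood of the labels for each candidate weight
    loglik = (y * logits - np.logaddexp(0.0, logits)).sum(axis=1)
    # Gaussian log-prior density on the weight
    log_prior = -0.5 * (grid / prior_sd) ** 2 - np.log(prior_sd * np.sqrt(2.0 * np.pi))
    dw = grid[1] - grid[0]
    # Riemann-sum approximation of the marginal likelihood, done in log space
    log_ev_gene = np.logaddexp.reduce(loglik + log_prior) + np.log(dw)
    # The common classifier corresponds to weight zero: no integration needed
    log_ev_common = (y * b - np.logaddexp(0.0, b)).sum()
    return log_ev_gene - log_ev_common

def shared_indicator_posterior(datasets, prior_prob=0.5):
    """Posterior probability that one indicator selects the gene-specific
    model in ALL data sets; `datasets` is a list of (expression, labels) pairs."""
    log_odds = np.log(prior_prob / (1.0 - prior_prob))
    log_odds += sum(log_evidence_ratio(x, y) for x, y in datasets)
    return 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # numerically stable sigmoid

# Tiny deterministic illustration with two matched "experiments"
labels = np.tile([0, 1], 20)
informative = 2.0 * labels - 1.0                      # expression follows the label
uninformative = np.tile([-1.0, -1.0, 1.0, 1.0], 10)   # unrelated to the label
print(shared_indicator_posterior([(informative, labels), (informative, labels)]))
print(shared_indicator_posterior([(uninformative, labels), (uninformative, labels)]))
```

In this toy version a gene scores highly only when the evidence summed over all data sets favours the gene-specific classifiers. The published model infers the indicator probabilities within a full hierarchical Bayesian framework rather than from independent per-gene Bayes factors.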
An online supplement and MATLAB code are available at http://www.sykacek.net/research.html#mcabf