Department of Computer Science, Department of Biology, and Department of Statistics, Purdue University, West Lafayette, IN 47907, USA.
Bioinformatics. 2013 Aug 15;29(16):1987-96. doi: 10.1093/bioinformatics/btt335. Epub 2013 Jun 8.
By capturing various biochemical interactions, biological pathways provide insight into underlying biological processes. Given high-dimensional microarray or RNA-sequencing data, a critical challenge is how to integrate them with rich information from pathway databases to jointly select relevant pathways and genes for phenotype prediction or disease prognosis. Addressing this challenge can help us deepen biological understanding of phenotypes and diseases from a systems perspective.
In this article, we propose a novel sparse Bayesian model for joint network and node selection. This model integrates information from networks (e.g. pathways) and nodes (e.g. genes) through a hybrid of conditional and generative components. For the conditional component, we propose a sparse prior based on graph Laplacian matrices, each of which encodes the detailed correlation structure among the nodes of one network. For the generative component, we use a spike-and-slab prior over network nodes. The integration of these two components, coupled with efficient variational inference, enables the selection of networks as well as of correlated nodes within the selected networks. Simulation results demonstrate improved predictive performance and selection accuracy of our method over alternative methods. Using three expression datasets from cancer studies and the KEGG pathway database, we selected relevant genes and pathways, many of which are supported by the biological literature. In addition to pathway analysis, our method is expected to have a wide range of applications in selecting relevant groups of correlated high-dimensional biomarkers.
The code can be downloaded at www.cs.purdue.edu/homes/szhe/software.html.
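To illustrate how a graph Laplacian can encode the correlation structure among the nodes of a pathway, the following minimal sketch builds a Laplacian from a toy adjacency matrix and evaluates the quadratic form w^T L w, which is small when connected genes carry similar weights. It is not the authors' implementation: the unnormalized Laplacian L = D - A, the three-gene chain, and the names `graph_laplacian` and `laplacian_penalty` are assumptions made for illustration only, and the spike-and-slab component and variational inference are not shown.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Return the unnormalized graph Laplacian L = D - A.

    Assumption: the pathway is an undirected graph given as a symmetric
    adjacency matrix; the model in the article may use a different
    (e.g. normalized) Laplacian.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)
    return np.diag(degree) - adjacency


def laplacian_penalty(weights, laplacian):
    """Evaluate w^T L w, the sum of (w_i - w_j)^2 over the edges (i, j)."""
    weights = np.asarray(weights, dtype=float)
    return float(weights @ laplacian @ weights)


# Hypothetical toy pathway: a chain of three genes, 0 - 1 - 2.
toy_adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
toy_laplacian = graph_laplacian(toy_adjacency)

# Weights that agree across connected genes incur no penalty.
smooth_penalty = laplacian_penalty([1.0, 1.0, 1.0], toy_laplacian)

# Weights that flip sign along each edge are penalized.
rough_penalty = laplacian_penalty([1.0, -1.0, 1.0], toy_laplacian)

print(smooth_penalty, rough_penalty)
```

Under a Gaussian prior whose precision matrix is built from such a Laplacian, weight vectors with a small value of this quadratic form receive higher prior density, which is one way to favor jointly selecting connected genes.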