Sun Hokeun, Li Hongzhe
Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA.
Int J Syst Synth Biol. 2010;1(2):255-272.
Many biological processes are represented as network graphs, such as regulatory networks, metabolic pathways, and protein-protein interaction networks. Because genes that are linked on these networks usually have similar biological functions, linked genes form molecular modules that affect clinical phenotypes and outcomes. Similarly, in large-scale genetic association studies, many SNPs are in high linkage disequilibrium (LD), which can be summarized as an LD graph. To incorporate graph information into regression analysis with high-dimensional genomic data as predictors, we introduce a Bayesian approach to graph-constrained estimation and regularization (Bayesian GRACE), which controls the amount of regularization for sparsity and smoothness of the regression coefficients. The posterior distributions obtained from the Bayesian estimation provide credible intervals for the regression coefficients along with standard errors. The deviance information criterion (DIC) is used for model assessment and tuning-parameter selection. The performance of the proposed Bayesian approach is evaluated through simulation studies and compared with the Bayesian Lasso and Bayesian Elastic-net procedures. We demonstrate the method by analyzing data from a case-control genome-wide association study of neuroblastoma using a weighted LD graph.
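The abstract describes regularization that trades off sparsity against smoothness of the coefficients over a graph. The sketch below illustrates one common way to write such a penalty: an L1 term plus a quadratic term built from the normalized Laplacian of a weighted graph. The exact penalty and prior used by Bayesian GRACE are not stated in the abstract, so the form `lam1 * |beta|_1 + lam2 * beta' L beta` and the function names `normalized_laplacian` and `graph_penalty` are assumptions made for illustration only.

```python
import numpy as np


def normalized_laplacian(W):
    """Normalized Laplacian of a symmetric, non-negative weight matrix W.

    Assumed form: L = I - D^{-1/2} W D^{-1/2}, with rows and columns of
    isolated nodes (degree 0) set to zero.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                       # weighted degree of each node
    connected = d > 0
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[connected] = 1.0 / np.sqrt(d[connected])
    # Diagonal is 1 for connected nodes and 0 for isolated ones.
    return np.diag(connected.astype(float)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]


def graph_penalty(beta, L, lam1, lam2):
    """Hypothetical penalty: lam1 * ||beta||_1 + lam2 * beta' L beta."""
    beta = np.asarray(beta, dtype=float)
    sparsity = np.abs(beta).sum()           # encourages zero coefficients
    smoothness = beta @ L @ beta            # penalizes differences across edges
    return lam1 * sparsity + lam2 * smoothness


# Toy graph: predictors 0 and 1 are linked, predictor 2 is isolated.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
L = normalized_laplacian(W)

smooth = graph_penalty([1.0, 1.0, 0.0], L, lam1=0.5, lam2=2.0)   # linked coefficients agree
rough = graph_penalty([1.0, -1.0, 0.0], L, lam1=0.5, lam2=2.0)   # linked coefficients disagree
print(smooth, rough)
```

With these toy inputs the two coefficient vectors have the same L1 norm, so the difference in penalty comes entirely from the smoothness term: linked predictors with equal coefficients are penalized less than linked predictors with opposite signs. In a Bayesian formulation, a penalty of this kind would typically enter as the negative log of a prior density on the coefficients, up to an additive constant; that reading is likewise an assumption here, not a statement of the paper's prior.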