Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.
Bioinformatics. 2012 Sep 15;28(18):2318-24. doi: 10.1093/bioinformatics/bts433. Epub 2012 Jul 10.
Cancer development is driven by the accumulation of advantageous mutations and the subsequent clonal expansion of cells harbouring them, but the order in which mutations occur remains poorly understood. Advances in genome sequencing and the imminent flood of cancer genome data from large cancer sequencing consortia promise to elucidate cancer progression. However, new computational methods are needed to analyse these large datasets.
We present a Bayesian inference scheme for Conjunctive Bayesian Networks, a probabilistic graphical model in which mutations accumulate according to partial order constraints and cancer genotypes are observed subject to measurement noise. We develop an efficient MCMC sampling scheme specifically designed to overcome local optima induced by dependency structures. On simulated data, we demonstrate the performance advantage of our sampler over traditional approaches. By reanalysing cancer datasets and comparing our results with previous maximum-likelihood-based approaches, we show the advantages of adopting a Bayesian perspective.
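To make the model description concrete, the following is a minimal sketch of how a genotype likelihood might be computed under a conjunctive model with observation noise. It is not the authors' implementation and does not reflect the R package's API. It assumes a discrete formulation in which each mutation occurs with its own probability once all of its parent mutations are present, and in which each observed locus is flipped independently with a single error rate. The names `is_compatible`, `prior_prob`, `observed_prob`, `theta` and `eps` are hypothetical, and the brute-force sum over all genotypes is only practical for a handful of mutations.

```python
from itertools import product


def is_compatible(genotype, parents):
    """Return True if every mutated event has all of its parents mutated.

    genotype: tuple of 0/1 values, one per event.
    parents:  list where parents[i] holds the indices of the events that
              must precede event i (the cover relations of the partial order).
    """
    return all(
        all(genotype[p] for p in parents[i])
        for i, mutated in enumerate(genotype)
        if mutated
    )


def prior_prob(genotype, parents, theta):
    """Probability of a true (noise-free) genotype.

    Assumption: event i occurs with probability theta[i] once all of its
    parents have occurred, and cannot occur otherwise.
    """
    if not is_compatible(genotype, parents):
        return 0.0
    prob = 1.0
    for i, mutated in enumerate(genotype):
        if all(genotype[p] for p in parents[i]):
            prob *= theta[i] if mutated else 1.0 - theta[i]
        # Otherwise event i is blocked and is unmutated with probability 1.
    return prob


def observed_prob(observed, parents, theta, eps):
    """Probability of an observed genotype under independent bit-flip noise.

    Assumption: each locus is misread independently with probability eps.
    The true genotype is marginalised out by exhaustive enumeration.
    """
    n = len(observed)
    total = 0.0
    for true_genotype in product((0, 1), repeat=n):
        p = prior_prob(true_genotype, parents, theta)
        if p == 0.0:
            continue
        flips = sum(a != b for a, b in zip(true_genotype, observed))
        total += p * eps ** flips * (1.0 - eps) ** (n - flips)
    return total


# Hypothetical three-event example: event 0 must precede events 1 and 2.
PARENTS = [[], [0], [0]]
THETA = [0.8, 0.5, 0.5]

if __name__ == "__main__":
    # (0, 1, 0) violates the partial order, so it has zero prior probability
    # but can still be observed when eps > 0.
    print(prior_prob((0, 1, 0), PARENTS, THETA))
    print(observed_prob((0, 1, 0), PARENTS, THETA, eps=0.05))
```

In this sketch the noise term is what lets the model assign non-zero probability to observed genotypes that violate the partial order. A Bayesian treatment would place priors on the partial order, `theta` and `eps` and sample from their joint posterior instead of fixing them as done here.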
An R package including the sampler and examples is available at http://www.cbg.ethz.ch/software/bayes-cbn.