Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912, USA.
Genome Res. 2012 Feb;22(2):375-85. doi: 10.1101/gr.120477.111. Epub 2011 Jun 7.
Next-generation DNA sequencing technologies are enabling genome-wide measurements of somatic mutations in large numbers of cancer patients. A major challenge in the interpretation of these data is to distinguish functional "driver mutations" important for cancer development from random "passenger mutations." A common approach for identifying driver mutations is to find genes that are mutated at significant frequency in a large cohort of cancer genomes. This approach is confounded by the observation that driver mutations target multiple cellular signaling and regulatory pathways. Thus, each cancer patient may exhibit a different combination of mutations that are sufficient to perturb these pathways. This mutational heterogeneity presents a problem for predicting driver mutations solely from their frequency of occurrence. We introduce two combinatorial properties, coverage and exclusivity, that distinguish driver pathways, or groups of genes containing driver mutations, from groups of genes with passenger mutations. We derive two algorithms, called Dendrix, to find driver pathways de novo from somatic mutation data. We apply Dendrix to analyze somatic mutation data from 623 genes in 188 lung adenocarcinoma patients, 601 genes in 84 glioblastoma patients, and 238 known mutations in 1000 patients with various cancers. In all data sets, we find groups of genes that are mutated in large subsets of patients and whose mutations are approximately exclusive. Our Dendrix algorithms scale to whole-genome analysis of thousands of patients and thus will prove useful for larger data sets to come from The Cancer Genome Atlas (TCGA) and other large-scale cancer genome sequencing projects.
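The two properties named above can be made concrete with a small sketch. It assumes a toy patient-to-gene mutation map (hypothetical data, not from the study) and scores a gene set by its coverage minus its overlap, where overlap counts mutations beyond one per covered patient. This score is one plausible way to combine coverage and exclusivity and is stated here as an assumption; the brute-force search is for illustration only and is not one of the Dendrix algorithms. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: patient -> set of mutated genes (not from the paper).
mutations = {
    "p1": {"A"},
    "p2": {"B"},
    "p3": {"C"},
    "p4": {"A", "D"},
    "p5": {"B", "D"},
    "p6": {"D"},
}

def patients_with(gene, mutations):
    """Set of patients in which the given gene is mutated."""
    return {p for p, genes in mutations.items() if gene in genes}

def coverage(gene_set, mutations):
    """Number of patients with at least one mutated gene in gene_set."""
    wanted = set(gene_set)
    return sum(1 for genes in mutations.values() if genes & wanted)

def overlap(gene_set, mutations):
    """Mutations beyond one per covered patient; 0 means perfectly exclusive."""
    total = sum(len(patients_with(g, mutations)) for g in gene_set)
    return total - coverage(gene_set, mutations)

def weight(gene_set, mutations):
    """Assumed score: reward coverage, penalize overlap."""
    return coverage(gene_set, mutations) - overlap(gene_set, mutations)

def best_set(mutations, k):
    """Brute-force search over all gene sets of size k (illustration only)."""
    genes = sorted(set().union(*mutations.values()))
    return max(combinations(genes, k), key=lambda s: weight(s, mutations))

if __name__ == "__main__":
    top = best_set(mutations, 3)
    print(top, weight(top, mutations))
```

In this toy data, gene D is mutated in the most patients, yet the highest-scoring set of three genes excludes it because its mutations co-occur with those in A and B. This is the kind of case where ranking genes by frequency alone and scoring gene sets by coverage and exclusivity disagree.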