School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Israel.
Nucleic Acids Res. 2019 Jul 26;47(13):6642-6655. doi: 10.1093/nar/gkz546.
Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods have been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing a statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
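The core idea of comparing each gene's observed mutations to that gene's own background can be illustrated with a minimal Python sketch. This is not FABRIC's implementation: the function name `gene_damage_bias`, the convention that a higher score means a more damaging mutation, the equal weighting of background mutations, and the normal approximation for the p-value are all assumptions made here for illustration.

```python
import math
from statistics import fmean, pstdev


def gene_damage_bias(background_scores, observed_scores):
    """Compare observed mutation effect scores in one gene to its own background.

    background_scores: effect scores of all possible single-nucleotide
        mutations in the gene (hypothetical input; higher = more damaging,
        which is an assumption of this sketch, and all mutations are
        weighted equally).
    observed_scores: effect scores of the somatic mutations actually
        observed in that gene.

    Returns (z, p): a z-score of the observed mean against the background
    mean, and a one-sided p-value from a normal approximation.
    """
    n = len(observed_scores)
    if n == 0 or len(background_scores) < 2:
        raise ValueError("need at least one observed and two background scores")

    mu = fmean(background_scores)      # expected score of a random mutation
    sigma = pstdev(background_scores)  # spread of the background scores
    if sigma == 0:
        raise ValueError("background scores have zero variance")

    # Standard error of the mean of n independent draws from the background.
    z = (fmean(observed_scores) - mu) / (sigma / math.sqrt(n))

    # One-sided upper-tail p-value: probability of a mean at least this
    # damaging under the background (normal approximation, an assumption).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Toy example with made-up scores: the background is evenly split between
# harmless (0.0) and damaging (1.0) mutations, and all 16 observed
# mutations are damaging.
background = [0.0, 1.0] * 50
observed = [1.0] * 16
z, p = gene_damage_bias(background, observed)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Because every gene is scored against its own set of possible mutations, a gene that is intrinsically rich in high-impact positions does not appear significant merely for that reason; only an excess of damaging mutations relative to that gene's background raises the z-score. Testing many genes this way would also require a multiple-testing correction, which this sketch omits.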