Human Genetics Center, UTHealth School of Public Health, Houston, TX, USA.
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
BMC Med Genomics. 2019 Jan 31;12(Suppl 1):22. doi: 10.1186/s12920-018-0452-9.
Identifying cancer driver genes (CDGs) is a crucial step in cancer genomics toward the advancement of precision medicine. However, driver gene discovery is challenging: it involves not only very large amounts of data but also the complexity of the disease, including heterogeneity in the background somatic mutation rate across cancer patients. It is generally accepted that CDGs harbor variants that confer a growth advantage on the malignant cell, are positively selected, and are critical to cancer development, whereas non-driver genes harbor random mutations with no functional consequence for cancer. On this basis, function prediction based approaches for identifying CDGs have been proposed that interrogate the distribution of functional predictions among mutations in cancer genomes (eLS 1-16, 2016). Assuming that most observed mutations are passenger mutations, and given quantitative predictions of the functional impact of each mutation, genes enriched for functional or deleterious mutations are more likely to be drivers. These methods have been continually refined and can be applied to improve the accuracy of detecting new candidate CDGs. However, current function prediction based approaches focus only on coding mutations and lack a systematic way to select the best mutation deleteriousness prediction algorithm.
In this study, we propose a new function prediction based approach that discovers CDGs through gene-based permutation. Our method covers both the coding and non-coding regions of genes, and it accounts for the heterogeneous mutational context in a cohort of cancer patients. The permutation model was implemented independently with seven popular deleteriousness prediction scores covering splicing regions (SPIDEX), coding regions (MetaLR and VEST3), and the pan-genome (CADD, DANN, Fathmm-MKL coding, and Fathmm-MKL noncoding). We applied this approach to somatic single nucleotide variants (SNVs) from the whole-genome sequences of 119 breast cancer and 24 lung cancer patients and compared the performance of the seven deleteriousness prediction scores.
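The general idea of a gene-based permutation test on deleteriousness scores can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every detail in it is an assumption made for illustration: the function name `gene_permutation_pvalue`, the summed score as the test statistic, the use of a sequence-context label (for example a trinucleotide) to match each observed mutation to candidate sites in the same gene, and the number of permutations. The abstract specifies none of these.

```python
import random
from typing import Dict, List, Sequence, Tuple


def gene_permutation_pvalue(
    observed: Sequence[Tuple[str, float]],
    candidates_by_context: Dict[str, List[float]],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value for one gene (hypothetical illustration).

    observed: (context, score) pairs for the mutations seen in the gene.
    candidates_by_context: for each context label, the deleteriousness
        scores of all sites in the gene where a mutation with that
        context could have occurred (assumed to be precomputed).
    """
    rng = random.Random(seed)
    # Test statistic (assumption): sum of scores of the observed mutations.
    obs_stat = sum(score for _, score in observed)

    hits = 0
    for _ in range(n_perm):
        # Re-place every observed mutation at a random site in the gene
        # that shares its context, keeping the mutational context fixed.
        null_stat = sum(
            rng.choice(candidates_by_context[ctx]) for ctx, _ in observed
        )
        if null_stat >= obs_stat:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: three observed mutations that all hit the single
    # high-scoring site among 100 candidate sites of the same context.
    candidates = {"ACG": [0.1] * 99 + [0.9]}
    observed = [("ACG", 0.9), ("ACG", 0.9), ("ACG", 0.9)]
    print(gene_permutation_pvalue(observed, candidates))
```

In this sketch, a gene whose observed mutations land on unusually high-scoring sites, relative to random placements with the same context, receives a small p-value. Running the same procedure once per score (CADD, VEST3, and so on) would give one ranking of genes per score, which is one plausible way to compare the scores.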
The new function prediction based approach predicted not only known cancer genes listed in the Cancer Gene Census (CGC) but also new candidate CDGs that are worth further investigation. The results showed the advantage of using pan-genome deleteriousness prediction scores in function prediction based methods. Although VEST3, a deleteriousness prediction score for missense mutations, had the best performance in breast cancer, it was outperformed in lung cancer by CADD and Fathmm-MKL coding, two pan-genome deleteriousness prediction scores.