National Center for Biotechnology Information, NLM, NIH, Bethesda, MD, United States of America.
School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China.
PLoS Comput Biol. 2019 Apr 29;15(4):e1006981. doi: 10.1371/journal.pcbi.1006981. eCollection 2019 Apr.
Identifying driver mutations in cancer is notoriously difficult. To date, recurrence of a mutation in patients remains one of the most reliable markers of mutation driver status. However, some mutations are more likely to occur than others because background mutation rates differ, reflecting various forms of infidelity in DNA replication and repair machinery as well as endogenous and exogenous mutagens. We calculated nucleotide and codon mutability to study the contribution of background processes in shaping the observed mutational spectrum in cancer. We developed and tested probabilistic pan-cancer and cancer-specific models that adjust the number of mutation recurrences in patients by background mutability in order to find mutations that may be under selection in cancer. We showed that mutations with higher mutability values had higher observed recurrence frequency, especially in tumor suppressor genes. This trend was prominent for nonsense and silent mutations and for mutations with neutral functional impact. In oncogenes, however, highly recurring mutations were characterized by relatively low mutability, resulting in an inverted U-shaped trend. Mutations not yet observed in any tumor had relatively low mutability values, indicating that background mutability might limit mutation occurrence. We compiled a dataset of missense mutations from 58 genes with experimentally validated functional and transforming impacts from various studies. We found that the mutability of driver mutations was lower than that of passengers; consequently, adjusting mutation recurrence frequency by mutability significantly improved the ranking of mutations and driver mutation prediction. Even though no training on existing data was involved, our approach performed comparably to or better than state-of-the-art methods.
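The idea of adjusting recurrence by background mutability can be illustrated with a minimal sketch. The abstract does not give the exact model, so the code below assumes a simple binomial formulation: each mutation's mutability is treated as its per-patient probability of arising by chance, and the score is the probability of seeing at least the observed number of recurrences in a cohort. The function names, mutation labels, counts, mutability values, and cohort size are all hypothetical and chosen only for illustration.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), computed in log space for stability."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    # Sum the upper tail directly so that very small values keep their precision.
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1)))


def rank_by_adjusted_recurrence(mutations, n_patients):
    """Rank mutations by how unlikely their recurrence is under background mutability.

    `mutations` is a list of (label, observed_count, mutability) tuples.
    Assumption: mutability is used as a per-patient chance probability.
    A smaller tail probability means the recurrence is harder to explain
    by background processes alone.
    """
    scored = [(label, binomial_tail(count, n_patients, mutability))
              for label, count, mutability in mutations]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical cohort and mutations (not taken from the study).
n_patients = 1000
mutations = [
    ("mut_A", 5, 1e-3),  # more recurrent, but highly mutable site
    ("mut_B", 3, 1e-5),  # less recurrent, but very low mutability
]
ranked = rank_by_adjusted_recurrence(mutations, n_patients)
for label, score in ranked:
    print(label, score)
```

In this toy case the less frequent mutation at the low-mutability site ranks first, which mirrors the abstract's observation that raw recurrence alone can be misleading when background mutability differs between sites.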