Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.
Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, Pennsylvania, United States of America.
PLoS Comput Biol. 2021 Mar 12;17(3):e1008777. doi: 10.1371/journal.pcbi.1008777. eCollection 2021 Mar.
Cancer occurs via an accumulation of somatic genomic alterations in a process of clonal evolution. There has been intensive study of potential causal mutations driving cancer development and progression. However, much recent evidence suggests that tumor evolution is normally driven by a variety of mechanisms of somatic hypermutability, which act in different combinations or degrees in different cancers. These variations in mutability phenotypes are predictive of progression outcomes independent of the specific mutations they have produced to date. Here we explore how, and to what degree, these differences in mutational phenotypes act in a cancer to predict its future progression. We develop a computational paradigm that uses evolutionary tree inference (tumor phylogeny) algorithms to derive features quantifying single-tumor mutational phenotypes, followed by a machine learning framework to identify key features predictive of progression. Analyses of breast invasive carcinoma and lung carcinoma demonstrate that a large fraction of the risk of future clinical outcomes of cancer progression (overall survival and disease-free survival) can be explained solely by mutational phenotype features derived from the phylogenetic analysis. We further show that mutational phenotypes have additional predictive power even after accounting for traditional clinical and driver gene-centric genomic predictors of progression. These results confirm the importance of mutational phenotypes in contributing to cancer progression risk and suggest strategies for enhancing the predictive power of conventional clinical data or driver-centric biomarkers.