Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain.
Nature. 2021 Aug;596(7872):428-432. doi: 10.1038/s41586-021-03771-1. Epub 2021 Jul 28.
Despite the existence of good catalogues of cancer genes, identifying the specific mutations of those genes that drive tumorigenesis across tumour types is still a largely unsolved problem. As a result, most mutations identified in cancer genes across tumours are of unknown significance to tumorigenesis. We propose that the mutations observed in thousands of tumours (natural experiments testing their oncogenic potential, replicated across individuals and tissues) can be exploited to solve this problem. From these mutations, features that describe the mechanism of tumorigenesis of each cancer gene and tissue may be computed and used to build machine learning models that encapsulate these mechanisms. Here we demonstrate the feasibility of this solution by building and validating 185 gene-tissue-specific machine learning models that outperform experimental saturation mutagenesis in the identification of driver and passenger mutations. The models and their assessment of each mutation are designed to be interpretable, thus avoiding a black-box prediction device. Using these models, we outline the blueprints of potential driver mutations in cancer genes, and demonstrate the role of mutation probability in shaping the landscape of observed driver mutations. These blueprints will support the interpretation of newly sequenced tumours in patients and the study of the mechanisms of tumorigenesis of cancer genes across tissues.