Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States.
Chapman University, School of Pharmacy, Irvine, California 92618, United States.
J Chem Inf Model. 2018 Oct 22;58(10):2131-2150. doi: 10.1021/acs.jcim.8b00414. Epub 2018 Oct 3.
In this study, we developed two cancer-specific machine learning classifiers for predicting driver mutations in cancer-associated genes; the classifiers were validated on canonical data sets of functionally validated mutations and applied to a large cancer genomics data set. By examining sequence, structure, and ensemble-based integrated features, we have shown that evolutionary conservation scores play a critical role in the classification of cancer drivers and provide the strongest signal in the machine learning prediction. Through extensive comparative analysis with structure-functional experiments and multicenter mutation-calling data from Pan Cancer Atlas studies, we have demonstrated the robustness of our models and addressed the validity of computational predictions. To address the interpretability of cancer-specific classification models and obtain novel insights into the molecular signatures of driver mutations, we have complemented machine learning predictions with structure-functional analysis of cancer driver mutations in several important oncogenes and tumor suppressor genes. By examining structural and dynamic signatures of known mutational hotspots and the predicted driver mutations, we have shown that the greater flexibility of specific functional regions targeted by driver mutations in oncogenes may facilitate activating conformational changes, while loss-of-function driver mutations in tumor suppressor genes can preferentially target structurally rigid positions that mediate allosteric communication in residue interaction networks and modulate protein binding interfaces. By revealing molecular signatures of cancer driver mutations, our results highlight limitations of the binary driver/passenger classification, suggesting that functionally relevant cancer mutations may span a continuous spectrum of driver-like effects.
Based on this analysis, we propose for experimental testing a group of novel potential driver mutations that can act by altering structure, global dynamics, and allosteric interaction networks in important cancer genes.