Graduate Group in Biochemistry and Molecular Biophysics, University of Pennsylvania, Philadelphia, PA, USA.
Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA.
Cell Mol Life Sci. 2019 Jul;76(14):2663-2679. doi: 10.1007/s00018-019-03097-2. Epub 2019 Apr 13.
Methods to catalog and computationally assess the mutational landscape of proteins in human cancers are desirable. One approach is to adapt evolutionary or data-driven methods developed for predicting whether a single-nucleotide polymorphism (SNP) is deleterious to protein structure and function. Where the goal is to understand the mechanism of protein activation and regulation, an alternative is to use structure-based computational approaches to predict the effects of point mutations. Through a case study of mutations in the kinase domains of three proteins, namely the anaplastic lymphoma kinase (ALK) in pediatric neuroblastoma patients, serine/threonine-protein kinase B-Raf (BRAF) in melanoma patients, and erythroblastic oncogene B 2 (ErbB2 or HER2) in breast cancer patients, we compare these two approaches. We find that the structure-based method is most appropriate for developing a binary classification of several different mutations, especially infrequently occurring ones, with respect to the activation status of the given target protein. This approach is especially useful if the effects of mutations on the interactions of inhibitors with the target proteins are being sought. However, many patients will present with mutations spread across different target proteins, making structure-based models computationally demanding to implement and execute. In this situation, data-driven methods, including those based on machine learning techniques and evolutionary methods, are most appropriate for recognizing and illuminating mutational patterns. We show, however, that in the current state of the field the two methods have very different accuracies and confidence values, and hence the optimal choice for their deployment is context-dependent.