Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation.

Authors

Berliner Niklas, Teyra Joan, Colak Recep, Garcia Lopez Sebastian, Kim Philip M

Affiliations

Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada.

Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

Publication information

PLoS One. 2014 Sep 22;9(9):e107353. doi: 10.1371/journal.pone.0107353. eCollection 2014.

DOI:10.1371/journal.pone.0107353

PMID:25243403

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4170975/

Abstract

Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing protein instability and help us better understand the molecular causes of diseases.
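To make the SGB-DT idea in the abstract concrete, the following is a minimal sketch of stochastic gradient boosting with one-split decision trees (stumps), scored by the Pearson correlation coefficient. It is an illustration only, not the ELASPIC implementation: the three feature names (`energy`, `conservation`, `burial`), the synthetic target, and all hyperparameters are assumptions, since the abstract does not specify the actual features or settings.

```python
import random


def avg(values):
    return sum(values) / len(values)


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = avg(left), avg(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # all rows identical: fall back to a constant prediction
        m = avg(residuals)
        return 0, float("inf"), m, m
    return best[1:]


def fit_sgb(X, y, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared loss: each round fits a stump
    to the residuals of a random subsample, then takes a shrunken step."""
    rng = random.Random(seed)
    base = avg(y)
    pred = [base] * len(y)
    stumps = []
    k = max(2, int(subsample * len(y)))
    for _ in range(n_rounds):
        idx = rng.sample(range(len(y)), k)  # the "stochastic" part
        Xs = [X[i] for i in idx]
        rs = [y[i] - pred[i] for i in idx]  # negative gradient of squared loss
        j, t, lv, rv = fit_stump(Xs, rs)
        stumps.append((j, t, lv, rv))
        pred = [p + learning_rate * (lv if row[j] <= t else rv)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
            for row in X]


def pearson(a, b):
    ma, mb = avg(a), avg(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def make_data(n, rng):
    """Synthetic stand-in for mutation data; features and target are hypothetical."""
    X, y = [], []
    for _ in range(n):
        energy = rng.gauss(0, 1)     # stand-in for a semi-empirical energy term
        conservation = rng.random()  # stand-in for a sequence conservation score
        burial = rng.random()        # stand-in for a structural feature
        ddg = 1.5 * energy + 2.0 * conservation * burial + rng.gauss(0, 0.3)
        X.append([energy, conservation, burial])
        y.append(ddg)
    return X, y


rng = random.Random(42)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)
model = fit_sgb(X_train, y_train)
r = pearson(predict(model, X_test), y_test)
print(f"Pearson r on held-out synthetic data: {r:.2f}")
```

The correlation printed here reflects only the synthetic data and says nothing about the 0.77 and 0.75 figures reported in the abstract. In practice one would likely use an established library implementation and deeper trees rather than hand-written stumps.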

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb6d/4170975/182617ed4dcd/pone.0107353.g001.jpg
