Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins.

Publication information

BMC Genomics. 2014;15 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2164-15-S1-S4. Epub 2014 Jan 24.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4046685/

Abstract

BACKGROUND

Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. The prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, the evaluation sets commonly contain different mutations of the same proteins, and even of the same residues, that were encountered during training. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set.
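The difference between a random split over mutations and a split over whole proteins can be sketched as follows. This is a minimal illustration and not the authors' protocol: the record fields, the helper name `split_by_group`, and the toy data are all hypothetical. It also assumes the grouping key already identifies non-homologous units; in practice, proteins would first have to be clustered by sequence similarity with an external tool, which is not shown here.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, test_fraction=0.2, seed=0):
    """Assign whole groups (e.g. proteins or homology clusters) to train or test.

    No group contributes mutations to both sets, so the test set contains
    only mutations from groups never seen during training.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group identifiers deterministically, then hold out a fraction.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Hypothetical toy data: several mutations per protein, some at the same residue.
records = [
    {"protein": "P1", "mutation": "A10G"},
    {"protein": "P1", "mutation": "A10V"},
    {"protein": "P2", "mutation": "L45F"},
    {"protein": "P2", "mutation": "K7R"},
    {"protein": "P3", "mutation": "D3N"},
    {"protein": "P4", "mutation": "T88S"},
]

train, test = split_by_group(records, "protein", test_fraction=0.25, seed=1)
print("train proteins:", sorted({r["protein"] for r in train}))
print("test proteins:", sorted({r["protein"] for r in test}))
```

A random split over individual records could place A10G in the training set and A10V in the test set, which is the kind of leakage the abstract argues against; splitting on the group key rules this out by construction.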

RESULTS

We provided experimental evidence of the limitations of the evaluation commonly used for assessing prediction performance. Furthermore, we demonstrated that the prediction of stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features which led to over-fitting and further extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated on an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and correctly classified 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved a correlation of 0.51 between predicted and experimentally measured stability changes.
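For readers unfamiliar with the reported measures, the sketch below shows how a Matthews correlation coefficient, the per-class rates, and a Pearson correlation are computed from predictions. It is illustrative only: the labels and values are made-up toy data, not the paper's results, and treating stabilising mutations as the positive class (1) is an assumption of this example.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = stabilising (assumed positive), 0 = destabilising."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy classification labels (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

print("fraction of stabilising correct:", tp / (tp + fn))
print("fraction of destabilising correct:", tn / (tn + fp))
print("MCC:", mcc(tp, tn, fp, fn))

# Toy real-valued stability changes (hypothetical, arbitrary units).
measured = [-1.2, 0.4, -0.3, -2.0, 0.9]
predicted = [-0.8, 0.1, -0.6, -1.1, 0.2]
print("Pearson r:", pearson(measured, predicted))
```

Because the MCC uses all four cells of the confusion matrix, it stays informative when the two classes are unbalanced, whereas either per-class rate alone can look high for a predictor that favours one class.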

CONCLUSIONS

The commonly adopted evaluation, in which mutations in the same protein, and even the same residue, are randomly divided between the training and test sets, leads to an overestimation of prediction performance. Therefore, methods for predicting stability changes should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods.
