Department of Biochemistry and Molecular Biology, Tel-Aviv University, Ramat Aviv 69978, Israel.
Bioinformatics. 2011 Dec 1;27(23):3286-92. doi: 10.1093/bioinformatics/btr576. Epub 2011 Oct 13.
Accurate prediction of protein stability is important for understanding the molecular underpinnings of diseases and for the design of new proteins. We introduce a novel approach for the prediction of changes in protein stability that arise from a single-site amino acid substitution; the approach uses available data on mutations occurring in the same position and in other positions. Our algorithm, named Pro-Maya (Protein Mutant stAbilitY Analyzer), combines a collaborative filtering baseline model, Random Forests regression and a diverse set of features. Pro-Maya predicts the stability free energy difference of mutant versus wild type, denoted as ΔΔG.
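To make the idea of a collaborative filtering baseline concrete, the following is a minimal sketch rather than the Pro-Maya implementation. It assumes a standard additive baseline in which a predicted ΔΔG is the global mean plus a per-position bias plus a per-substitution-type bias. The function names, the `shrink` regularization parameter and the record layout are all hypothetical; the abstract does not specify the actual formulation. In a full pipeline, a regressor such as Random Forests could then be trained on additional features together with this baseline, which is not shown here.

```python
from collections import defaultdict
from statistics import mean


def fit_baseline(records, shrink=1.0):
    """Fit an additive baseline: ddG ~ mu + pos_bias + sub_bias.

    records: iterable of (position, substitution, ddg) tuples, where
    `position` and `substitution` are any hashable labels (hypothetical
    layout, e.g. ("1abc:45", "A>G", -1.2)).
    shrink: assumed regularization constant that pulls biases estimated
    from few observations toward zero.
    """
    records = list(records)
    mu = mean(d for _, _, d in records)

    # Position bias: shrunken mean residual of known mutations at each position.
    by_pos = defaultdict(list)
    for pos, _, d in records:
        by_pos[pos].append(d - mu)
    pos_bias = {p: sum(r) / (len(r) + shrink) for p, r in by_pos.items()}

    # Substitution bias: shrunken mean of what the position bias leaves unexplained.
    by_sub = defaultdict(list)
    for pos, sub, d in records:
        by_sub[sub].append(d - mu - pos_bias[pos])
    sub_bias = {s: sum(r) / (len(r) + shrink) for s, r in by_sub.items()}

    return mu, pos_bias, sub_bias


def predict_baseline(model, position, substitution):
    """Predict ddG; unseen positions or substitutions contribute zero bias."""
    mu, pos_bias, sub_bias = model
    return mu + pos_bias.get(position, 0.0) + sub_bias.get(substitution, 0.0)


# Toy, made-up data: two positions with two known mutations each.
toy_records = [
    ("p1", "A>G", -2.0),
    ("p1", "A>V", -1.0),
    ("p2", "L>A", 1.0),
    ("p2", "L>G", 2.0),
]
toy_model = fit_baseline(toy_records)
```

With this sketch, a query for a new substitution at `"p1"` inherits the negative bias learned from the other mutations at that position, which illustrates why known ΔΔG values at the query position can help.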
We evaluated our algorithm extensively using cross-validation on two previously utilized datasets of single amino acid mutations and on a third, validation set. The results indicate that using known ΔΔG values of mutations at the query position improves the accuracy of ΔΔG predictions for other mutations at that position. The accuracy of our predictions in such cases significantly surpasses that of similar methods, achieving, for example, a Pearson's correlation coefficient of 0.79 and a root mean square error of 0.96 on the validation set. Because Pro-Maya uses a diverse set of features, including predictions from two other methods, it also performs slightly better than other methods in the absence of additional experimental data on the query positions.
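The two accuracy measures quoted above can be computed as in the short sketch below. It uses the textbook definitions of Pearson's correlation coefficient and root mean square error; the function names are illustrative and the numbers in the usage lines are made up, not taken from the paper.

```python
import math


def pearson_r(predicted, observed):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Made-up ddG values, for illustration only.
example_predicted = [-1.0, 0.5, -2.0, 0.0]
example_observed = [-1.5, 0.0, -2.5, 0.5]
example_r = pearson_r(example_predicted, example_observed)
example_rmse = rmse(example_predicted, example_observed)
```

The correlation measures how well predictions track the ranking and spread of the measured values, while the root mean square error reports the typical size of the error in the units of ΔΔG, so the two are usually quoted together.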
Pro-Maya is freely available via web server at http://bental.tau.ac.il/ProMaya.
nirb@tauex.tau.ac.il; wolf@cs.tau.ac.il
Supplementary data are available at Bioinformatics online.