Folkman Lukas, Stantic Bela, Sattar Abdul, Zhou Yaoqi
Institute for Integrated and Intelligent Systems, Griffith University, 170 Kessels Road, Brisbane, Queensland 4111, Australia; Queensland Research Laboratory, NICTA, National ICT Australia, 70-72 Bowen Street, Spring Hill, Queensland 4000, Australia.
Institute for Integrated and Intelligent Systems, Griffith University, 170 Kessels Road, Brisbane, Queensland 4111, Australia.
J Mol Biol. 2016 Mar 27;428(6):1394-1405. doi: 10.1016/j.jmb.2016.01.012. Epub 2016 Jan 22.
Protein engineering and characterisation of non-synonymous single nucleotide variants (SNVs) require accurate prediction of protein stability changes (ΔΔGu) induced by single amino acid substitutions. Here, we have developed a new prediction method called Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM), which comprises five specialised support vector machine (SVM) models and makes the final prediction from a consensus of two models selected based on the predicted secondary structure and accessible surface area of the mutated residue. The new method is applicable to single-domain monomeric proteins and can predict ΔΔGu with a protein sequence and mutation as the only inputs. EASE-MM yielded a Pearson correlation coefficient of 0.53-0.59 in 10-fold cross-validation and independent testing and was able to outperform other sequence-based methods. When compared to structure-based energy functions, EASE-MM achieved a comparable or better performance. The application to a large dataset of human germline non-synonymous SNVs showed that the disease-causing variants tend to be associated with larger magnitudes of ΔΔGu predicted with EASE-MM. The EASE-MM web-server is available at http://sparks-lab.org/server/ease.
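The model-selection step described in the abstract (five specialised models, with the final prediction made from a consensus of two models chosen by predicted secondary structure and accessible surface area) can be illustrated with a minimal sketch. Several details here are assumptions that the abstract does not state: the split into three secondary-structure models and two surface-area models, the relative accessibility threshold of 0.25, the use of a simple average as the consensus, and all names (`consensus_ddg`, `ss_models`, `asa_models`). Plain callables stand in for the trained SVMs.

```python
from typing import Callable, Dict, Sequence

# A "model" is any callable mapping a feature vector to a predicted ddG_u.
# In EASE-MM these are trained SVMs; simple stand-ins are used here.
Model = Callable[[Sequence[float]], float]


def consensus_ddg(
    features: Sequence[float],
    ss_class: str,                    # predicted secondary structure: "H", "E" or "C"
    rel_asa: float,                   # predicted relative accessible surface area, 0..1
    ss_models: Dict[str, Model],      # assumed: one model per secondary-structure class
    asa_models: Dict[str, Model],     # assumed: one model each for "buried" / "exposed"
    asa_threshold: float = 0.25,      # hypothetical cut-off, not taken from the paper
) -> float:
    """Average the two specialised models selected for the mutated residue."""
    ss_model = ss_models[ss_class]
    asa_key = "exposed" if rel_asa >= asa_threshold else "buried"
    asa_model = asa_models[asa_key]
    # Assumed consensus rule: unweighted mean of the two selected predictions.
    return 0.5 * (ss_model(features) + asa_model(features))


# Stand-in models returning fixed values, for illustration only.
ss_models: Dict[str, Model] = {
    "H": lambda f: -1.0,
    "E": lambda f: -2.0,
    "C": lambda f: 0.0,
}
asa_models: Dict[str, Model] = {
    "buried": lambda f: -3.0,
    "exposed": lambda f: -0.5,
}

# A buried helical residue uses the "H" and "buried" models.
buried_helix = consensus_ddg([0.1, 0.2], "H", 0.10, ss_models, asa_models)
# An exposed coil residue uses the "C" and "exposed" models.
exposed_coil = consensus_ddg([0.1, 0.2], "C", 0.60, ss_models, asa_models)
```

In this sketch only two of the five models are evaluated for any given mutation, which is the point of the selection step: each residue is scored by the models specialised for its predicted structural environment.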