Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA.
BMC Genomics. 2010 Nov 2;11 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-11-S2-S5.
Protein destabilization is a common mechanism by which amino acid substitutions cause human diseases. Although several machine learning methods have been reported for predicting protein stability changes upon amino acid substitutions, previous studies did not use relevant sequence features representing biological knowledge for classifier construction.
In this study, a new machine learning method was developed for sequence feature-based prediction of protein stability changes upon amino acid substitutions. Support vector machines were trained with data from experimental studies on the free energy change of protein stability upon mutations. To construct accurate classifiers, twenty sequence features were examined for input vector encoding. Classifier performance varied significantly depending on which sequence features were used. The most accurate classifier in this study was constructed using a combination of six sequence features. This classifier achieved an overall accuracy of 84.59% with 70.29% sensitivity and 90.98% specificity.
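For readers less familiar with the three reported measures, the following minimal sketch shows how accuracy, sensitivity, and specificity are conventionally computed from binary predictions. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, and the choice of which stability class counts as "positive" is an assumption, since the text above does not specify it.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for a binary classification task.

    `positive` marks the label treated as the positive class. Which
    stability class that corresponds to is an assumption of this sketch.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, total),   # correct predictions / all predictions
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


if __name__ == "__main__":
    # Toy labels invented for illustration; not data from the study.
    toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
    toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    for name, value in classification_metrics(toy_true, toy_pred).items():
        print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are unevenly represented, because overall accuracy alone can be dominated by the larger class.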
Relevant sequence features can be used to accurately predict protein stability changes upon amino acid substitutions. Predictions at this level of accuracy may help distinguish deleterious from tolerated alterations in disease candidate genes. To make the classifier accessible to the genetics research community, we have developed a new web server, called MuStab (http://bioinfo.ggc.org/mustab/).