Huang Liang-Tsung, Gromiha M Michael
Department of Computer Science and Information Engineering, Ming-Dao University, Changhua 523, Taiwan.
J Comput Chem. 2008 Jul 30;29(10):1675-83. doi: 10.1002/jcc.20925.
Understanding the relationship between amino acid sequences and protein folding rates is an important task in computational and molecular biology. In this work, we systematically analyzed the amino acid composition of proteins with different ranges of folding rates. We observed that the polar residues Asn, Gln, Ser, and Lys are dominant in fast-folding proteins, whereas the hydrophobic residues Ala, Cys, Gly, and Leu are preferred in slow-folding proteins. Further, we developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted folding rates under leave-one-out cross-validation. Classifying the proteins by structural class improved the correlation to 0.98 overall, with values of 0.99, 0.98, and 0.96 for all-alpha, all-beta, and mixed-class proteins, respectively. In addition, we used Bayesian classification theory to discriminate between two- and three-state proteins, with an accuracy of 90%. We have developed a web server for predicting protein folding rates, available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
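The general idea of fitting a quadratic response surface to composition-derived variables and scoring it by leave-one-out cross-validation can be sketched as follows. This is a minimal illustration, not the published method: the abstract does not state which variables enter the model or how they were selected, so the use of raw percentage composition, the full second-order design with interaction terms, the ordinary least-squares fit, and all function names here are assumptions. The demo data are synthetic and only show how the pieces fit together.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(sequence):
    """Percentage of each standard residue in a sequence (assumed feature set)."""
    seq = sequence.upper()
    counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
    return 100.0 * counts / len(seq)


def quadratic_design(X):
    """Design matrix with columns [1, x_i, x_i * x_j for i <= j]."""
    n, p = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(p)]
    cols += [X[:, i] * X[:, j] for i in range(p) for j in range(i, p)]
    return np.column_stack(cols)


def loo_predictions(X, y):
    """Leave-one-out: refit on n-1 samples, then predict the held-out sample."""
    D = quadratic_design(X)
    preds = np.empty(len(y))
    for k in range(len(y)):
        train = np.arange(len(y)) != k
        coef, *_ = np.linalg.lstsq(D[train], y[train], rcond=None)
        preds[k] = D[k] @ coef
    return preds


if __name__ == "__main__":
    # Synthetic stand-in data: two hypothetical predictors and a noisy
    # quadratic response. These are NOT the 77 proteins from the study.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(77, 2))
    y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
    y = y + rng.normal(scale=0.3, size=77)

    preds = loo_predictions(X, y)
    r = np.corrcoef(preds, y)[0, 1]  # Pearson correlation, held-out vs. observed
    print(f"leave-one-out correlation: {r:.2f}")
```

A full second-order model over all 20 composition variables would have 231 coefficients, far more than 77 samples can support, which is why the sketch restricts itself to two predictors; any real application would need some form of variable selection.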