Song Jiangning, Yuan Zheng, Tan Hao, Huber Thomas, Burrage Kevin
Advanced Computational Modelling Centre, The University of Queensland, Brisbane, QLD 4072, Australia.
Bioinformatics. 2007 Dec 1;23(23):3147-54. doi: 10.1093/bioinformatics/btm505. Epub 2007 Oct 17.
Disulfide bonds are the primary covalent crosslinks between pairs of cysteine residues in proteins. They play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, knowing the locations of disulfide bonds can greatly reduce the conformational search space. Computational methods that accurately predict disulfide connectivity patterns in proteins are therefore needed, and could have important applications.
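To make "connectivity pattern" concrete: if all 2n cysteines in a chain form disulfide bonds, a pattern is one way of pairing them, and the number of candidate patterns grows as (2n-1)!!, i.e. 3, 15, 105 and 945 for two to five bonds. The minimal sketch below is an illustration only, not part of the published method. It assumes every cysteine is bonded, and `connectivity_patterns` and `double_factorial_odd` are hypothetical helper names.

```python
from typing import List, Tuple

Pattern = List[Tuple[int, int]]


def connectivity_patterns(cysteines: List[int]) -> List[Pattern]:
    """Enumerate every pairing (perfect matching) of an even-length list of cysteine positions.

    Assumption (illustrative): all listed cysteines take part in disulfide bonds.
    """
    if not cysteines:
        return [[]]
    first, rest = cysteines[0], cysteines[1:]
    patterns: List[Pattern] = []
    # The first cysteine must bond to one of the others; recurse on what is left.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in connectivity_patterns(remaining):
            patterns.append([(first, partner)] + sub)
    return patterns


def double_factorial_odd(n_bonds: int) -> int:
    """Return (2n-1)!! = 1 * 3 * 5 * ... * (2n-1), the number of patterns for n bonds."""
    result = 1
    for k in range(1, 2 * n_bonds, 2):
        result *= k
    return result
```

For cysteines at positions `[3, 10, 25, 40]`, `connectivity_patterns` returns the three candidates `[(3, 10), (25, 40)]`, `[(3, 25), (10, 40)]` and `[(3, 40), (10, 25)]`.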
We have developed a novel method to predict disulfide connectivity patterns from the protein primary sequence, using support vector regression (SVR) with multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. Using 4-fold cross-validation on a well-defined non-homologous dataset and averaging over proteins with two to five disulfide bridges, our method achieves prediction accuracies of 74.4% at the protein level and 77.9% at the cysteine-pair level. We also assessed the effects of different sequence encoding schemes on the prediction of disulfide connectivity. The results show that encoding sequences with multiple sequence feature vectors combined with predicted secondary structure significantly improves prediction accuracy, enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to current algorithms that should be useful for computationally assigning disulfide connectivity patterns and for annotating protein sequences generated by large-scale whole-genome projects.
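The abstract does not spell out how per-pair SVR outputs are turned into a full connectivity pattern, or exactly how the two accuracies are computed. A common approach in the disulfide-connectivity literature, assumed here rather than taken from this paper, is to treat each regressor output as a bonding score for a cysteine pair and select the pattern (perfect matching) with the highest total score. Results are then reported as Qp, the fraction of proteins whose entire pattern is correct, and Qc, the fraction of true disulfide bonds recovered. The sketch below uses an exact bitmask dynamic program, which is practical for two to five bonds. `best_pattern`, `qp_qc` and the score dictionary are hypothetical names, and the scores would come from a trained regressor that is not shown.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_pattern(n_cys: int, score: Dict[Pair, float]) -> List[Pair]:
    """Maximum-total-score pairing of cysteines 0..n_cys-1 via bitmask dynamic programming.

    score[(i, j)] with i < j is a hypothetical bonding score, e.g. an SVR output.
    """
    assert n_cys % 2 == 0, "assumes every cysteine is bonded"
    full = (1 << n_cys) - 1

    @lru_cache(maxsize=None)
    def solve(mask: int) -> Tuple[float, Tuple[Pair, ...]]:
        if mask == full:
            return 0.0, ()
        # The lowest unpaired cysteine must bond to some later unpaired cysteine.
        i = next(k for k in range(n_cys) if not mask & (1 << k))
        best: Tuple[float, Tuple[Pair, ...]] = (float("-inf"), ())
        for j in range(i + 1, n_cys):
            if mask & (1 << j):
                continue
            sub_score, sub_pairs = solve(mask | (1 << i) | (1 << j))
            total = score[(i, j)] + sub_score
            if total > best[0]:
                best = (total, ((i, j),) + sub_pairs)
        return best

    return list(solve(0)[1])


def qp_qc(predicted: List[List[Pair]], true: List[List[Pair]]) -> Tuple[float, float]:
    """Protein-level (Qp) and bond-level (Qc) accuracy, as commonly defined (assumed here)."""
    proteins_correct = pairs_correct = pairs_total = 0
    for pred, ref in zip(predicted, true):
        pred_set = {tuple(sorted(p)) for p in pred}
        ref_set = {tuple(sorted(p)) for p in ref}
        pairs_correct += len(pred_set & ref_set)
        pairs_total += len(ref_set)
        proteins_correct += int(pred_set == ref_set)
    return proteins_correct / len(true), pairs_correct / pairs_total
```

With four cysteines and scores `{(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.1, (1, 3): 0.2, (0, 3): 0.3, (1, 2): 0.4}`, `best_pattern` picks `[(0, 1), (2, 3)]` (total 1.7 versus 0.3 and 0.7 for the alternatives). A general maximum-weight matching algorithm would scale better, but exact search is simple and sufficient at this problem size.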
The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide