Yuan Zheng, Huang Bixing
Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, St. Lucia, Australia.
Proteins. 2004 Nov 15;57(3):558-64. doi: 10.1002/prot.20234.
A novel support vector regression (SVR) approach is proposed to predict protein accessible surface areas (ASAs) from primary structure. In this work, we predict the real value of each residue's ASA in squared angstroms rather than its relative solvent accessibility. On a per-residue basis, the mean and median absolute errors are 26.0 Å² and 18.87 Å², respectively. The correlation coefficient between the predicted and observed ASAs is 0.66. Cysteine is the best-predicted amino acid (mean absolute error 13.8 Å², median absolute error 8.37 Å²), while arginine is the worst-predicted amino acid (mean absolute error 42.7 Å², median absolute error 36.31 Å²). Our work suggests that the SVR approach can be applied directly to ASA prediction, a task for which data preclassification has been used.
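The evaluation measures quoted in the abstract (mean absolute error, median absolute error, and the correlation coefficient between predicted and observed ASA) can be illustrated with a minimal sketch. It assumes the correlation is Pearson's r, which the abstract does not state explicitly. The function names and the toy ASA values are hypothetical and are not taken from the paper.

```python
import math
import statistics
from collections import defaultdict


def absolute_errors(observed, predicted):
    """Per-residue absolute error |observed - predicted|, in squared angstroms."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [abs(o - p) for o, p in zip(observed, predicted)]


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed to be the measure reported)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def summarize(observed, predicted):
    """Overall mean/median absolute error and correlation."""
    errors = absolute_errors(observed, predicted)
    return {
        "mean_abs_error": statistics.fmean(errors),
        "median_abs_error": statistics.median(errors),
        "pearson_r": pearson_r(observed, predicted),
    }


def per_residue_type_mae(residues, observed, predicted):
    """Mean absolute error grouped by amino acid type (one-letter code)."""
    grouped = defaultdict(list)
    for aa, err in zip(residues, absolute_errors(observed, predicted)):
        grouped[aa].append(err)
    return {aa: statistics.fmean(errs) for aa, errs in grouped.items()}


# Toy values for illustration only; these are NOT data from the paper.
residues = ["C", "R", "R", "C", "A"]
observed = [10.0, 50.0, 120.0, 0.0, 80.0]
predicted = [20.0, 45.0, 100.0, 5.0, 95.0]

overall = summarize(observed, predicted)
by_type = per_residue_type_mae(residues, observed, predicted)
print(overall)
print(by_type)
```

Grouping the errors by amino acid type, as in `per_residue_type_mae`, is one way to obtain a per-residue-type comparison such as the cysteine and arginine figures reported above.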