Zhang C T, Lin Z S, Zhang Z, Yan M
Department of Physics, Tianjin University, China.
Protein Eng. 1998 Nov;11(11):971-9. doi: 10.1093/protein/11.11.971.
An improved multiple linear regression method is proposed for predicting the alpha-helix and beta-strand content of a globular protein from its primary sequence. The algorithm takes into account the amino acid composition and the auto-correlation functions derived from the hydrophobicity profile of the primary sequence. In the resubstitution test, the average absolute errors are 0.077 and 0.073, with standard deviations of 0.059 and 0.057, for the predicted content of alpha-helix and beta-strand, respectively. In a stringent cross-validation test, the jackknife test, the average absolute errors are 0.087 and 0.081, with standard deviations of 0.067 and 0.065, for the predicted content of alpha-helix and beta-strand, respectively. Both tests indicate the self-consistency and the extrapolating effectiveness of the new algorithm. This greatly improves on previous results (Eisenhaber, F., Imperiale, F., Argos, P. and Frommel, C., 1996, Proteins, 25, 157-168). Compared with other currently available methods, our method has the merits of simplicity and ease of use as well as higher prediction accuracy. The only input to the method is the primary sequence of the query protein. The program is available on request via e-mail: ctzhang@tju.edu.cn.
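As a rough illustration of the kind of input such a regression could use, the sketch below builds a feature vector from the amino acid composition and the auto-correlation of a hydrophobicity profile, and evaluates a model with a jackknife (leave-one-out) loop. It is not the authors' program. The Kyte-Doolittle scale, the auto-correlation formula, the maximum lag, the use of the population standard deviation, and all function names are assumptions made for this example; the abstract does not specify any of them. The regression itself is left as a pair of caller-supplied `fit` and `predict` functions.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Assumption: the abstract does not say
# which hydrophobicity scale the authors used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """Mean of h[i] * h[i + k] along the sequence, for k = 1..max_lag.

    Assumed definition; non-standard residue codes raise KeyError.
    """
    h = [HYDROPATHY[aa] for aa in seq]
    n = len(h)
    return [
        sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def features(seq, max_lag=5):
    """Composition (20 values) followed by max_lag auto-correlation values."""
    return composition(seq) + autocorrelation(seq, max_lag)


def jackknife_errors(samples, fit, predict):
    """Leave each (x, y) sample out, refit on the rest, and score it.

    Returns the average absolute error and its population standard deviation.
    """
    errors = []
    for i, (x, y) in enumerate(samples):
        model = fit(samples[:i] + samples[i + 1:])
        errors.append(abs(predict(model, x) - y))
    return mean(errors), pstdev(errors)
```

In this sketch a resubstitution test would correspond to fitting once on all samples and scoring those same samples, whereas `jackknife_errors` refits with each protein withheld in turn, which is why jackknife errors are expected to be somewhat larger.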