Zhang C T, Zhang Z, He Z
Department of Physics, Tianjin University, China.
J Protein Chem. 1998 Apr;17(3):261-72. doi: 10.1023/a:1022588803017.
Predicting the secondary structural content (the fractions of alpha-helix and beta-strand) of a globular protein is of great use in protein structure prediction. In this paper, a new prediction algorithm is proposed based on Chou's database [Chou (1995), Proteins 21, 319-344]. The algorithm is an improved multiple linear regression method that takes into account nonlinear and coupling terms of the amino acid frequencies as well as the length of the protein. The prediction also depends on the structural class of the protein, but instead of four classes only three are considered: the alpha class, the beta class, and a mixed class that merges alpha+beta and alpha/beta proteins, called simply the alphabeta class. This eliminates the ambiguity that usually arises between alpha+beta and alpha/beta proteins. A resubstitution test of the algorithm gives average absolute errors of 0.040 and 0.035 for the prediction of alpha-helix content and beta-strand content, respectively. A cross-validation test by jackknife analysis gives average absolute errors of 0.051 and 0.045 for alpha-helix content and beta-strand content, respectively. Both tests indicate the self-consistency and the extrapolating effectiveness of the new algorithm. Compared with other methods, ours is simple and convenient to use while achieving high prediction accuracy. Because the prediction of the structural class is incorporated into the method, its only inputs are the amino acid composition and the length of the protein to be predicted.
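The abstract does not give the regression equations, so the following is only a minimal sketch of the general idea: a least-squares fit on amino acid frequencies, their squares (nonlinear terms), a few pairwise products (coupling terms), and a length term, evaluated by resubstitution and by jackknife (leave-one-out). Everything specific here is an assumption and not the authors' formulation: the function names, the choice of log length, the choice of which pairs to couple, and above all the data, which is random and synthetic. In the scheme described above, one such set of coefficients would presumably be fitted per structural class; the sketch fits a single set.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20 amino acid frequencies of a sequence (they sum to 1)."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / len(seq)


def design_matrix(freqs, lengths, pairs=()):
    """Build regression features: linear, squared, coupling, and length terms.

    No explicit intercept column is added: the frequencies in each row sum
    to 1, so the linear terms already span a constant.
    Using log(length) is an assumption made for this sketch.
    """
    freqs = np.asarray(freqs, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    cols = [*freqs.T, *(freqs ** 2).T]
    cols += [freqs[:, i] * freqs[:, j] for i, j in pairs]
    cols.append(np.log(lengths))
    return np.column_stack(cols)


def fit(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def resubstitution_error(X, y):
    """Average absolute error when predicting the proteins used for fitting."""
    return float(np.mean(np.abs(X @ fit(X, y) - y)))


def jackknife_error(X, y):
    """Average absolute error when each protein is predicted from all the others."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit(X[keep], y[keep])
        errors.append(abs(X[i] @ coef - y[i]))
    return float(np.mean(errors))


# Synthetic stand-in data (NOT Chou's database, NOT results from the paper).
rng = np.random.default_rng(0)
n = 80
freqs = rng.dirichlet(np.ones(20), size=n)      # random compositions
lengths = rng.integers(50, 500, size=n)         # random chain lengths
helix = np.clip(0.3 + 0.5 * (freqs[:, 0] - 0.05) + rng.normal(0, 0.03, n), 0, 1)

X = design_matrix(freqs, lengths, pairs=[(0, 9)])  # one arbitrary coupling term
print("resubstitution error:", resubstitution_error(X, helix))
print("jackknife error:     ", jackknife_error(X, helix))
```

On any data set the jackknife error of a least-squares fit is at least as large as the resubstitution error, which matches the ordering of the two pairs of figures reported in the abstract (0.051 and 0.045 versus 0.040 and 0.035).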