Solovyev V V, Salamov A A
Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA.
Comput Appl Biosci. 1994 Dec;10(6):661-9. doi: 10.1093/bioinformatics/10.6.661.
All current methods of protein secondary structure prediction are based on evaluating the state of a single residue. Although the best of them reach approximately 60-70% accuracy, reliable prediction of tertiary structure benefits more from predicting the approximate location of alpha-helix and beta-strand segments, especially long ones. We have developed a simple method for protein secondary structure prediction that is oriented toward locating secondary structure segments. The method uses linear discriminant analysis to assign a particular type of secondary structure to segments of a given amino acid sequence, taking into account the amino acid composition of the internal parts of segments as well as of their terminal and adjacent regions. Four linear discriminant functions were constructed to recognize short and long alpha-helix and beta-strand segments, respectively. These functions combine three characteristics: the hydrophobic moment, and the segment's singlet and pair preferences for an alpha-helix or beta-strand. The last two characteristics are calculated by summing the preference parameters of single residues and pairs of residues located in a segment and its adjacent regions. The final program, SSP, predicts all potential alpha-helices and beta-strands and resolves some of the possible overlaps between them. Overall three-state (alpha, beta, coil) prediction gives approximately 65.1% correctly predicted residues on 126 non-homologous proteins using the jackknife test procedure. Analysis of the prediction results shows high accuracy for long secondary structure segments: approximately 89% of alpha-helices of length > 8 and approximately 71% of beta-strands of length > 6 are correctly located, with probabilities of correct prediction of 0.82 and 0.78, respectively. (ABSTRACT TRUNCATED AT 250 WORDS)
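The segment scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the SSP implementation: the abstract gives no hydrophobicity scale, preference tables, weights, or thresholds, so the Kyte-Doolittle scale, the Eisenberg-style moment formula, the `singlet` and `pair` dictionaries, and the `weights` tuple are all assumptions introduced here for illustration. For brevity the sketch also scores only the segment itself and ignores the terminal and adjacent regions that the method takes into account.

```python
import math

# Kyte-Doolittle hydropathy values. Assumption: the abstract does not say
# which hydrophobicity scale the authors used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0):
    """Eisenberg-style hydrophobic moment of a segment.

    angle_deg is the rotation per residue: about 100 degrees models an
    alpha-helix, 160-180 degrees a beta-strand.
    """
    delta = math.radians(angle_deg)
    s = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(segment))
    c = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(segment))
    return math.hypot(s, c)


def singlet_preference(segment, singlet):
    """Sum of per-residue preference parameters (hypothetical table)."""
    return sum(singlet.get(aa, 0.0) for aa in segment)


def pair_preference(segment, pair, gap=1):
    """Sum of preference parameters over residue pairs `gap` positions apart
    (hypothetical table keyed by (first_residue, second_residue))."""
    return sum(pair.get((a, b), 0.0) for a, b in zip(segment, segment[gap:]))


def discriminant_score(segment, singlet, pair, weights, angle_deg=100.0):
    """Linear combination of the three characteristics.

    `weights` stands in for the coefficients that linear discriminant
    analysis would fit on training data; the values are not from the paper.
    """
    features = (
        hydrophobic_moment(segment, angle_deg),
        singlet_preference(segment, singlet),
        pair_preference(segment, pair),
    )
    return sum(w * f for w, f in zip(weights, features))
```

Under these assumptions, a candidate segment would be accepted as, say, a long alpha-helix when `discriminant_score(...)` exceeds a fitted threshold, with one weight vector and threshold for each of the four segment classes (short and long helices, short and long strands).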