Przybylski Dariusz, Rost Burkhard
Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
Proteins. 2002 Feb 1;46(2):197-205. doi: 10.1002/prot.10029.
Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.
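The per-residue three-state accuracy quoted above (about 72% rising to 75%) can be illustrated with a minimal sketch. It assumes the predicted and observed secondary structures are given as equal-length strings over the letters H (helix), E (strand), and L (other). The function name `q3_accuracy`, the letter coding, and the toy strings are hypothetical and chosen for illustration only; they are not taken from the PHD programs.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues predicted in the correct state.

    Assumption: both strings use one letter per residue, drawn from
    H (helix), E (strand), and L (other).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    allowed = set("HEL")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be one of H, E, L")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy 10-residue example (made-up strings, not data from the paper).
    observed = "HHHHLLEEEL"
    predicted = "HHHLLLEELL"
    print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")
```

In this toy case two of the ten residues are assigned the wrong state, so the score is 80%. The accuracies reported in the abstract are the same kind of per-residue percentage, computed over all residues in the test proteins.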