

Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence.

Author information

Kloczkowski A, Ting K-L, Jernigan R L, Garnier J

Affiliation

Laboratory of Experimental and Computational Biology, NCI, NIH, Bethesda, Maryland, USA.

Publication information

Proteins. 2002 Nov 1;49(2):154-66. doi: 10.1002/prot.10181.

Abstract

We have modified and improved the GOR algorithm for protein secondary structure prediction by using the evolutionary information provided by multiple sequence alignments, adding triplet statistics, and optimizing various parameters. We have expanded the database used to include the 513 non-redundant domains collected recently by Cuff and Barton (Proteins 1999;34:508-519; Proteins 2000;40:502-511). We have introduced a variable-size window that allowed us to include sequences as short as 20-30 residues. A significant improvement over the previous versions of the GOR algorithm was obtained by combining the PSI-BLAST multiple sequence alignments with the GOR method. The new algorithm will form the basis for the future GOR V release on an online prediction server. The average accuracy of secondary structure prediction with multiple sequence alignments and a full jack-knife procedure was 73.5%. The accuracy increases to 74.2% when the prediction is limited to the 375 (of 513) sequences having at least 50 PSI-BLAST alignments. The average accuracy of the new improved program without multiple sequence alignments was 67.5%. This is approximately a 3% improvement over the preceding GOR IV algorithm (Garnier J, Gibrat JF, Robson B. Methods Enzymol 1996;266:540-553; Kloczkowski A, Ting K-L, Jernigan RL, Garnier J. Polymer 2002;43:441-449). We have discussed alternatives to the segment overlap (Sov) coefficient proposed by Zemla et al. (Proteins 1999;34:220-223).
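The accuracy figures quoted in the abstract can be made concrete with a small sketch. It assumes that "accuracy" means per-residue three-state agreement between predicted and observed structure strings (often called Q3), that the average is taken per sequence, and that the jack-knife is a leave-one-out split over the data set. The abstract states none of these details explicitly, and the function names `q3`, `average_q3`, and `jackknife_splits` are hypothetical, not part of any GOR release.

```python
from statistics import mean


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one.

    Assumption: both strings use one letter per residue, e.g. H (helix),
    E (strand), C (coil), and have the same non-zero length.
    """
    if not observed or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def average_q3(pairs):
    """Mean of the per-sequence accuracies over (predicted, observed) pairs."""
    return mean(q3(p, o) for p, o in pairs)


def jackknife_splits(items):
    """Yield (training_set, held_out) pairs, leaving one item out each time."""
    for i, held_out in enumerate(items):
        yield items[:i] + items[i + 1:], held_out


if __name__ == "__main__":
    # Toy strings for illustration only; not data from the paper.
    toy = [("HHHH", "HHCC"), ("EE", "EE")]
    print(average_q3(toy))
    for train, test in jackknife_splits(["a", "b", "c"]):
        print(train, test)
```

In a full jack-knife evaluation of this kind, the statistics would be re-estimated from each training set before predicting the held-out sequence, so that no sequence contributes to its own prediction.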

