Improving the accuracy of protein secondary structure prediction using structural alignment.

Authors

Montgomerie Scott, Sundararaj Shan, Gallin Warren J, Wishart David S

Affiliation

Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada.

Publication

BMC Bioinformatics. 2006 Jun 14;7:301. doi: 10.1186/1471-2105-7-301.

Abstract

BACKGROUND

The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Many secondary structure prediction methods now routinely achieve a three-state accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability that a newly identified sequence has a structural homologue is quite high.
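For readers unfamiliar with the Q3 measure quoted above, the following is a minimal sketch of how it is commonly computed: the percentage of residues whose three-state label (helix H, strand E, coil C) matches the observed label. The function name `q3_score` and the example strings are illustrative assumptions, not taken from the paper.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label
    (H = helix, E = strand, C = coil) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("structure strings must not be empty")
    # Count positions where the predicted label equals the observed label.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical 10-residue example: 8 of 10 labels agree.
    print(q3_score("HHHHCCEEEE", "HHHCCCEEEC"))
```

An average Q3 over a test set, as reported in such studies, would then be the mean of this per-protein score.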

RESULTS

We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence identity >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4-5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%.
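The two ideas described above, transferring a homologue's secondary structure through an alignment and then filling the remaining positions from a consensus of sequence-based predictors, can be sketched as follows. This is not the PROTEUS implementation: it assumes the pairwise alignment has already been computed (gaps written as `-`), it replaces the paper's jury-of-experts system with a plain majority vote, and all function names, the `?` placeholder, and the example data are hypothetical.

```python
from collections import Counter


def sequence_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def map_template_structure(aligned_query: str, aligned_template: str,
                           template_ss: str, unknown: str = "?") -> str:
    """Copy the template's secondary structure labels onto the query residues.
    Query residues aligned to a gap in the template are marked as unknown."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    if len(template_ss) != sum(t != "-" for t in aligned_template):
        raise ValueError("template_ss must have one label per template residue")
    mapped = []
    t_idx = 0  # index into the ungapped template structure string
    for q, t in zip(aligned_query, aligned_template):
        if t != "-":
            label = template_ss[t_idx]
            t_idx += 1
        else:
            label = unknown
        if q != "-":
            mapped.append(label)
    return "".join(mapped)


def consensus(mapped: str, predictions: list[str], unknown: str = "?") -> str:
    """Keep structure-derived labels where available; elsewhere take a majority
    vote over the sequence-based predictions (ties go to the first label seen)."""
    result = []
    for i, label in enumerate(mapped):
        if label != unknown:
            result.append(label)
        else:
            votes = Counter(p[i] for p in predictions)
            result.append(votes.most_common(1)[0][0])
    return "".join(result)


def predict(aligned_query: str, aligned_template: str, template_ss: str,
            predictions: list[str], min_identity: float = 25.0) -> str:
    """Use the template only if identity exceeds the threshold (25% in the abstract)."""
    if sequence_identity(aligned_query, aligned_template) > min_identity:
        mapped = map_template_structure(aligned_query, aligned_template, template_ss)
    else:
        mapped = "?" * len(predictions[0])
    return consensus(mapped, predictions)


if __name__ == "__main__":
    # Hypothetical toy alignment and predictor outputs.
    print(predict("MKV-LAG", "MRVQL-G", "CHHHHC",
                  ["CHHHHC", "CCHHEC", "CHHHHC"]))
```

In this toy example the fifth query residue has no aligned template residue, so its label comes from the vote among the three hypothetical predictors; every other position is taken directly from the template.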

CONCLUSION

By using both sequence and structure databases and by exploiting the latest techniques in machine learning, it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server called PROTEUS, which performs these secondary structure predictions, is accessible at http://wishart.biology.ualberta.ca/proteus. For high-throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.
