Improving the accuracy of protein secondary structure prediction using structural alignment.

Authors

Montgomerie Scott, Sundararaj Shan, Gallin Warren J, Wishart David S

Affiliation

Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada.

Publication information

BMC Bioinformatics. 2006 Jun 14;7:301. doi: 10.1186/1471-2105-7-301.

DOI:10.1186/1471-2105-7-301

PMID:16774686

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1550433/

Abstract

BACKGROUND

The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high.
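The Q3 score mentioned above is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of that calculation follows; the function name `q3` and the string encoding of the labels are assumptions made for illustration, not part of the paper.

```python
def q3(predicted: str, observed: str) -> float:
    """Return the three-state accuracy (Q3) as a percentage.

    Both arguments are strings of per-residue labels drawn from
    H (helix), E (strand) and C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, a prediction that gets two of four residues right scores 50.0 under this definition.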

RESULTS

We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence identity >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4-5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%.
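The idea of mapping a homologue's structure onto the query, and falling back on sequence-based predictors where no homologous residue is aligned, can be sketched as follows. This is an illustrative sketch only, not the PROTEUS implementation: the function names are hypothetical, the pairwise alignment is assumed to be given as two gapped strings, and a plain majority vote stands in for the paper's "jury-of-experts" step.

```python
from collections import Counter

GAP = "-"

def transfer_structure(aligned_query, aligned_homologue, homologue_ss):
    """Map homologue H/E/C labels onto query residues via a pairwise alignment.

    Returns one entry per query residue: a label, or None where the
    query residue is aligned to a gap in the homologue.
    """
    if len(aligned_query) != len(aligned_homologue):
        raise ValueError("aligned sequences must have equal length")
    mapped = []
    h_idx = 0  # index into the ungapped homologue
    for q_char, h_char in zip(aligned_query, aligned_homologue):
        label = None
        if h_char != GAP:
            label = homologue_ss[h_idx]
            h_idx += 1
        if q_char != GAP:
            mapped.append(label)
    return mapped

def majority_vote(predictions):
    """Per-residue majority vote over equal-length prediction strings."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*predictions))

def combine(mapped, predictions):
    """Use the transferred label where available, else the voted label."""
    fallback = majority_vote(predictions)
    return "".join(m if m is not None else f for m, f in zip(mapped, fallback))

# Toy data, invented for illustration.
mapped = transfer_structure("MKV-LAG", "MK-ALA-", "HHEEC")
result = combine(mapped, ["HHHEEC", "HHCEEC", "CHHEEC"])
```

In this toy case the third and sixth query residues have no aligned homologue residue, so their states come from the vote among the three sequence-based predictions.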

CONCLUSION

By using both sequence and structure databases and by exploiting the latest techniques in machine learning, it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high-throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.
