SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity.

Authors

Magnan Christophe N, Baldi Pierre

Affiliation

Department of Computer Science and Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA.

Publication information

Bioinformatics. 2014 Sep 15;30(18):2592-7. doi: 10.1093/bioinformatics/btu352. Epub 2014 May 24.

Abstract

MOTIVATION

Accurately predicting protein secondary structure and relative solvent accessibility is important for the study of protein evolution, structure and function and as a component of protein 3D structure prediction pipelines. Most predictors use a combination of machine learning and profiles, and thus must be retrained and assessed periodically as the number of available protein sequences and structures continues to grow.

RESULTS

We present newly trained modular versions of the SSpro and ACCpro predictors of secondary structure and relative solvent accessibility together with their multi-class variants SSpro8 and ACCpro20. We introduce a sharp distinction between the use of sequence similarity alone, typically in the form of sequence profiles at the input level, and the additional use of sequence-based structural similarity, which uses similarity to sequences in the Protein Data Bank to infer annotations at the output level, and study their relative contributions to modern predictors. Using sequence similarity alone, SSpro's accuracy is between 79 and 80% (79% for ACCpro) and no other predictor seems to exceed 82%. However, when sequence-based structural similarity is added, the accuracy of SSpro rises to 92.9% (90% for ACCpro). Thus, by combining both approaches, these problems appear now to be essentially solved, as an accuracy of 100% cannot be expected for several well-known reasons. These results point also to several open technical challenges, including (i) achieving on the order of ≥ 80% accuracy, without using any similarity with known proteins and (ii) achieving on the order of ≥ 85% accuracy, using sequence similarity alone.
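The distinction drawn above, between similarity used at the input level and similarity used at the output level, can be illustrated with a minimal Python sketch. This is not the SSpro/ACCpro implementation: the function names `combine_predictions` and `per_residue_accuracy`, the three-class string encoding (H, E, C), and the toy sequences are all assumptions made for illustration. The sketch assumes that a profile-based predictor has already produced one label per residue and that a separate similarity search has supplied labels for the residues it could align to a known structure.

```python
from typing import Optional, Sequence


def combine_predictions(
    ml_prediction: str,
    homolog_annotation: Sequence[Optional[str]],
) -> str:
    """Combine two per-residue label sources at the output level.

    ml_prediction: one class letter per residue from a profile-based
        predictor (hypothetical input).
    homolog_annotation: one entry per residue; a class letter where a
        similar sequence of known structure supplies a label, else None.
    """
    if len(ml_prediction) != len(homolog_annotation):
        raise ValueError("both inputs must cover the same residues")
    # Prefer the structure-derived label when one exists; otherwise
    # fall back to the machine-learning prediction.
    return "".join(
        hom if hom is not None else ml
        for ml, hom in zip(ml_prediction, homolog_annotation)
    )


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted class equals the observed class."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy data, invented for illustration only (not results from the paper).
observed = "HHHHEEEECC"
ml_only = "HHHCEEECCC"
from_homologs = [None, None, None, "H", None, None, None, "E", None, None]

combined = combine_predictions(ml_only, from_homologs)
print(per_residue_accuracy(ml_only, observed))   # 0.8
print(per_residue_accuracy(combined, observed))  # 1.0
```

The simple override rule is one design choice among several; a real system might instead weight the two sources or use the similarity-derived labels only above some alignment-quality threshold.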

AVAILABILITY AND IMPLEMENTATION

SSpro, SSpro8, ACCpro and ACCpro20 programs, data and web servers are available through the SCRATCH suite of protein structure predictors at http://scratch.proteomics.ics.uci.edu.
