What are the baselines for protein fold recognition?

Authors

McGuffin L J, Bryson K, Jones D T

Affiliation

Bioinformatics Group, Department of Biological Sciences, Brunel University, Uxbridge UB8 3PH, UK.

Publication

Bioinformatics. 2001 Jan;17(1):63-72. doi: 10.1093/bioinformatics/17.1.63.

DOI:10.1093/bioinformatics/17.1.63
PMID:11222263
Abstract

MOTIVATION

What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein, ranging from the length of the sequence to a knowledge of its secondary structure, to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods, and could these methods be used to construct viable hierarchical classifications?

EXPERIMENTS PERFORMED

A number of rapid automatic methods which score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of optimum methods at producing a classification of structure.
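To make the two ends of that range of methods concrete, here is a minimal Python sketch of a length-difference score and a global alignment score over strings of secondary structure elements. The abstract does not give the authors' scoring schemes, so the function names, the one-letter element encoding (`H` for helix, `E` for strand), and the match, mismatch, and gap values are all assumptions made for illustration.

```python
def length_difference_score(len_a: int, len_b: int) -> int:
    """Similarity from sequence length alone: the negated absolute
    difference, so that a higher score means more similar domains."""
    return -abs(len_a - len_b)


def sse_alignment_score(sse_a: str, sse_b: str, match: int = 2,
                        mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch) alignment score of two strings of
    secondary structure elements, e.g. "HEEH".

    The scoring values are illustrative assumptions, not the paper's.
    """
    rows, cols = len(sse_a) + 1, len(sse_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if sse_a[i - 1] == sse_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align the pair
                              score[i - 1][j] + gap,       # gap in sse_b
                              score[i][j - 1] + gap)       # gap in sse_a
    return score[-1][-1]
```

Either function can rank a library of domains against a query; the alignment version uses strictly more information, which is the contrast the experiments set out to measure.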

RESULTS

Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives allowing for 3% false positives, and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement, the random assignment achieved 6%, FSSP achieved 59%, and the optimum secondary structure alignment method achieved 32%. Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system.
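One plausible reading of "true positives at 3% false positives" is a point on a ROC-style curve: rank all domain pairs by score and report the fraction of same-fold pairs recovered before the false positive fraction exceeds the limit. The sketch below follows that reading. The abstract does not spell out the exact procedure, so the function name, the input layout, and the handling of tied scores (ignored here) are assumptions.

```python
def tpr_at_fpr(scored_pairs, max_fpr=0.03):
    """Fraction of same-fold pairs recovered before the false positive
    fraction exceeds max_fpr.

    scored_pairs: iterable of (score, is_same_fold) tuples, where a
    higher score means the method judged the pair more similar.
    """
    pairs = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, same in pairs if same)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one same-fold and one different-fold pair")

    tp = fp = 0
    best_tpr = 0.0
    for _, same in pairs:
        if same:
            tp += 1
        else:
            fp += 1
        if fp / n_neg > max_fpr:
            break  # the false positive budget is spent
        best_tpr = tp / n_pos
    return best_tpr
```

Applied to the output of a random scorer, a secondary structure scorer, and a structural comparison method on the same set of pairs, this would give three directly comparable numbers of the kind quoted above.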

CONCLUSIONS

Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment, our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by randomly guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.
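The random-guessing figure can be understood with a small calculation: if each target is assigned the fold of a library domain picked uniformly at random, the chance of a correct assignment is the fraction of the remaining library that shares the target's fold. The sketch below computes that expectation for a list of fold labels. It is a simplified model under assumptions of my own (uniform choice, a leave-one-out library, hypothetical function name) and is not claimed to be the authors' random assignment method.

```python
from collections import Counter


def random_top1_hit_rate(fold_labels):
    """Expected fraction of domains assigned the correct fold when each
    one is matched to another domain chosen uniformly at random from
    the same list (leave-one-out)."""
    n = len(fold_labels)
    if n < 2:
        raise ValueError("need at least two domains")
    counts = Counter(fold_labels)
    # For each domain, the other same-fold domains over all other domains.
    total = sum((counts[fold] - 1) / (n - 1) for fold in fold_labels)
    return total / n
```

Under this model the baseline depends only on how unevenly folds are populated in the library: a few heavily populated folds push the random hit rate up, which is one reason a benchmark result is hard to interpret without such a baseline.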
