Hadley C, Jones D T
Protein Structure Group, Department of Biological Sciences, University of Warwick, Coventry CV4 7AL, UK.
Structure. 1999 Sep 15;7(9):1099-112. doi: 10.1016/s0969-2126(99)80177-4.
Several methods of structural classification have been developed to introduce some order to the large amount of data present in the Protein Data Bank. Such methods facilitate structural comparisons and provide a greater understanding of structure and function. The most widely used and comprehensive databases are SCOP, CATH and FSSP, which represent three unique methods of classifying protein structures: purely manual, a combination of manual and automated, and purely automated, respectively. In order to develop reliable template libraries and benchmarks for protein-fold recognition, a systematic comparison of these databases has been carried out to determine their overall agreement in classifying protein structures.
Approximately two-thirds of the protein chains in each database are common to all three databases. Despite employing different methods, and basing their systems on different rules of protein structure and taxonomy, SCOP, CATH and FSSP agree on the majority of their classifications. Discrepancies and inconsistencies are accounted for by a small number of explanations. Other interesting features have been identified, and various differences between manual and automatic classification methods are presented.
Using these databases requires an understanding of the rules upon which they are based; each method offers certain advantages depending on the biological requirements and knowledge of the user. The degree of discrepancy between the systems also affects the reliability of prediction methods that employ these schemes as benchmarks. To generate accurate fold templates for threading, we extract information from a consensus database that encompasses the agreements between SCOP, CATH and FSSP.
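The idea of a consensus across the three classifications can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything below is an assumption made for illustration: each database is modelled as a mapping from a chain identifier to a group label, the chain names and labels are invented toy values, and agreement is tested on pairs of chains, because group labels from different schemes are not directly comparable. The helper names `common_chains` and `consensus_pairs` are hypothetical.

```python
from itertools import combinations


def common_chains(*classifications):
    """Return the chain IDs that appear in every classification."""
    chain_sets = [set(c) for c in classifications]
    return set.intersection(*chain_sets)


def consensus_pairs(*classifications):
    """Return the chain pairs that every classification puts in the same group.

    Labels are only compared within one classification, never across
    classifications, so differing naming schemes do not matter.
    """
    shared = sorted(common_chains(*classifications))
    agreed = set()
    for a, b in combinations(shared, 2):
        if all(c[a] == c[b] for c in classifications):
            agreed.add((a, b))
    return agreed


# Toy data (hypothetical chain IDs and group labels, not real database entries).
scop = {"chainA": "s1", "chainB": "s1", "chainC": "s2", "chainD": "s2"}
cath = {"chainA": "c1", "chainB": "c1", "chainC": "c2", "chainE": "c2"}
fssp = {"chainA": "f1", "chainB": "f1", "chainC": "f1", "chainD": "f3"}

shared = common_chains(scop, cath, fssp)
consensus = consensus_pairs(scop, cath, fssp)
print(sorted(shared))
print(sorted(consensus))
```

In this toy case only chains present in all three mappings are considered, and a pair is kept only when no scheme separates it; a pair that one scheme groups together and another splits is treated as a discrepancy and left out of the consensus.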