Pearl F M, Lee D, Bray J E, Sillitoe I, Todd A E, Harrison A P, Thornton J M, Orengo C A
Department of Biochemistry, University College London, University of London, Gower Street, London WC1E 6BT, UK.
Nucleic Acids Res. 2000 Jan 1;28(1):277-82. doi: 10.1093/nar/28.1.277.
We report the latest release (version 1.6) of the CATH protein domains database (http://www.biochem.ucl.ac.uk/bsm/cath). This is a hierarchical classification of 18 577 domains into evolutionary families and structural groupings. We have identified 1028 homologous superfamilies in which the proteins have both structural, and sequence or functional similarity. These can be further clustered into 672 fold groups and 35 distinct architectures. Recent developments of the database include the generation of 3D templates for recognising structural relatives in each fold group, which has led to significant improvements in the speed and accuracy of updating the database and also means that less manual validation is required. We also report the establishment of the CATH-PFDB (Protein Family Database), which associates 1D sequences with the 3D homologous superfamilies. Sequences showing identifiable homology to entries in CATH have been extracted from GenBank using PSI-BLAST. A CATH-PSIBLAST server has been established, which allows you to scan a new sequence against the database. The CATH Dictionary of Homologous Superfamilies (DHS), which contains validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies, has been updated to include annotations associated with sequence relatives identified in GenBank. The DHS is a powerful tool for considering the variation of functional properties within a given CATH superfamily and in deciding what functional properties may be reliably inherited by a newly identified relative.
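The nesting described above (domains grouped into homologous superfamilies, superfamilies into fold groups, fold groups into architectures) can be illustrated with a minimal Python sketch. It assumes each domain carries a dotted four-number code of the form class.architecture.fold.superfamily; that string format, the names `DomainCode`, `parse_code` and `count_groups`, and the sample codes are illustrative assumptions and are not taken from the abstract or from the release files.

```python
from typing import Dict, Iterable, NamedTuple


class DomainCode(NamedTuple):
    """Hypothetical four-level classification code for one domain."""
    klass: int
    architecture: int
    fold: int
    superfamily: int


def parse_code(text: str) -> DomainCode:
    """Parse an assumed 'class.architecture.fold.superfamily' string."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    return DomainCode(*(int(p) for p in parts))


def count_groups(codes: Iterable[DomainCode]) -> Dict[str, int]:
    """Count distinct groups at each level of the hierarchy.

    A group at a given level is identified by the prefix of the code up to
    that level, so two superfamilies in the same fold group share a prefix.
    """
    architectures = set()
    fold_groups = set()
    superfamilies = set()
    n_domains = 0
    for code in codes:
        n_domains += 1
        architectures.add(code[:2])
        fold_groups.add(code[:3])
        superfamilies.add(code[:4])
    return {
        "domains": n_domains,
        "architectures": len(architectures),
        "fold_groups": len(fold_groups),
        "superfamilies": len(superfamilies),
    }


# Illustrative codes only; they do not reproduce the release statistics.
sample = ["1.10.8.10", "1.10.8.20", "1.10.10.10", "3.40.50.300", "3.40.50.720"]
summary = count_groups(parse_code(s) for s in sample)
print(summary)
```

Run over the full set of domain codes for a release, the same counting would be expected to yield figures of the kind quoted above (18 577 domains, 1028 superfamilies, 672 fold groups, 35 architectures), provided the codes follow the assumed format.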