Stebbings Lucy A, Mizuguchi Kenji
Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D203-7. doi: 10.1093/nar/gkh027.
HOMSTRAD (http://www-cryst.bioc.cam.ac.uk/homstrad/) is a collection of protein families, clustered on the basis of sequence and structural similarity. The database is unique in that the protein family sequence alignments have been specially annotated using the program JOY to highlight a wide range of structural features. Such data are useful for identifying key structurally conserved residues within the families. Superpositions of the structures within each family are also available, and a sensitive structure-aided search engine, FUGUE, can be used to search the database for matches to a query protein sequence. Historically, HOMSTRAD families were generated using several key pieces of software, including COMPARER and MNYFIT, and held in a number of flat files and indexes. A new relational database version of HOMSTRAD, HOMSTRAD BETA (http://www-cryst.bioc.cam.ac.uk/homstradbeta/), is being developed using MySQL. This relational data structure provides more flexibility for future developments, reduces update times and makes the data more easily accessible. Consequently, it has been possible to add a number of new web features, including a custom alignment facility. Altogether, this makes HOMSTRAD and its new BETA version an excellent resource both for comparative modelling and for identifying distant sequence/structure similarities between proteins.