Attwood T K, Beck M E
Department of Biochemistry, University of Leeds, UK.
Protein Eng. 1994 Jul;7(7):841-8. doi: 10.1093/protein/7.7.841.
The PRINTS database of protein 'fingerprints' is described. Fingerprints comprise sets of motifs excised from conserved regions of sequence alignments, their diagnostic power or potency being refined by iterative database scanning (in this case the OWL composite sequence database). Generally, the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3-D space. The use of groups of independent, linearly or spatially separate motifs allows particular protein folds and functionalities to be characterized more flexibly and powerfully than conventional single-component patterns or regular expressions. The current version of the database (4.0) contains 150 entries (encoding > 700 motifs), covering a wide range of globular and membrane proteins, modular polypeptides and so on. The growth of the database is influenced by a number of factors, e.g. the use of multiple motifs, the maximization of sequence information through iterative database scanning and the fact that the database searched is a large composite. The information contained within PRINTS is distinct from but complementary to the single consensus expressions stored in the widely used PROSITE dictionary of patterns.
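The contrast between a multi-motif fingerprint and a single pattern can be illustrated with a minimal sketch. This is not the PRINTS scanning method, which the abstract does not specify. The frequency-based scoring, the greedy left-to-right matching, the 0.6 threshold, the function names and the toy motifs and sequences are all assumptions made for illustration.

```python
from collections import Counter


def motif_profile(segments):
    """Per-column residue frequencies for equal-length, ungapped segments."""
    width = len(segments[0])
    if any(len(s) != width for s in segments):
        raise ValueError("all segments in a motif must have the same length")
    return [
        {res: n / len(segments) for res, n in Counter(col).items()}
        for col in zip(*segments)
    ]


def best_hit(profile, sequence, start=0):
    """Return (score, position) of the best-scoring window at or after start.

    The score is the mean per-column frequency of the window's residues
    (an assumed toy scoring scheme). Position is -1 if nothing scores above 0.
    """
    width = len(profile)
    best = (0.0, -1)
    for pos in range(start, len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        score = sum(col.get(res, 0.0) for col, res in zip(profile, window)) / width
        if score > best[0]:
            best = (score, pos)
    return best


def match_fingerprint(motifs, sequence, threshold=0.6):
    """Greedily match motifs in order without overlap.

    Returns a list of (motif_index, position, score) for motifs that reach
    the threshold. Missing motifs are skipped, so partial matches are reported.
    """
    hits = []
    cursor = 0
    for index, segments in enumerate(motifs):
        profile = motif_profile(segments)
        score, pos = best_hit(profile, sequence, cursor)
        if pos >= 0 and score >= threshold:
            hits.append((index, pos, score))
            cursor = pos + len(profile)  # next motif must start after this one
    return hits


# Hypothetical three-motif fingerprint and two made-up sequences.
motifs = [["GDSGG", "GDSGS"], ["HCAAH", "HCGAH"], ["DIALL", "DIMLL"]]
full = "MKTGDSGGAAAAHCAAHPPPPDIALLKK"
partial = "MKTGDSGGAAAAPPPPDIALLKK"  # second motif absent

for name, seq in (("full", full), ("partial", partial)):
    found = match_fingerprint(motifs, seq)
    print(name, f"{len(found)}/{len(motifs)} motifs", [(i, p) for i, p, _ in found])
```

In this sketch the second sequence still matches two of the three motifs, whereas a single pattern spanning all three regions would reject it outright. That partial-match behaviour is the kind of flexibility the abstract attributes to groups of independent motifs.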