Holm L, Sander C
European Molecular Biology Laboratory, European Bioinformatics Institute, Genome Campus, Cambridge CB10 1SD, UK.
Nucleic Acids Res. 1998 Jan 1;26(1):316-9. doi: 10.1093/nar/26.1.316.
The FSSP database and its new supplement, the Dali Domain Dictionary, present a continuously updated classification of all known 3D protein structures. The classification is derived using an automatic structure alignment program (Dali) for the all-against-all comparison of structures in the Protein Data Bank. From the resulting enumeration of structural neighbours (which form a surprisingly continuous distribution in fold space) we derive a discrete fold classification in three steps: (i) sequence-related families are covered by a representative set of protein chains; (ii) protein chains are decomposed into structural domains based on the recurrence of structural motifs; (iii) folds are defined as tight clusters of domains in fold space. The fold classification, domain definitions and test sets for sequence-structure alignment (threading) are accessible on the web at www.embl-ebi.ac.uk/dali. The web interface provides a rich network of links between neighbours in fold space, between domains and proteins, and between structures and sequences leading, for example, to a database of explicit multiple alignments of protein families in the twilight zone of sequence similarity. The Dali/FSSP organization of protein structures provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination.
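Step (iii), defining folds as tight clusters of domains, can be illustrated with a minimal sketch. The abstract does not state the clustering procedure, the similarity measure, or any cut-off, so everything below is an assumption made for illustration: the function name `cluster_domains`, the use of average linkage, the domain labels, the pairwise similarity scores, and the threshold of 2.0 are all hypothetical and are not taken from Dali or FSSP.

```python
from itertools import combinations


def cluster_domains(names, score, threshold):
    """Greedy average-linkage clustering (illustrative assumption, not the Dali method).

    names     -- list of domain labels
    score     -- dict mapping frozenset({a, b}) to a pairwise similarity score
    threshold -- clusters are merged only while their mean pairwise score
                 is at least this value
    """
    clusters = [[n] for n in names]

    def mean_score(a, b):
        # Mean similarity over all cross-cluster domain pairs.
        total = sum(score[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best_pair, best = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            s = mean_score(clusters[i], clusters[j])
            if s >= best:
                best_pair, best = (i, j), s
        if best_pair is None:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical similarity scores between four made-up domains.
example_names = ["d1", "d2", "d3", "d4"]
example_score = {
    frozenset(("d1", "d2")): 12.0,
    frozenset(("d1", "d3")): 1.0,
    frozenset(("d1", "d4")): 0.5,
    frozenset(("d2", "d3")): 1.5,
    frozenset(("d2", "d4")): 1.0,
    frozenset(("d3", "d4")): 9.0,
}

if __name__ == "__main__":
    print(cluster_domains(example_names, example_score, 2.0))
```

With these made-up scores, the two strongly similar pairs each form their own cluster and the weak cross-pair similarities keep them apart. Because the abstract describes neighbours as forming a continuous distribution in fold space, any such threshold is a choice of where to cut, and a different cut-off would give a coarser or finer classification.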