Sowdhamini R, Rufino S D, Blundell T L
Department of Crystallography, Birkbeck College, London, UK.
Fold Des. 1996;1(3):209-20. doi: 10.1016/S1359-0278(96)00032-6.
A database of globular domains, derived from a non-redundant set of proteins, is useful for the sequence analysis of aligned domains, for structural comparisons, for understanding domain stability and flexibility and for fold recognition procedures. Domains are defined by the program DIAL and classified structurally using the procedure SEA.
The DIAL-derived domain database (DDBASE) consists of 436 protein chains involving 695 protein domains. Of these, 206 are alpha-class, 191 are beta-class and 294 are alpha-and-beta-class. The domains, 63% of which come from multidomain proteins and 73% of which are shorter than 150 residues, were clustered automatically using both single-link cluster analysis and hierarchical clustering to give a quantitative estimate of similarity in the domain-fold space.
Highly populated and well-described folds (doubly wound alpha/beta, singly wound alpha/beta barrels, alpha-class globins, large Greek-key beta and flavin-binding alpha/beta) are recognized at a SEA cut-off score of 0.55 in single-link clustering and at 0.65 in hierarchical clustering, although functionally related families are usually clearly distinguished at more stringent cut-off values.
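The effect of a cut-off in single-link clustering can be illustrated with a minimal sketch. It assumes that the SEA score behaves as a dissimilarity (lower means more similar), so that two domains are linked whenever their score is at or below the cut-off. The domain names `d1` to `d4` and all pairwise scores are invented for illustration and are not taken from DDBASE; the function `single_link_clusters` is hypothetical and is not part of DIAL or SEA.

```python
from itertools import combinations


def single_link_clusters(names, dist, cutoff):
    """Group items so that any pair scoring <= cutoff ends up in the same
    cluster, directly or through a chain of such pairs (single linkage)."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy pairwise scores (assumed values, illustration only).
names = ["d1", "d2", "d3", "d4"]
dist = {
    frozenset(("d1", "d2")): 0.30,
    frozenset(("d2", "d3")): 0.50,
    frozenset(("d1", "d3")): 0.70,
    frozenset(("d1", "d4")): 0.90,
    frozenset(("d2", "d4")): 0.85,
    frozenset(("d3", "d4")): 0.80,
}

clusters_loose = single_link_clusters(names, dist, 0.55)
clusters_strict = single_link_clusters(names, dist, 0.40)
print(clusters_loose)
print(clusters_strict)
```

In this toy example, `d1` and `d3` fall into one cluster at 0.55 even though their own score is 0.70, because both are linked through `d2`. This chaining is characteristic of single linkage. Tightening the cut-off to 0.40 splits the group, in the same way that a more stringent value separates families within a fold.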