Yee D P, Dill K A
Department of Pharmaceutical Chemistry, University of California, San Francisco 94143-1204.
Protein Sci. 1993 Jun;2(6):884-99. doi: 10.1002/pro.5560020603.
Protein structures come in families. Are families "closely knit" or "loosely knit" entities? We describe a measure of relatedness among polymer conformations. Based on weighted distance maps, this measure differs from existing measures mainly in two respects: (1) it is computationally fast, and (2) it can compare any two proteins, regardless of their relative chain lengths or degree of similarity. It does not require finding relative alignments. The measure is used here to determine the dissimilarities between all 12,403 possible pairs of 158 diverse protein structures from the Brookhaven Protein Data Bank (PDB). Combined with minimal spanning trees and hierarchical clustering methods, this measure is used to define structural families. It is also useful for rapidly searching a dataset of protein structures for specific substructural motifs. By using an analogy to distributions of Euclidean distances, we find that protein families are not tightly knit entities.
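The clustering step described above can be illustrated with a minimal sketch. The abstract does not give the formula for the weighted distance-map measure, so the code below assumes a precomputed symmetric dissimilarity matrix and does not attempt to reproduce the measure itself. The function names (`pair_count`, `minimum_spanning_tree`, `families`), the cutoff value, and the toy matrix `D` are all hypothetical. Deleting spanning-tree edges above a cutoff is equivalent to single-linkage clustering, which is one hierarchical method and not necessarily the one the authors used.

```python
def pair_count(n):
    """Number of unordered pairs among n structures: n(n-1)/2."""
    return n * (n - 1) // 2


def minimum_spanning_tree(dissim):
    """Prim's algorithm on a symmetric dissimilarity matrix (list of lists).

    Returns a list of (i, j, weight) edges that link every structure.
    """
    n = len(dissim)
    if n == 0:
        return []
    # best[j] = (lowest known weight joining j to the tree, tree endpoint)
    best = {j: (dissim[0][j], 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        w, i = best.pop(j)
        edges.append((i, j, w))
        # Reassigning existing keys does not change the dict size,
        # so updating while iterating is safe here.
        for k in best:
            if dissim[j][k] < best[k][0]:
                best[k] = (dissim[j][k], j)
    return edges


def families(dissim, cutoff):
    """Group structures by keeping only MST edges with weight <= cutoff."""
    parent = list(range(len(dissim)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, w in minimum_spanning_tree(dissim):
        if w <= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for x in range(len(dissim)):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Toy, made-up dissimilarities for four structures (assumption, not real data).
D = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.95],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.95, 0.20, 0.00],
]

print(pair_count(158))   # 12403, matching the pair count quoted in the abstract
print(families(D, 0.5))  # [[0, 1], [2, 3]]
```

With the toy matrix, structures 0 and 1 form one group and structures 2 and 3 another, because the only spanning-tree edge joining the two groups has weight 0.8, which exceeds the assumed cutoff of 0.5.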