Gerstein M, Altman R B
Department of Structural Biology, Fairchild D109, Stanford University, CA 94305, USA.
Comput Appl Biosci. 1995 Dec;11(6):633-44. doi: 10.1093/bioinformatics/11.6.633.
As the database of three-dimensional protein structures expands, it becomes possible to classify related structures into families. Some of these families, such as the globins, have enough members to allow statistical analysis of conserved features. Previously, we have shown that a probabilistic representation based on means and variances can be useful for defining structural cores for large families. These cores contain the subset of atoms that are in essentially the same relative positions in all members of the family. In addition to defining a core, our method creates an ordered list of atoms, ranked by their structural variation. In applying our core-finding procedure to the globins, we find that helices A, B, G and H form a structural core with low variance. These helices fold early in the folding pathway, and superimpose well with helices in the helix-turn-helix repressor protein family. The non-core helices (F and the parts of other helices that interact with it) are associated with the functional differences among the globins, and are encoded within a separate exon. We have also compared the variability measure implicit in our core structures with measures of sequence variability, using a procedure for measuring sequence variability that helps correct for the biased sampling in the databanks. We find, somewhat surprisingly, that sequence variation does not appear to correlate with structural variation.
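The idea of ranking atoms by structural variation can be illustrated with a small sketch. The abstract does not spell out the procedure, so everything here is an assumption made for illustration: the structures are taken to be already superimposed and aligned atom by atom, variation is measured as the mean squared deviation of each atom from its average position, and the core is taken to be the atoms below a cutoff. The names `rank_atoms_by_variance`, `select_core`, `max_variance` and `TOY` are hypothetical, and this is not the authors' published algorithm.

```python
import numpy as np


def rank_atoms_by_variance(coords):
    """Rank aligned atoms from least to most variable.

    coords: array of shape (n_structures, n_atoms, 3) holding the
    coordinates of equivalent atoms in pre-superimposed structures
    (an assumption of this sketch; no fitting is done here).
    Returns (order, variance): atom indices sorted by increasing
    variance, and the per-atom variance itself.
    """
    coords = np.asarray(coords, dtype=float)
    mean_pos = coords.mean(axis=0)                   # (n_atoms, 3)
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)  # (n_structures, n_atoms)
    variance = sq_dev.mean(axis=0)                   # (n_atoms,)
    order = np.argsort(variance, kind="stable")
    return order, variance


def select_core(coords, max_variance):
    """Return indices of atoms whose variance is at most max_variance.

    max_variance is a hypothetical cutoff chosen by the user.
    """
    order, variance = rank_atoms_by_variance(coords)
    return [int(i) for i in order if variance[i] <= max_variance]


# Toy data: 3 structures x 4 atoms. Atom 0 never moves, atom 2 moves most.
TOY = np.array([
    [[0, 0, 0], [1, 0.0, 0], [2, 0, 0], [3, 0, 0]],
    [[0, 0, 0], [1, 0.1, 0], [2, 3, 0], [3, 1, 0]],
    [[0, 0, 0], [1, -0.1, 0], [2, -3, 0], [3, -1, 0]],
])

order, variance = rank_atoms_by_variance(TOY)
core = select_core(TOY, max_variance=1.0)
```

On the toy data, the ranking places the immobile atom first and the most mobile atom last, and the cutoff of 1.0 excludes only the most mobile atom. A realistic application would also need a superposition step and a principled way to choose the cutoff, neither of which is covered here.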