Wu T D, Schmidler S C, Hastie T, Brutlag D L
Department of Biochemistry, Stanford University School of Medicine, California 94305, USA.
J Comput Biol. 1998 Fall;5(3):585-95. doi: 10.1089/cmb.1998.5.585.
A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. Also, this approach can superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique that uses gap penalties that adapt to the given data. Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ at various positions in a structure. In addition, a method is introduced for finding an initial correspondence, by measuring the discrete curvature along each protein backbone. Discrete curvature also characterizes the secondary structure of a protein backbone, distinguishing among helical, strand, and loop regions. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
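The superposition step described above (several structures fitted at once with either orthogonal or affine transformations, optional per-landmark weights, and a per-landmark measure of structural variability) can be sketched as a generalized-Procrustes-style loop. This is a minimal sketch under stated assumptions, not the authors' estimator: it assumes landmark correspondence is already fixed, uses NumPy, and introduces hypothetical helper names (`fit_orthogonal`, `fit_affine`, `superpose_all`). The orthogonal fit is the standard weighted Kabsch/SVD solution, and the affine fit is weighted least squares.

```python
import numpy as np


def fit_orthogonal(X, M, w):
    """Weighted rigid fit: rotate and translate X (n x 3, rows = landmarks)
    to minimize sum_k w_k * ||X_k R + t - M_k||^2; returns the fitted X."""
    W = w / w.sum()
    cx, cm = W @ X, W @ M                      # weighted centroids
    H = (X - cx).T @ (W[:, None] * (M - cm))   # 3 x 3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))         # -1 would indicate a reflection
    R = U @ np.diag([1.0, 1.0, d]) @ Vt        # proper rotation (Kabsch)
    return (X - cx) @ R + cm


def fit_affine(X, M, w):
    """Weighted least-squares affine fit (allows shear): [X, 1] @ B ~ M."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    sw = np.sqrt(w)[:, None]
    B, *_ = np.linalg.lstsq(sw * Xa, sw * M, rcond=None)
    return Xa @ B


def superpose_all(structures, kind="orthogonal", weights=None, iters=20):
    """Fit every structure to a shared consensus (no pairwise chaining).
    Assumes all structures have the same n corresponding landmarks."""
    fit = fit_orthogonal if kind == "orthogonal" else fit_affine
    n = structures[0].shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    M = structures[0].astype(float)
    size0 = np.linalg.norm(M - M.mean(axis=0))
    for _ in range(iters):
        M = np.mean([fit(X, M, w) for X in structures], axis=0)
        if kind == "affine":
            # Affine fits can shrink the consensus toward a point; rescale it
            # to the first structure's size (a sketch-level safeguard).
            Mc = M - M.mean(axis=0)
            M = Mc * (size0 / np.linalg.norm(Mc))
    aligned = np.stack([fit(X, M, w) for X in structures])
    var = ((aligned - M) ** 2).sum(axis=2).mean(axis=0)  # per-landmark variance
    return M, aligned, var
```

Down-weighting a landmark through `weights` reduces its pull on the fitted transformations, which is one simple way to realize the per-landmark weighting the abstract mentions; the returned `var` plays the role of the per-landmark variability.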
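The matching step pairs landmarks of two already-superimposed backbones by dynamic programming with gap penalties. The abstract does not give the authors' robust, data-adaptive penalty rule, so the sketch below is a plain global (Needleman–Wunsch-style) alignment in which matching two landmarks costs their squared distance, and the gap penalty is hypothetically set to the median of all pairwise squared distances. The function name `align_landmarks` is illustrative.

```python
import math
from statistics import median


def align_landmarks(A, B, gap=None):
    """Global dynamic-programming alignment of two already-superimposed
    backbones A and B (sequences of (x, y, z) landmarks). Matching i to j
    costs the squared distance; skipping a landmark costs `gap`.
    Returns the matched index pairs (i, j) in order."""
    n, m = len(A), len(B)
    d = [[math.dist(a, b) ** 2 for b in B] for a in A]
    if gap is None:
        # Hypothetical data-adaptive penalty: median squared distance.
        gap = median(v for row in d for v in row)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]  # S[i][j]: best cost of A[:i] vs B[:j]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = min(S[i - 1][j - 1] + d[i - 1][j - 1],
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # trace back one optimal path
        if S[i][j] == S[i - 1][j - 1] + d[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

In the alternating scheme the abstract describes, a routine like this would be called after each superposition, and the resulting pairs would define the landmarks used in the next superposition.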
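The abstract does not define its discrete curvature, so the following sketch assumes one common choice: the Menger curvature (reciprocal circumradius) of each triple of consecutive C-alpha positions. The helper names are hypothetical. For an idealized helix (assumed textbook geometry: radius 2.3 Å, rise 1.5 Å, 100° per residue) this gives a nearly constant value of roughly 0.37 per Å, while a straight segment gives zero; turning such profiles into helix, strand, and loop labels would need thresholds that the abstract does not state.

```python
import math


def menger_curvature(a, b, c):
    """Curvature (1 / circumradius) of the circle through three points."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    denom = math.dist(a, b) * math.dist(b, c) * math.dist(a, c)
    return 2.0 * math.hypot(*cross) / denom if denom > 0 else 0.0


def backbone_curvature(ca):
    """Discrete curvature at each interior C-alpha of a backbone trace."""
    return [menger_curvature(ca[i - 1], ca[i], ca[i + 1])
            for i in range(1, len(ca) - 1)]
```

Comparing such curvature profiles between backbones (for example, lining up runs of high, helix-like curvature) is one plausible way to seed the initial correspondence before the first superposition.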