Gerstein M, Altman R B
Section on Medical Informatics, Stanford University, CA 94305, USA.
J Mol Biol. 1995 Aug 4;251(1):161-75. doi: 10.1006/jmbi.1995.0423.
A variety of methods are currently available for creating multiple alignments, and these can be used to define and characterize families of related proteins, such as the globins or the immunoglobulins. We have developed a method for using a multiple alignment to identify an average structural "core", a subset of atoms with low structural variation. We show how the means and variances of core-atom positions summarize the commonalities and differences within a family, making them particularly useful in compiling libraries of protein folds. We show further how the rotation and translation relating two core structures, as in two domains of a multi-domain protein, can be described in a consistent fashion in terms of a "mean" transformation and a deviation about this mean. Once determined, our average core structures (with their implicit measure of structural variation) allow us to define a measure of structural similarity more informative than the usual root-mean-square (RMS) deviation in atomic position, i.e. a "better RMS." Our average structures also permit straightforward comparisons between variation in structure and variation in sequence at each position in a family. We have applied our core-finding methodology in detail to the immunoglobulin family. We find that the structural variability we observe just within the VL and VH domains anticipates the variability that others have observed throughout the whole immunoglobulin superfamily; that a core definition based on sequence conservation, somewhat surprisingly, does not agree with one based on structural similarity; and that the cores of the VL and VH domains vary by about 5 degrees in relative orientation across the known structures.
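The idea of a low-variance "core" can be illustrated with a minimal Python sketch. This is not the authors' procedure: it assumes the structures have already been superimposed in a common frame and that atoms are matched by index through the multiple alignment. The function names (`atom_stats`, `select_core`, `core_rms`), the toy coordinates, and the variance threshold are all hypothetical and chosen only for illustration.

```python
from statistics import fmean

def atom_stats(structures):
    """Per-atom mean position and positional variance.

    `structures` is a list of structures, each a list of (x, y, z) tuples
    for aligned atoms. Assumption: all structures are already superimposed.
    """
    n_atoms = len(structures[0])
    means, variances = [], []
    for i in range(n_atoms):
        pts = [s[i] for s in structures]
        mean = tuple(fmean(p[k] for p in pts) for k in range(3))
        # Variance = mean squared distance of each copy from the mean position.
        var = fmean(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in pts)
        means.append(mean)
        variances.append(var)
    return means, variances

def select_core(variances, threshold):
    """Indices of atoms whose variance is at or below a (hypothetical) cutoff."""
    return [i for i, v in enumerate(variances) if v <= threshold]

def core_rms(a, b, core):
    """RMS deviation between two structures, restricted to the core atoms."""
    return fmean(
        sum((a[i][k] - b[i][k]) ** 2 for k in range(3)) for i in core
    ) ** 0.5

# Toy family: three structures, three aligned atoms; the third atom is variable.
structures = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (2, 3, 0)],
    [(0, 0, 0), (1, 0, 0), (2, -3, 0)],
]
means, variances = atom_stats(structures)
core = select_core(variances, threshold=1.0)
print(core)                                         # indices of low-variance atoms
print(core_rms(structures[0], structures[1], core)) # RMS over the core only
```

In this toy case the variable third atom is excluded from the core, so the core-restricted RMS between the first two structures is smaller than an RMS taken over all atoms would be. That is the intuition behind an RMS measure that takes structural variation into account.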