Richards F M, Kundrot C E
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06511.
Proteins. 1988;3(2):71-84. doi: 10.1002/prot.340030202.
A computer program is described that produces a description of the secondary structure and supersecondary structure of a polypeptide chain using the list of alpha carbon coordinates as input. Restricting the term "secondary structure" to the conformation of contiguous segments of the chain, the program determines the initial and final residues in helices, extended strands, sharp turns, and omega loops. This is accomplished through the use of difference distance matrices. The distances in idealized models of the segments are compared with those in the actual structure, and the differences are evaluated for agreement within preset limits. The program assigns 90-95% of the residues in most proteins to at least one type of secondary element. In a second step the now-defined helices and strands are idealized as straight line segments, and the axial directions and locations are compiled from the input alpha carbon coordinate list. These data are used to check for moderate curvature in strands and helices, and the secondary structure list is corrected where necessary. The geometric relations between these line segments are then calculated and output as the first level of supersecondary structure. At most six parameters are required for a complete description of the relations between each pair. Frequently a less complete description will suffice, for example just the interaxial separation and angle. Both the secondary structure and one aspect of the supersecondary structure can be displayed in a character matrix analogous to the distance matrix format. This allows a quite accurate two-dimensional display of the three-dimensional structure, and several examples are presented. A procedure for searching for arbitrary substructures in proteins using distance matrices is also described. A search for the DNA binding helix-turn-helix motif in the Protein Data Bank serves as an example.
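The difference distance matrix comparison described above can be illustrated with a minimal Python sketch. It is not the authors' program. The helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue), the five-residue window, the 1.0 Å tolerance, and all function names are assumptions chosen for illustration; the abstract does not state the actual templates or preset limits.

```python
import math

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Alpha carbon coordinates of an idealized alpha helix along the z axis.
    The geometric parameters are textbook values, assumed for this sketch."""
    step = math.radians(turn_deg)
    return [(radius * math.cos(i * step), radius * math.sin(i * step), rise * i)
            for i in range(n)]

def distance_matrix(coords):
    """All pairwise distances between the given points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def max_difference(coords, template):
    """Largest absolute element of the difference distance matrix."""
    d_obs = distance_matrix(coords)
    d_ref = distance_matrix(template)
    n = len(coords)
    return max(abs(d_obs[i][j] - d_ref[i][j])
               for i in range(n) for j in range(n))

def find_segments(coords, window=5, tolerance=1.0):
    """Return (first, last) residue indices of runs in which every
    window agrees with the idealized template within the tolerance."""
    template = ideal_helix(window)
    hits = [max_difference(coords[s:s + window], template) <= tolerance
            for s in range(len(coords) - window + 1)]
    segments, start = [], None
    for s, hit in enumerate(hits + [False]):  # sentinel closes a trailing run
        if hit and start is None:
            start = s
        elif not hit and start is not None:
            # last matching window began at s - 1 and spans `window` residues
            segments.append((start, s - 1 + window - 1))
            start = None
    return segments
```

Because only interatomic distances are compared, no superposition of the template onto the structure is needed. The same loop could be run with templates for extended strands or turns by swapping `ideal_helix` for another hypothetical template generator.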
A further abstraction of the above data can be made in the form of a metamatrix where each diagonal element represents an entire secondary segment rather than a single atom, and the off-diagonal elements contain all the parameters describing their interrelations. Such matrices can be used in a straightforward search for higher levels of supersecondary structure or used in toto as a representation of the entire tertiary structure of the polypeptide chain.
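As a rough illustration of the reduced pairwise description (interaxial separation and angle) and of the metamatrix layout, the following Python sketch treats each secondary segment as a straight axis defined by two end points. It is an assumption-laden sketch, not the published method: the function names are hypothetical, only two of the up to six parameters are computed, and the separation is taken between the infinite lines through the axes rather than between the finite segments.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def axis_relation(p1, p2, q1, q2):
    """Interaxial angle (degrees) and separation between the infinite
    lines through axis p1->p2 and axis q1->q2."""
    u, v = sub(p2, p1), sub(q2, q1)
    cos_a = dot(u, v) / (norm(u) * norm(v))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    n = cross(u, v)
    w = sub(q1, p1)
    if norm(n) < 1e-9:
        # parallel or antiparallel axes: distance from q1 to the first line
        separation = norm(cross(w, u)) / norm(u)
    else:
        # skew or intersecting axes: projection onto the common normal
        separation = abs(dot(w, n)) / norm(n)
    return angle, separation

def metamatrix(segments):
    """segments: list of (label, start_point, end_point).
    Diagonal elements hold the segment label; off-diagonal elements
    hold the (angle, separation) pair for the two segments."""
    n = len(segments)
    m = [[None] * n for _ in range(n)]
    for i, (label, a1, a2) in enumerate(segments):
        m[i][i] = label
        for j in range(i + 1, n):
            _, b1, b2 = segments[j]
            m[i][j] = m[j][i] = axis_relation(a1, a2, b1, b2)
    return m
```

A search for a higher-level motif could then scan such a matrix for sets of segments whose off-diagonal entries fall within chosen ranges, for example pairs of helices with a small separation and a characteristic crossing angle.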