Taylor W R
Division of Mathematical Biology, National Institute for Medical Research, Mill Hill, London, UK.
J Mol Biol. 2001 Jul 27;310(5):1135-50. doi: 10.1006/jmbi.2001.4817.
Secondary structure line segments have been widely used to represent protein structure in analysis and prediction methods over the past 20 years. Their use in methods that compare protein structures at this level of representation is becoming more important as an increasing number of protein structures are determined through structural genomics programmes. The standard way to define line segments is to fit an axis through each secondary structure element. This approach has two difficulties, however: secondary structure definitions are inconsistent, and a single straight line fits a bent structure poorly. The procedure described here avoids these problems by finding a set of line segments independently of any external secondary structure definition. The segments can then serve as a novel basis for secondary structure definition, with the average rise per residue along each axis characterising the segment. This has the advantage that each secondary structure is described by a single continuous value that is not restricted to the conventional classes of alpha-helix, 3(10) helix and beta-strand. As a result, structures without "classic" secondary structures can be encoded as line segments and used in comparison algorithms. When compared over a large number of pairs of homologous proteins, the current method was found to be slightly more consistent than a widely used method based on hydrogen bonds.
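The "average rise per residue along each axis" can be illustrated with a minimal sketch. It is not the procedure of the paper, whose segment-finding step is not described in the abstract; it assumes the segment boundaries are already known and simply fits one least-squares axis through the C-alpha coordinates of a segment (the "standard" approach mentioned above), then measures the mean advance per residue along that axis. The function names `fit_axis`, `rise_per_residue` and `ideal_helix` are hypothetical, and the helix geometry used in the example (1.5 Å rise, 100° per residue, 2.3 Å radius) is the usual textbook approximation for an alpha-helix, not a value taken from the source.

```python
import math


def fit_axis(points, iterations=50):
    """Least-squares line through 3D points: returns (centroid, unit direction).

    The direction is the dominant eigenvector of the scatter matrix,
    found by power iteration seeded with the end-to-end vector.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centred = [[p[i] - centroid[i] for i in range(3)] for p in points]
    scatter = [[sum(q[i] * q[j] for q in centred) for j in range(3)]
               for i in range(3)]
    v = [points[-1][i] - points[0][i] for i in range(3)]
    for _ in range(iterations):
        w = [sum(scatter[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("cannot fit an axis: degenerate coordinates")
        v = [x / norm for x in w]
    return centroid, v


def rise_per_residue(points):
    """Mean advance per residue (same units as the input) along the fitted axis."""
    if len(points) < 2:
        raise ValueError("need at least two residues")
    centroid, axis = fit_axis(points)
    proj = [sum((p[i] - centroid[i]) * axis[i] for i in range(3))
            for p in points]
    return abs(proj[-1] - proj[0]) / (len(points) - 1)


def ideal_helix(n, rise=1.5, turn_deg=100.0, radius=2.3):
    """Hypothetical test data: C-alpha positions of an idealised alpha-helix."""
    step = math.radians(turn_deg)
    return [(radius * math.cos(step * k), radius * math.sin(step * k), rise * k)
            for k in range(n)]


if __name__ == "__main__":
    # Should print a value close to the 1.5 A rise used to build the helix.
    print(round(rise_per_residue(ideal_helix(18)), 2))
```

Because the result is a continuous number rather than a class label, a segment that is neither a clean helix nor a clean strand still receives a usable descriptor. For orientation, commonly quoted values are about 1.5 Å per residue for an alpha-helix, about 2.0 Å for a 3(10) helix and roughly 3.3 Å for a beta-strand. The fitted axis of a short helix is tilted slightly away from the true helix axis, so the measured rise is approximate.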