Feng Z K, Sippl M J
Center for Applied Molecular Engineering, University of Salzburg, Austria.
Fold Des. 1996;1(2):123-32. doi: 10.1016/s1359-0278(96)00021-1.
Techniques for comparison and optimum superimposition of protein structures are indispensable tools, providing the basis for statistical analysis, modeling, prediction and classification of protein folds. Observed similarity of structures is frequently interpreted as an indication of evolutionary relatedness. A variety of advanced techniques are available, but so far the important issue of uniqueness of structural superimposition has been largely neglected. We set out to investigate this issue by implementing an efficient algorithm for structure superimposition enabling routine searches for alternative alignments.
The algorithm is based on optimum superimposition of structures and dynamic programming. The implementation is tested and validated using published results. In particular, an automatic classification of all protein folds in a recent release of the protein data bank is performed. The results obtained are closely related to published data. Surprisingly, for many protein pairs alternative alignments are obtained. These alignments are indistinguishable in terms of number of equivalent residues and root mean square error of superimposition, but the respective sets of equivalent residue pairs are completely distinct. Alternative alignments are observed for all protein architectures, including mixed alpha/beta folds.
Superimposition of protein folds is frequently ambiguous. This has several implications for the interpretation of structural similarity with respect to evolutionary relatedness, and it restricts the range of applicability of superimposed structures in statistical analysis. In particular, studies based on the implicit assumption that optimum superimposition of structures is unique are bound to be misleading.
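The abstract names dynamic programming over superimposed structures but gives no details, so the following is only a toy sketch of how such a step might look and how two alternative alignments can tie. Everything specific here is an assumption made for illustration and is not the authors' implementation: the function names, the 3.0 Å equivalence cutoff, and the idealized straight chains with 3.8 Å spacing (roughly the distance between consecutive C-alpha atoms). Given two coordinate lists already placed in a common frame, the routine finds the largest order-preserving set of residue pairs lying within the cutoff. Applying it under two different rigid-body placements of the second chain yields two alignments to compare.

```python
import math  # math.dist requires Python 3.8+


def align_superimposed(a, b, cutoff=3.0):
    """Largest order-preserving set of (i, j) pairs with |a[i] - b[j]| <= cutoff.

    Assumes a and b are already superimposed in one coordinate frame.
    The cutoff (in angstroms) is a hypothetical choice for this sketch.
    """
    n, m = len(a), len(b)
    # score[i][j] = best number of equivalent pairs using a[i:] and b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(score[i + 1][j], score[i][j + 1])
            if math.dist(a[i], b[j]) <= cutoff:
                best = max(best, score[i + 1][j + 1] + 1)
            score[i][j] = best

    # Trace back one optimal path through the table.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        close = math.dist(a[i], b[j]) <= cutoff
        if close and score[i][j] == score[i + 1][j + 1] + 1:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif score[i + 1][j] >= score[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def rmsd(a, b, pairs):
    """Root mean square distance over the equivalent residue pairs."""
    if not pairs:
        raise ValueError("no equivalent residue pairs")
    total = sum(math.dist(a[i], b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


def translate(coords, dx):
    """Rigid translation along x (stands in for a second superimposition)."""
    return [(x + dx, y, z) for x, y, z in coords]


SPACING = 3.8  # assumed residue spacing of the toy chains, in angstroms
chain_a = [(SPACING * i, 0.0, 0.0) for i in range(6)]
chain_b = [(SPACING * j, 0.0, 0.0) for j in range(8)]

# Placement 1: chain_b as given. Placement 2: chain_b slid by two residues.
aln_1 = align_superimposed(chain_a, chain_b)
shifted_b = translate(chain_b, -2 * SPACING)
aln_2 = align_superimposed(chain_a, shifted_b)

print(len(aln_1), rmsd(chain_a, chain_b, aln_1))
print(len(aln_2), rmsd(chain_a, shifted_b, aln_2))
print(set(aln_1) & set(aln_2))  # residue pairs shared by both alignments
```

In this toy case both placements give six equivalent pairs at essentially zero RMSD, yet the two alignments share no residue pair, which is the kind of ambiguity the abstract reports. Real structures would additionally need an optimum rotation and translation for each candidate alignment, which this sketch deliberately leaves out.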