Mayr Gabriele, Domingues Francisco S, Lackner Peter
Department of Molecular Biology, University of Salzburg, Salzburg, Austria.
BMC Struct Biol. 2007 Jul 26;7:50. doi: 10.1186/1472-6807-7-50.
Several methods are currently available for comparing protein structures. These methods have been evaluated for their performance in identifying structurally or evolutionarily related proteins, but so far less attention has been paid to an objective comparison of the alignments that the different methods produce.
We analysed and compared the structural alignments obtained by different methods using three sets of pairs of structurally related proteins. The first set consists of 355 pairs of remotely homologous proteins according to the SCOP database (ASTRAL40 set). The second set was derived from the SISYPHUS database and includes 69 protein pairs (SISY set). The third set consists of 40 pairs that are challenging to align (RIPC set). Aligning the pairs in this set requires numerous and large indels, and some of the proteins are related by circular permutations, show extensive conformational variability or include repetitions. Two standard methods (CE and DALI) were applied to align the proteins in the ASTRAL40 set. The extent of structural similarity identified by the two methods is highly correlated, and their alignments agree on average in more than half of the aligned positions. CE, DALI and four additional methods (FATCAT, MATRAS, Calpha-match and SHEBA) were then compared using the SISY and RIPC sets. The accuracy of the alignments was assessed by comparison to reference alignments. On average, the alignments generated by the different methods match more than half of the reference alignments in the SISY set. The alignments obtained for the more challenging RIPC set tend to differ considerably from one another and match the reference alignments less successfully than the SISY set alignments do.
The alignments produced by different methods tend to agree to a considerable extent, but the agreement is lower for the more challenging pairs. The results for the comparison to reference alignments are encouraging, but also indicate that there is still room for improvement.
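The two kinds of comparison described above (agreement between two methods, and agreement with a reference alignment) can be illustrated with a minimal sketch. It assumes that an alignment is represented as a set of residue index pairs `(i, j)`, one index per protein. The function names and the choice of denominators are illustrative assumptions, not the measures defined in the paper.

```python
from typing import Iterable, Set, Tuple

# An aligned position is a pair (residue index in protein A, residue index in protein B).
Pair = Tuple[int, int]


def fraction_of_reference_matched(test: Iterable[Pair], reference: Iterable[Pair]) -> float:
    """Fraction of reference aligned pairs that the test alignment reproduces exactly."""
    ref: Set[Pair] = set(reference)
    if not ref:
        raise ValueError("reference alignment is empty")
    return len(set(test) & ref) / len(ref)


def mutual_agreement(a: Iterable[Pair], b: Iterable[Pair]) -> float:
    """Shared aligned pairs, normalised by the shorter alignment (an assumed convention)."""
    set_a, set_b = set(a), set(b)
    shorter = min(len(set_a), len(set_b))
    if shorter == 0:
        return 0.0
    return len(set_a & set_b) / shorter


# Hypothetical toy data, not taken from any of the benchmark sets.
reference = [(1, 1), (2, 2), (3, 3), (4, 4)]
method_x = [(1, 1), (2, 2), (3, 4), (5, 5)]
method_y = [(1, 1), (2, 2), (3, 3)]

print(fraction_of_reference_matched(method_x, reference))
print(mutual_agreement(method_x, method_y))
```

Counting only exactly identical pairs is a strict criterion. A more tolerant variant could also accept pairs shifted by a few residues, which may matter for difficult cases such as those in the RIPC set.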