Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea.
School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea.
PLoS One. 2019 Jan 30;14(1):e0210177. doi: 10.1371/journal.pone.0210177. eCollection 2019.
Protein structure alignment is an important tool in evolutionary biology and protein modeling. Tools that intensively search for globally optimal non-sequential alignments are rare. We propose ALIGN-CSA, which shows improvement in scores such as DALI-score, SP-score, SO-score and TM-score on a benchmark set of 286 cases. We also benchmarked existing popular alignment scoring functions, using ALIGN-CSA to effectively eliminate the dependence on the search algorithm. For the benchmarking, we set the minimum block size to 4 to prevent highly fragmented alignments, since the biological relevance of small alignment blocks is hard to interpret. Under this condition, we searched for globally optimal alignments with ALIGN-CSA using each of the four scoring functions listed above, and found TM-score to be the most effective in generating alignments with longer match lengths and smaller RMSD values. However, DALI-score is the most effective in generating alignments similar to the manually curated reference alignments, which implies that DALI-score is the more biologically relevant score. Because ALIGN-CSA is computationally demanding, we also propose a relatively fast local refinement method that can control the minimum block size and whether reverse alignment is allowed. ALIGN-CSA can be used to obtain much improved alignments at the cost of more extensive computation. For faster alignment, we propose a refinement protocol that improves the score of a given alignment obtained from various external tools. All programs are available from http://lee.kias.re.kr.
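As an illustration of two ideas mentioned in the abstract, the minimum block size and TM-score, the following Python sketch may help. It is not the ALIGN-CSA code. It assumes that an alignment is a list of residue index pairs `(i, j)`, that a "block" is a run of pairs in which `i` increases by one while `j` increases by one (or decreases by one when reverse alignment is allowed), and that inter-residue distances have already been measured after superposition. The function names `split_into_blocks`, `filter_blocks` and `tm_score` are hypothetical. The TM-score expression is the standard published definition with d0 = 1.24 (L - 15)^(1/3) - 1.8; rejecting chains shorter than 19 residues, where that d0 is not positive, is a simplification of this sketch.

```python
def split_into_blocks(pairs, allow_reverse=False):
    """Group aligned residue pairs (i, j) into contiguous blocks.

    Assumption of this sketch: within a block, i advances by 1 and j advances
    by 1 (forward) or by -1 (reverse, only if allow_reverse is True), and the
    direction stays the same throughout the block.
    """
    blocks = []
    for i, j in sorted(pairs):
        if blocks:
            block = blocks[-1]
            prev_i, prev_j = block[-1]
            step = j - prev_j
            # Direction of the current block, unknown while it has one pair.
            direction = block[1][1] - block[0][1] if len(block) > 1 else None
            step_ok = step == 1 or (allow_reverse and step == -1)
            if i - prev_i == 1 and step_ok and direction in (None, step):
                block.append((i, j))
                continue
        blocks.append([(i, j)])
    return blocks


def filter_blocks(blocks, min_size=4):
    """Keep only blocks with at least min_size aligned pairs."""
    return [block for block in blocks if len(block) >= min_size]


def tm_score(distances, target_length):
    """TM-score from per-pair distances (in angstroms) after superposition.

    TM = (1 / L) * sum(1 / (1 + (d / d0)^2)), normalized by target_length L.
    """
    if target_length < 19:
        # Simplification of this sketch: d0 is not positive below 19 residues.
        raise ValueError("target_length must be at least 19 for this d0 formula")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # A forward block of four pairs followed by a short two-pair fragment.
    alignment = [(0, 10), (1, 11), (2, 12), (3, 13), (7, 2), (8, 3)]
    kept = filter_blocks(split_into_blocks(alignment), min_size=4)
    print(len(kept), "block(s) kept")
    print(tm_score([0.5, 1.0, 1.5, 2.0], target_length=100))
```

With a minimum block size of 4, the two-pair fragment in the usage example is discarded and only the four-pair block remains, which is the kind of fragmentation control the abstract describes.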