Waldispühl Jérôme, O'Donnell Charles W, Will Sebastian, Devadas Srinivas, Backofen Rolf, Berger Bonnie
1 School of Computer Science, McGill University, Montreal, Canada.
J Comput Biol. 2014 Jul;21(7):477-91. doi: 10.1089/cmb.2013.0163. Epub 2014 Apr 25.
Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise and multiple sequence alignment tools in the most difficult low-sequence homology case. It also improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families (partiFold-Align is available at http://partifold.csail.mit.edu/).
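The general idea of exploiting sparsity in alignment candidates can be illustrated with a minimal sketch. This is not the partiFold-Align algorithm, which also folds the sequences and is not reproduced here. It is an ordinary global alignment score (Needleman-Wunsch) in which only cells accepted by a candidate predicate are ever filled. The function names, the scoring values, and the diagonal-band predicate are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, Tuple

NEG_INF = float("-inf")


def sparse_global_alignment_score(
    a: str,
    b: str,
    is_candidate: Callable[[int, int], bool],
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> float:
    """Global alignment score restricted to cells accepted by is_candidate.

    Hypothetical helper for illustration only; scoring values are assumptions.
    Returns -inf when no alignment path stays inside the candidate set.
    """
    # Only reachable candidate cells are stored; a missing key means "pruned".
    score: Dict[Tuple[int, int], float] = {(0, 0): 0.0}
    # For simplicity this loop visits every (i, j) and skips non-candidates;
    # a real sparse implementation would enumerate the candidates directly.
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if (i, j) == (0, 0) or not is_candidate(i, j):
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score.get((i - 1, j - 1), NEG_INF) + sub)
            if i > 0:
                best = max(best, score.get((i - 1, j), NEG_INF) + gap)
            if j > 0:
                best = max(best, score.get((i, j - 1), NEG_INF) + gap)
            if best > NEG_INF:
                score[(i, j)] = best
    return score.get((len(a), len(b)), NEG_INF)


def band(width: int) -> Callable[[int, int], bool]:
    """Assumed candidate set: cells within `width` of the main diagonal."""
    return lambda i, j: abs(i - j) <= width
```

Calling `sparse_global_alignment_score("ACGT", "AGT", band(1))` fills only the cells near the diagonal. The restricted score can never exceed the unrestricted one, so the quality of the result depends on how well the candidate set covers the true alignment.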