Lee Christopher, Grasso Catherine, Sharlow Mark F
Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095-1570, USA.
Bioinformatics. 2002 Mar;18(3):452-64. doi: 10.1093/bioinformatics/18.3.452.
Progressive multiple sequence alignment (MSA) methods depend on reducing an MSA to a linear profile at each alignment step. However, this reduction discards information needed for accurate alignment and introduces gap-scoring artifacts.
We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm, Partial Order Alignment (POA), to guarantee that the optimal alignment of each new sequence against each sequence in the MSA will be considered. Moreover, the algorithm introduces a new edit operator, homologous recombination, which is important for multidomain sequences. The algorithm has improved speed (linear time complexity) over existing MSA algorithms, enabling construction of massive and complex alignments (e.g. an alignment of 5000 sequences in 4 h on a Pentium II). We demonstrate the utility of this algorithm on a family of multidomain SH2 proteins, and on EST assemblies containing alternative splicing and polymorphism.
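To illustrate the idea of aligning a sequence directly against a graph rather than a profile, the following is a minimal sketch in Python. It is not the POA program's implementation. It assumes a global alignment with a linear gap penalty and a simple match/mismatch score, and the names `align_to_graph`, `match`, `mismatch` and `gap` are hypothetical. The MSA is represented as a directed acyclic graph whose nodes are listed in topological order, each node holding one residue and the indices of its predecessor nodes. The only change from ordinary pairwise dynamic programming is that each cell takes its maximum over all predecessor nodes instead of a single previous row.

```python
from typing import List, Sequence, Tuple

# A node is (residue, indices of predecessor nodes). Nodes must be listed
# in topological order, so every predecessor index is smaller than the
# node's own index. This layout is an assumption made for the sketch.
Node = Tuple[str, List[int]]


def align_to_graph(nodes: Sequence[Node], seq: str,
                   match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global score of `seq` against any source-to-sink path in the graph."""
    n, m = len(nodes), len(seq)
    if n == 0:
        return m * gap  # nothing to align against: every residue is a gap

    # Row 0 is a virtual start node that precedes every source node.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        score[0][j] = j * gap

    has_successor = [False] * n
    for i, (base, preds) in enumerate(nodes, start=1):
        for p in preds:
            has_successor[p] = True
        # DP rows of the predecessors (shifted by one for the start row).
        rows = [p + 1 for p in preds] or [0]
        score[i][0] = max(score[r][0] for r in rows) + gap
        for j in range(1, m + 1):
            sub = match if base == seq[j - 1] else mismatch
            score[i][j] = max(
                max(score[r][j - 1] for r in rows) + sub,  # align node to seq[j-1]
                max(score[r][j] for r in rows) + gap,      # node left unaligned
                score[i][j - 1] + gap,                     # seq[j-1] left unaligned
            )

    # A global alignment must end at a sink (a node with no successors).
    sinks = [i + 1 for i in range(n) if not has_successor[i]]
    return max(score[i][m] for i in sinks)


# Graph for the two-sequence MSA {ACT, AGT}: A branches to C or G, both join at T.
graph = [("A", []), ("C", [0]), ("G", [0]), ("T", [1, 2])]
print(align_to_graph(graph, "AGT"))
```

Because both branches remain separate nodes in the graph, a new sequence is scored against each original sequence's path rather than against a column that averages them, which is the information a linear profile would lose.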
The partial order alignment program POA is available at http://www.bioinformatics.ucla.edu/poa.