Mirarab Siavash, Nguyen Nam, Guo Sheng, Wang Li-San, Kim Junhyong, Warnow Tandy
Department of Computer Science, University of Texas at Austin, Austin, Texas.
J Comput Biol. 2015 May;22(5):377-86. doi: 10.1089/cmb.2014.0156. Epub 2014 Dec 30.
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.