Department of Computer Science and Engineering, Washington University, Saint Louis, MO 63130, USA.
Bioinformatics. 2012 May 15;28(10):1336-44. doi: 10.1093/bioinformatics/bts158. Epub 2012 Apr 6.
The expansion of DNA sequencing capacity has enabled the sequencing of whole genomes from a number of related species. These genomes can be combined in a multiple alignment that provides useful information about the evolutionary history at each genomic locus. One area in which evolutionary information can productively be exploited is in aligning a new sequence to a database of existing, aligned genomes. However, existing high-throughput alignment tools are not designed to work effectively with multiple genome alignments.
We introduce PhyLAT, the phylogenetic local alignment tool, to compute local alignments of a query sequence against a fixed multiple-genome alignment of closely related species. PhyLAT uses a known phylogenetic tree on the species in the multiple alignment to improve the quality of its computed alignments while also estimating the placement of the query on this tree. It combines a probabilistic approach to alignment with seeding and expansion heuristics to accelerate discovery of significant alignments. We provide evidence, using alignments of human chromosome 22 against a five-species alignment from the UCSC Genome Browser database, that PhyLAT's alignments are more accurate than those of other commonly used programs, including BLAST, POY, MAFFT, MUSCLE and CLUSTAL. PhyLAT also identifies more alignments in coding DNA than does pairwise alignment alone. Finally, our tool determines the evolutionary relationship of query sequences to the database more accurately than do POY, RAxML, EPA or pplacer.
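The seeding and expansion idea can be illustrated with a minimal sketch. This is not PhyLAT's implementation: the abstract does not specify its seed model or scoring, so the average match/mismatch column score below stands in for the probabilistic, tree-aware score, and every function name and parameter (`index_kmers`, `column_score`, `extend`, `seed_and_extend`, `k`, `x_drop`) is a hypothetical choice made for illustration. The sketch finds exact k-mer seeds between the query and any row of a fixed multiple alignment, then extends each seed without gaps in both directions until the score drops too far below its best value.

```python
from collections import defaultdict


def index_kmers(alignment, k):
    """Map each gap-free k-mer in any row to the columns where it starts."""
    index = defaultdict(set)
    width = len(alignment[0])
    for row in alignment:
        for col in range(width - k + 1):
            kmer = row[col:col + k]
            if "-" not in kmer:
                index[kmer].add(col)
    return index


def column_score(column, base, match=1.0, mismatch=-1.0):
    """Assumed stand-in score: average match/mismatch of the query base
    against the non-gap residues of one alignment column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return mismatch
    return sum(match if r == base else mismatch for r in residues) / len(residues)


def extend(alignment, query, q, c, step, x_drop):
    """Ungapped extension from (q, c), exclusive, in direction `step`.
    Stops once the running score falls more than x_drop below the best.
    Returns (best_score, length_of_best_extension)."""
    width = len(alignment[0])
    score = best = 0.0
    best_len = n = 0
    q += step
    c += step
    while 0 <= q < len(query) and 0 <= c < width:
        score += column_score([row[c] for row in alignment], query[q])
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:
            break
        q += step
        c += step
    return best, best_len


def seed_and_extend(alignment, query, k=4, x_drop=2.0):
    """Return hits as (query_start, column_start, length, score), best first."""
    index = index_kmers(alignment, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for c in index.get(query[q:q + k], ()):
            seed = sum(
                column_score([row[c + i] for row in alignment], query[q + i])
                for i in range(k)
            )
            right, right_len = extend(alignment, query, q + k - 1, c + k - 1, 1, x_drop)
            left, left_len = extend(alignment, query, q, c, -1, x_drop)
            hits.add((q - left_len, c - left_len, left_len + k + right_len,
                      seed + left + right))
    return sorted(hits, key=lambda h: -h[3])
```

Calling `seed_and_extend(["TTACGTACGGAT", "TTACGTACGGAT"], "CGTACGG")` returns candidate local alignments sorted by score. A tree-aware tool would replace `column_score` with a likelihood computed over the phylogeny, which is where the evolutionary information described above would enter.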