Brudno Michael, Do Chuong B, Cooper Gregory M, Kim Michael F, Davydov Eugene, Green Eric D, Sidow Arend, Batzoglou Serafim
Department of Computer Science, Stanford University, Stanford, California 94305-9010, USA.
Genome Res. 2003 Apr;13(4):721-31. doi: 10.1101/gr.926603. Epub 2003 Mar 12.
To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences and accurate enough to correctly align the conserved biological features between distant species. We present LAGAN, a system for rapid global alignment of two homologous genomic sequences, and Multi-LAGAN, a system for multiple global alignment of genomic sequences. We tested our systems on a data set consisting of more than 12 Mb of high-quality sequence from 12 vertebrate species. All the sequence was derived from the genomic region orthologous to an approximately 1.5-Mb region on human chromosome 7q31.3. We found that both LAGAN and Multi-LAGAN compare favorably with other leading alignment methods in correctly aligning protein-coding exons, especially between distant homologs such as human and chicken, or human and fugu. Multi-LAGAN produced the most accurate alignments, while requiring just 75 minutes on a personal computer to obtain the multiple alignment of all 12 sequences. Multi-LAGAN is a practical method for generating multiple alignments of long genomic sequences at any evolutionary distance. Our systems are publicly available at http://lagan.stanford.edu.
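The abstract does not describe how LAGAN itself works. As background only, here is a minimal sketch of the classic Needleman-Wunsch dynamic program, which is the textbook meaning of "global alignment" of two sequences. It assumes a linear gap penalty and arbitrary illustrative scores (match +1, mismatch -1, gap -2); the function name `global_align` and these parameter values are hypothetical, not taken from LAGAN. The sketch fills a full (n+1) x (m+1) table, so its time and memory grow with the product of the two sequence lengths. For two ~1.5-Mb sequences that product is on the order of 10^12 cells. This is not a reproduction of LAGAN or Multi-LAGAN.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Substitution score for aligning character x against character y.
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning prefix a[:i] with prefix b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    best, top, bottom = global_align("GATTACA", "GATCA")
    print(best)
    print(top)
    print(bottom)
```

Under these scores, aligning `GATTACA` with `GATCA` yields an optimal score of 1: five matches and two gap columns. Which of several equally scoring alignments is printed depends on the tie-breaking order in the traceback.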