Bray Nicolas, Pachter Lior
Department of Mathematics, University of California at Berkeley, Berkeley, California 94720, USA.
Genome Res. 2004 Apr;14(4):693-9. doi: 10.1101/gr.1960404.
We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which can accurately align multiple genomic regions up to megabases long. MAVID can effectively align divergent sequences as well as incomplete (unfinished) sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments: an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.
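The general idea of progressive alignment along a guide tree, as named in the abstract, can be illustrated with a minimal Python sketch. This is not MAVID's implementation: the function names (`needleman_wunsch`, `expand`, `align_tree`), the scoring values, and the hand-written guide tree are all assumptions made for illustration. In particular, the ancestral sequence at each internal node is approximated by a crude column-wise choice rather than by maximum-likelihood inference, and no anchoring or homology-map constraints are applied.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def expand(rows, gapped_anc):
    """Insert a gap column into every row wherever the gapped ancestor has '-'."""
    out = {name: [] for name in rows}
    col = 0
    for ch in gapped_anc:
        for name, row in rows.items():
            out[name].append("-" if ch == "-" else row[col])
        if ch != "-":
            col += 1
    return {name: "".join(chars) for name, chars in out.items()}


def align_tree(tree, seqs):
    """Align sequences bottom-up along a guide tree.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    Returns (rows, ancestor): rows maps each leaf name to its gapped sequence;
    ancestor has one character per alignment column.
    """
    if isinstance(tree, str):
        return {tree: seqs[tree]}, seqs[tree]
    left_rows, left_anc = align_tree(tree[0], seqs)
    right_rows, right_anc = align_tree(tree[1], seqs)
    ga, gb = needleman_wunsch(left_anc, right_anc)
    rows = {**expand(left_rows, ga), **expand(right_rows, gb)}
    # Crude stand-in for ancestral inference (an assumption of this sketch):
    # keep the left child's character, or the right child's where the left has a gap.
    ancestor = "".join(x if x != "-" else y for x, y in zip(ga, gb))
    return rows, ancestor


if __name__ == "__main__":
    # Hypothetical toy sequences and guide tree, for illustration only.
    seqs = {"human": "ACGTACGT", "mouse": "ACGTCGT", "fish": "ACTACG"}
    rows, _ = align_tree((("human", "mouse"), "fish"), seqs)
    for name, row in rows.items():
        print(f"{name:>6} {row}")
```

Because each internal node aligns only the two inferred ancestors of its children, the work per node is a single pairwise alignment, which is what makes the progressive strategy practical for many sequences. A real genomic aligner would additionally restrict the dynamic programming with anchors, since the quadratic table used here is not feasible for megabase-scale regions.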