Blanchette Mathieu, Kent W James, Riemer Cathy, Elnitski Laura, Smit Arian F A, Roskin Krishna M, Baertsch Robert, Rosenbloom Kate, Clawson Hiram, Green Eric D, Haussler David, Miller Webb
Howard Hughes Medical Institute, University of California at Santa Cruz, Santa Cruz, California 95064, USA.
Genome Res. 2004 Apr;14(4):708-15. doi: 10.1101/gr.1933104.
We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
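The abstract says TBA's output can be projected onto any genome chosen as a reference, and that different projections agree on which positions are orthologous. The Python sketch below illustrates that idea for a single alignment block. It is not TBA or MULTIZ code.

Several parts of it are assumptions made for illustration:

- The block representation, a hypothetical dict that maps each sequence name to a start coordinate and a gapped row.
- The function names `columns`, `project`, and `orthologous_pairs`.

Projection here keeps only the columns in which the reference has a residue. Columns are dropped but never split or merged. As a result, projecting onto either of two genomes yields the same pairs of their positions, and projecting onto a third genome can lose pairs but never pairs positions differently.

```python
from typing import Dict, List, Optional, Set, Tuple

GAP = "-"
# Hypothetical block model: name -> (start coordinate, gapped row); rows have equal length.
Block = Dict[str, Tuple[int, str]]
# One alignment column: name -> coordinate of the residue in that column, or None for a gap.
Column = Dict[str, Optional[int]]


def columns(block: Block) -> List[Column]:
    """Convert gapped rows into per-column sequence coordinates."""
    width = len(next(iter(block.values()))[1])
    pos = {name: start for name, (start, _) in block.items()}
    cols: List[Column] = []
    for i in range(width):
        col: Column = {}
        for name, (_, row) in block.items():
            if row[i] == GAP:
                col[name] = None
            else:
                col[name] = pos[name]
                pos[name] += 1  # advance only past real residues
        cols.append(col)
    return cols


def project(block: Block, ref: str) -> List[Column]:
    """Keep only the columns in which the reference sequence has a residue."""
    return [c for c in columns(block) if c[ref] is not None]


def orthologous_pairs(cols: List[Column], a: str, b: str) -> Set[Tuple[int, int]]:
    """Position pairs (a, b) placed in the same column, i.e. predicted orthologs."""
    return {(c[a], c[b]) for c in cols if c[a] is not None and c[b] is not None}


# Toy block with made-up coordinates.
block: Block = {
    "human": (100, "ACGT-A"),
    "mouse": (50, "A-GTCA"),
    "fish": (7, "ACG--A"),
}

via_human = orthologous_pairs(project(block, "human"), "human", "mouse")
via_mouse = orthologous_pairs(project(block, "mouse"), "human", "mouse")
assert via_human == via_mouse  # both reference choices imply the same orthologs
print(sorted(via_human))  # [(100, 50), (102, 51), (103, 52), (104, 54)]
```

A full projection would also have to order many blocks along the reference and account for reference regions covered by no block. This single-block sketch omits both.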