W. James Kent
Department of Biology and Center for Molecular Biology of RNA, University of California-Santa Cruz, Santa Cruz, CA 95064, USA.
Genome Res. 2002 Apr;12(4):656-64. doi: 10.1101/gr.229202.
Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
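The index-then-search idea above can be illustrated with a minimal Python sketch. It only handles exact K-mer matches, and it uses hypothetical names (`build_index`, `find_hits`, `candidate_regions`, and a `band` width for grouping hits). It is not BLAT's implementation, which also supports mismatch schemes and refines hit clumping, alignment, and stitching. The sketch indexes the target (genome) only at every K-th position but looks up the query at every position. Because of that asymmetry, any exact match of at least 2K-1 bases still contains one indexed K-mer. The `min_hits` parameter plays the role of the "number of required index matches" mentioned in the abstract.

```python
from collections import defaultdict


def build_index(target: str, k: int) -> dict[str, list[int]]:
    """Map each nonoverlapping k-mer of the target (offsets 0, k, 2k, ...) to its positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return dict(index)  # plain dict so later lookups cannot insert empty keys


def find_hits(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Look up every overlapping k-mer of the query; return (query_pos, target_pos) pairs."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, tpos))
    return hits


def candidate_regions(hits, min_hits: int, band: int) -> dict[int, list[tuple[int, int]]]:
    """Group hits whose diagonal (target_pos - query_pos) falls in the same band of
    width `band`, keeping only bands with at least `min_hits` hits.

    Hypothetical, deliberately crude stand-in for BLAT's hit clumping: hits on
    nearby diagonals that straddle a band boundary end up in different groups.
    """
    bands: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for qpos, tpos in hits:
        bands[(tpos - qpos) // band].append((qpos, tpos))
    return {b: sorted(h) for b, h in bands.items() if len(h) >= min_hits}


if __name__ == "__main__":
    genome = "TTGACCATGCAGTCCA"  # toy "genome"; indexed k-mers at offsets 0, 4, 8, 12
    mrna = "GACCATGCAG"          # toy query, identical to genome[2:12]
    k = 4
    idx = build_index(genome, k)
    hits = find_hits(mrna, idx, k)
    print(hits)                                         # [(2, 4), (6, 8)]
    print(candidate_regions(hits, min_hits=2, band=8))  # {0: [(2, 4), (6, 8)]}
```

Both hits lie on diagonal 2 (target offset minus query offset), so they fall into one band and pass a two-hit requirement. Raising `min_hits` to 3 discards the region. Indexing only nonoverlapping K-mers stores roughly len(genome)/K entries instead of one per base, which is one way such an index stays small.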