Kim Daehwan, Langmead Ben, Salzberg Steven L
[1] Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA. [2] Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA.
[1] Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA. [2] Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA. [3] Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
Nat Methods. 2015 Apr;12(4):357-60. doi: 10.1038/nmeth.3317. Epub 2015 Mar 9.
HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.
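The two-level scheme described in the abstract can be illustrated with a toy sketch: a global FM index anchors the first part of a read, and a small local FM index covering the region around the anchor is searched for the remainder. This is not HISAT's implementation. The names `FMIndex`, `build_local_indexes`, and `align_spliced` are hypothetical, the window and overlap sizes are toy values, the suffix array is built naively, and only exact matches are handled.

```python
class FMIndex:
    """Toy FM index over a short string (naive construction, illustration only)."""

    def __init__(self, text):
        text += "$"  # sentinel; assumed absent from text and smaller than A/C/G/T
        # Naive suffix array; real aligners use memory-efficient construction.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        # Burrows-Wheeler transform: the character preceding each sorted suffix.
        bwt = "".join(text[i - 1] for i in self.sa)
        alphabet = sorted(set(text))
        # C[c]: number of characters in text strictly smaller than c.
        self.C, total = {}, 0
        for c in alphabet:
            self.C[c] = total
            total += text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
        for i, ch in enumerate(bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def find(self, pattern):
        """Return sorted start positions of exact matches (backward search)."""
        lo, hi = 0, len(self.sa)
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[i] for i in range(lo, hi))


def build_local_indexes(genome, window, overlap):
    """One small FM index per fixed-size window; windows overlap slightly."""
    step = window - overlap
    return [(start, FMIndex(genome[start:start + window]))
            for start in range(0, len(genome), step)]


def align_spliced(read, anchor_len, global_index, local_indexes, window, overlap):
    """Anchor the read prefix globally, then extend using the nearby local index."""
    step = window - overlap
    anchor, rest = read[:anchor_len], read[anchor_len:]
    hits = []
    for p in global_index.find(anchor):
        anchor_end = p + anchor_len
        # Pick the window whose span contains the end of the anchor.
        k = min(anchor_end // step, len(local_indexes) - 1)
        start, local = local_indexes[k]
        for offset in local.find(rest):
            q = start + offset
            if q >= anchor_end:  # remainder must lie at or after the anchor end
                hits.append((p, q))
    return hits


# Toy genome: two exons separated by a 20 bp intron (made-up sequences).
exon1, intron, exon2 = "ACGTACGGTC", "GTAAGATTTTTTCTTTTCAG", "CCATGGCAAC"
genome = exon1 + intron + exon2
read = exon1[-6:] + exon2[:6]  # read spanning the exon-exon junction

WINDOW, OVERLAP = 64, 8  # toy values, not HISAT's parameters
global_index = FMIndex(genome)
local_indexes = build_local_indexes(genome, WINDOW, OVERLAP)
alignments = align_spliced(read, 6, global_index, local_indexes, WINDOW, OVERLAP)
print(alignments)  # each pair is (anchor start, start of the remainder)
```

The point of the design is that the remainder of the read is searched only within a small region near the anchor rather than across the whole genome, so the second lookup touches a tiny index. For scale, 48,000 local indexes of about 64,000 bp each span roughly 3.07 billion bp, which is on the order of the human genome.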