Jiang Hui, Wong Wing Hung
Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, USA.
Bioinformatics. 2008 Oct 15;24(20):2395-6. doi: 10.1093/bioinformatics/btn429. Epub 2008 Aug 12.
SeqMap is a tool for mapping large numbers of short sequences to a genome. It is designed to find all the places in a reference genome from which each sequence may have come. This task is essential to the analysis of data from ultra high-throughput sequencing machines. With a carefully designed index-filtering algorithm and an efficient implementation, SeqMap can map tens of millions of short sequences to a genome of several billion nucleotides. Multiple substitutions and insertions/deletions of nucleotide bases in the sequences can be tolerated and therefore detected. SeqMap supports the FASTA input format and various output formats, and provides command line options for tuning almost every aspect of the mapping process. A typical mapping can be done in a few hours on a desktop PC. Parallel use of SeqMap on a cluster is also straightforward.
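The abstract does not spell out the index-filtering algorithm, so the following is only a minimal sketch of one common way such a filter can work, not SeqMap's actual implementation. It assumes equal-length reads and substitutions only (no insertions/deletions), and relies on the pigeonhole principle: if a read is cut into k + 1 segments and matches a genome window with at most k mismatches, at least one segment must match exactly. The names `segment_bounds`, `hamming`, and `map_reads` are hypothetical and chosen for illustration.

```python
def segment_bounds(length, parts):
    """Split [0, length) into `parts` contiguous, near-equal segments."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for k in range(parts):
        end = start + base + (1 if k < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_reads(genome, reads, max_mismatches):
    """Return {read_id: [(position, mismatches), ...]} for all hits.

    Illustrative assumptions: all reads share one length, only
    substitutions are tolerated, and the genome is a single string.
    """
    hits = {rid: [] for rid in range(len(reads))}
    if not reads:
        return hits
    read_len = len(reads[0])
    if any(len(r) != read_len for r in reads):
        raise ValueError("this sketch assumes equal-length reads")
    if read_len < max_mismatches + 1:
        raise ValueError("reads too short for the requested mismatch count")

    bounds = segment_bounds(read_len, max_mismatches + 1)

    # Index step: hash every segment of every read, keyed by segment number.
    index = {}
    for rid, read in enumerate(reads):
        for k, (s, e) in enumerate(bounds):
            index.setdefault((k, read[s:e]), set()).add(rid)

    # Filter step: slide over the genome; a read is a candidate at a
    # position only if one of its segments matches the window exactly.
    for pos in range(len(genome) - read_len + 1):
        window = genome[pos:pos + read_len]
        candidates = set()
        for k, (s, e) in enumerate(bounds):
            candidates |= index.get((k, window[s:e]), set())
        # Verify step: full comparison for the surviving candidates only.
        for rid in sorted(candidates):
            d = hamming(window, reads[rid])
            if d <= max_mismatches:
                hits[rid].append((pos, d))
    return hits


genome = "ACGTACGTTTGACCA"
reads = ["ACGTTTGA", "ACGAACGT"]
print(map_reads(genome, reads, max_mismatches=1))
# prints {0: [(4, 0)], 1: [(0, 1)]}
```

Indexing the reads rather than the genome keeps memory proportional to the read set in this sketch; whether SeqMap makes the same choice is not stated in the abstract.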