National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA.
BMC Bioinformatics. 2019 Jul 25;20(1):405. doi: 10.1186/s12859-019-2996-x.
Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. However, few programs can align RNA reads to the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline.
Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate how accurately Magic-BLAST maps short or long sequences and how well it discovers introns, using real RNA-seq data sets from PacBio, Roche and Illumina runs as well as six benchmarks, and we compare it to other popular aligners. Additionally, we look at alignments of idealized human RefSeq mRNA sequences that perfectly match the genome.
We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.
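The input and reference combinations described above can be sketched as a small helper that assembles a command line without running it. This is a minimal sketch, not taken from the paper: the helper name `build_magicblast_command`, the file names, and the accession are hypothetical, and the flags (`-db`, `-subject`, `-query`, `-infmt`, `-sra`) are assumed to match the installed Magic-BLAST version, so they should be checked against `magicblast -help`.

```python
from typing import List, Optional


def build_magicblast_command(
    reference: str,
    reference_is_blast_db: bool,
    fastq_path: Optional[str] = None,
    sra_accession: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) a magicblast argument list.

    Hypothetical helper for illustration; flag names are assumptions
    to verify against the local `magicblast -help` output.
    """
    # Exactly one read source is allowed: a FASTQ file or an SRA accession.
    if (fastq_path is None) == (sra_accession is None):
        raise ValueError("give exactly one of fastq_path or sra_accession")

    cmd = ["magicblast"]

    # Reference: a prebuilt BLAST database or a plain FASTA file.
    if reference_is_blast_db:
        cmd += ["-db", reference]
    else:
        cmd += ["-subject", reference]

    # Reads: a local FASTQ file or an accession fetched from SRA.
    if fastq_path is not None:
        cmd += ["-query", fastq_path, "-infmt", "fastq"]
    else:
        cmd += ["-sra", sra_accession]

    return cmd


if __name__ == "__main__":
    # Placeholder names; substitute a real database and accession.
    print(" ".join(build_magicblast_command("my_genome_db", True, sra_accession="SRR0000000")))
```

The returned list can be passed to `subprocess.run` once the flags have been confirmed; building the list separately keeps the example testable without Magic-BLAST installed.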