Electrical & Computer Engineering, University of California, Los Angeles, CA 90095, USA.
Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA.
Bioinformatics. 2021 May 5;37(5):625-633. doi: 10.1093/bioinformatics/btaa875.
Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology, and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate of nanopore reads makes accurate and efficient alignment difficult. Properly exploiting the noise and error characteristics inherent in the sequencing process can play a vital role in constructing a robust aligner. In this article, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running them through a sequence aligner.
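The conversion step can be illustrated with a minimal sketch. It assumes that each k-mer of a read has a mean current level given by a pore model, and that these levels are discretized into Q bins so that every sliding k-mer window becomes one symbol. Everything here is hypothetical: `toy_pore_model`, `build_cuts`, `quantize_read`, the base weights, the equal-frequency binning, and the digit output alphabet are illustrative choices, not QAlign's actual pore model, quantizer, or output format.

```python
import bisect
from itertools import product


def toy_pore_model(k):
    """Hypothetical k-mer -> mean current (pA) table.

    NOT a real nanopore pore model: each base gets an arbitrary weight and the
    'current' is a position-weighted sum, only so that different k-mers differ.
    """
    weights = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
    model = {}
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        model[kmer] = 60.0 + sum(weights[b] * (i + 1) for i, b in enumerate(kmer))
    return model


def build_cuts(model, q):
    """Equal-frequency thresholds splitting all k-mer currents into q levels."""
    values = sorted(model.values())
    return [values[len(values) * i // q] for i in range(1, q)]


def quantize_read(read, model, cuts, k, symbols="0123456789"):
    """Map each k-mer window of `read` to the symbol of its discretized level.

    Raises KeyError for k-mers absent from the model (e.g. containing 'N').
    """
    out = []
    for i in range(len(read) - k + 1):
        current = model[read[i:i + k]]
        level = bisect.bisect_right(cuts, current)  # integer in 0 .. len(cuts)
        out.append(symbols[level])
    return "".join(out)


if __name__ == "__main__":
    k, q = 3, 2
    model = toy_pore_model(k)
    cuts = build_cuts(model, q)
    print(quantize_read("ACGTTGCA", model, cuts, k))
```

A read of length n yields n − k + 1 symbols over a Q-letter alphabet. Because k-mers with similar currents collapse to the same symbol, substitutions between hard-to-distinguish k-mers tend to disappear in the converted sequence, which is the intuition behind aligning in current space. Feeding such strings to a real aligner would additionally require mapping the levels onto characters that the aligner accepts; that mapping is not shown here.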
We show that QAlign improves alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates improve from 51.6% to 75.4% and from 82.6% to 90% in two real datasets, respectively.
https://github.com/joshidhaivat/QAlign.git.
Supplementary data are available at Bioinformatics online.