QAlign: aligning nanopore reads accurately using current-level modeling.

Affiliations

Electrical & Computer Engineering, University of California, Los Angeles, CA 90095, USA.

Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA.

Publication information

Bioinformatics. 2021 May 5;37(5):625-633. doi: 10.1093/bioinformatics/btaa875.

Abstract

MOTIVATION

Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology, and many long-read aligners have been designed for nanopore reads. However, the high error rate of these reads makes accurate and efficient alignment difficult. Properly exploiting the noise and error characteristics inherent in the sequencing process can play a vital role in building a robust aligner. In this article, we design QAlign, a pre-processor that can be used with any long-read aligner to align long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer, before passing them to a sequence aligner.
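The key idea above can be illustrated with a minimal Python sketch, under stated assumptions: a toy pore model that assigns a pseudo-random mean current to every 3-mer, Q = 3 equal-frequency quantization levels, and digit symbols for the output. The names `toy_pore_model`, `level_thresholds`, `quantize` and `read_to_levels`, as well as the values of K and Q, are hypothetical and not taken from QAlign; a real pre-processor would use a measured nanopore k-mer current model instead of random numbers.

```python
"""Hypothetical sketch: convert a nucleotide read into quantized current levels."""
import bisect
import random
from itertools import product
from typing import Dict, List

K = 3  # hypothetical k-mer length (assumption, not QAlign's setting)
Q = 3  # hypothetical number of discretized current levels (assumption)


def toy_pore_model(k: int, seed: int = 0) -> Dict[str, float]:
    """Assign a pseudo-random mean current (pA) to every k-mer.

    HYPOTHETICAL stand-in for a measured nanopore pore model.
    """
    rng = random.Random(seed)
    return {"".join(p): rng.uniform(60.0, 120.0) for p in product("ACGT", repeat=k)}


def level_thresholds(model: Dict[str, float], q: int) -> List[float]:
    """Choose q-1 cut points so each level covers roughly equally many k-mers."""
    currents = sorted(model.values())
    n = len(currents)
    return [currents[(i * n) // q] for i in range(1, q)]


def quantize(current: float, thresholds: List[float]) -> int:
    """Map a current value to a level index in 0..q-1 (count of cut points <= value)."""
    return bisect.bisect_right(thresholds, current)


def read_to_levels(read: str, model: Dict[str, float], thresholds: List[float]) -> str:
    """Slide a k-mer window along the read and emit one level symbol per k-mer."""
    k = len(next(iter(model)))
    symbols = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer not in model:  # e.g. k-mers containing N are skipped in this sketch
            continue
        symbols.append(str(quantize(model[kmer], thresholds)))
    return "".join(symbols)
```

A usage example: build the model and thresholds once, then convert each read (and the reference) before handing them to an aligner.

```python
model = toy_pore_model(K)
thresholds = level_thresholds(model, Q)
print(read_to_levels("GATTACAGATTACA", model, thresholds))
```

The intuition, as we read the abstract, is that k-mers with similar currents collapse onto the same level, so some basecalling confusions between such k-mers no longer show up as mismatches. How the level symbols are encoded for a particular downstream aligner is not covered by this sketch.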

RESULTS

We show that QAlign improves alignment rates for nanopore reads aligned to the genome from around 80% up to 90%. We also show that QAlign improves the average overlap quality in read-to-read alignment by 9.2%, 2.5% and 10.8% on three real datasets. Read-to-transcriptome alignment rates improve from 51.6% to 75.4% and from 82.6% to 90% on two real datasets.

AVAILABILITY AND IMPLEMENTATION

https://github.com/joshidhaivat/QAlign.git.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
