Evaluation of tools for long read RNA-seq splice-aware alignment.

Author affiliations

Department of Electronic Systems and Information Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia.

Département d'Ecologie et d'Evolution, Université de Lausanne, Quartier Sorge, 1015 Lausanne, Switzerland.

Publication information

Bioinformatics. 2018 Mar 1;34(5):748-754. doi: 10.1093/bioinformatics/btx668.

DOI: 10.1093/bioinformatics/btx668

PMID: 29069314

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6192213/

Abstract

MOTIVATION

High-throughput sequencing has transformed the study of gene expression levels through RNA-seq, a technique that is now routinely used by various fields, such as genetic research or diagnostics. The advent of third generation sequencing technologies providing significantly longer reads opens up new possibilities. However, the high error rates common to these technologies set new bioinformatics challenges for the gapped alignment of reads to their genomic origin. In this study, we have explored how currently available RNA-seq splice-aware alignment tools cope with increased read lengths and error rates. All tested tools were initially developed for short NGS reads, but some have claimed support for long Pacific Biosciences (PacBio) or even Oxford Nanopore Technologies (ONT) MinION reads.

RESULTS

The tools were tested on synthetic and real datasets from two technologies (PacBio and ONT MinION). Alignment quality and resource usage were compared across different aligners. The effect of error correction of long reads was explored, both using self-correction and correction with an external short reads dataset. A tool was developed for evaluating RNA-seq alignment results. This tool can be used to compare the alignment of simulated reads to their genomic origin, or to compare the alignment of real reads to a set of annotated transcripts. Our tests show that while some RNA-seq aligners were unable to cope with long error-prone reads, others produced overall good results. We further show that alignment accuracy can be improved using error-corrected reads.
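As an illustration of what comparing a simulated read's alignment to its genomic origin can involve, the following is a minimal sketch and not the RNAseqEval implementation. It assumes the alignment is given as a 0-based start position plus a SAM CIGAR string, and that the true origin is known as a list of exon intervals. The function names `cigar_to_exons` and `exons_match` and the 5-base `tolerance` are hypothetical choices made for this example.

```python
import re

# SAM CIGAR operations: a length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_exons(pos, cigar):
    """Split an alignment into reference intervals at N (intron) operations.

    pos is the 0-based leftmost reference position; intervals are
    returned as (start, end) with end exclusive.
    """
    exons = []
    start = end = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "MD=X":
            # These operations consume reference bases inside an exon.
            end += length
        elif op == "N":
            # A skipped region closes the current exon and opens the next.
            if end > start:
                exons.append((start, end))
            start = end = end + length
        # I, S, H and P do not consume reference bases.
    if end > start:
        exons.append((start, end))
    return exons


def exons_match(aligned, truth, tolerance=5):
    """Return True if both exon chains have the same length and every
    boundary agrees within `tolerance` bases (an assumed threshold)."""
    if len(aligned) != len(truth):
        return False
    return all(
        abs(a_start - t_start) <= tolerance and abs(a_end - t_end) <= tolerance
        for (a_start, a_end), (t_start, t_end) in zip(aligned, truth)
    )


aligned = cigar_to_exons(100, "50M200N30M2I20M")
print(aligned)
print(exons_match(aligned, [(98, 150), (350, 403)]))
```

Allowing a small tolerance at exon boundaries reflects the fact that error-prone reads rarely align to the exact base at a splice junction; a stricter or looser threshold changes how many alignments count as correct.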

AVAILABILITY AND IMPLEMENTATION

https://github.com/kkrizanovic/RNAseqEval, https://figshare.com/projects/RNAseq_benchmark/24391.

CONTACT

mile.sikic@fer.hr.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Evaluation of tools for long read RNA-seq splice-aware alignment.

Bioinformatics. 2018 Mar 1;34(5):748-754. doi: 10.1093/bioinformatics/btx668.

LRCstats, a tool for evaluating long reads correction methods.

Bioinformatics. 2017 Nov 15;33(22):3652-3654. doi: 10.1093/bioinformatics/btx489.

ASElux: an ultra-fast and accurate allelic reads counter.

Bioinformatics. 2018 Apr 15;34(8):1313-1320. doi: 10.1093/bioinformatics/btx762.

Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations.

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad100. Epub 2023 Nov 24.

A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads.

Genes (Basel). 2019 Jan 14;10(1):44. doi: 10.3390/genes10010044.

Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads.

Bioinformatics. 2016 Sep 1;32(17):2582-9. doi: 10.1093/bioinformatics/btw237. Epub 2016 May 9.

LAMSA: fast split read alignment with long approximate matches.

Bioinformatics. 2017 Jan 15;33(2):192-201. doi: 10.1093/bioinformatics/btw594. Epub 2016 Sep 25.

Arioc: GPU-accelerated alignment of short bisulfite-treated reads.

Bioinformatics. 2018 Aug 1;34(15):2673-2675. doi: 10.1093/bioinformatics/bty167.

Minimap2: pairwise alignment for nucleotide sequences.

Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191.

Hybrid correction of highly noisy long reads using a variable-order de Bruijn graph.

Bioinformatics. 2018 Dec 15;34(24):4213-4222. doi: 10.1093/bioinformatics/bty521.

Cited by

Metformin-Enhanced Secretome from Periodontal Ligament Stem Cells Promotes Functional Recovery in an Inflamed Periodontal Model: In Vitro Study.

J Funct Biomater. 2025 May 13;16(5):177. doi: 10.3390/jfb16050177.

Notable challenges posed by long-read sequencing for the study of transcriptional diversity and genome annotation.

Genome Res. 2025 Apr 14;35(4):583-592. doi: 10.1101/gr.279865.124.

Alternative splicing of modulatory immune receptors in T lymphocytes: a newly identified and targetable mechanism for anticancer immunotherapy.

Front Immunol. 2025 Jan 7;15:1490035. doi: 10.3389/fimmu.2024.1490035. eCollection 2024.

Advances in long-read single-cell transcriptomics.

Hum Genet. 2024 Oct;143(9-10):1005-1020. doi: 10.1007/s00439-024-02678-x. Epub 2024 May 24.

Computational tools for plant genomics and breeding.

Sci China Life Sci. 2024 Aug;67(8):1579-1590. doi: 10.1007/s11427-024-2578-6. Epub 2024 Apr 23.

SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark.

Genome Biol. 2023 Dec 11;24(1):286. doi: 10.1186/s13059-023-03127-0.

HQAlign: aligning nanopore reads for SV detection using current-level modeling.

Bioinformatics. 2023 Oct 3;39(10). doi: 10.1093/bioinformatics/btad580.

SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark.

bioRxiv. 2023 Aug 24:2023.08.23.554392. doi: 10.1101/2023.08.23.554392.

Nanopore Direct RNA Sequencing Data Processing and Analysis Using MasterOfPores.

Methods Mol Biol. 2023;2624:185-205. doi: 10.1007/978-1-0716-2962-8_13.

HQAlign: Aligning nanopore reads for SV detection using current-level modeling.

ArXiv. 2023 Jan 10:arXiv:2301.03834v1.

References

NanoSim: nanopore sequence read simulator based on statistical characterization.

Gigascience. 2017 Apr 1;6(4):1-6. doi: 10.1093/gigascience/gix010.

Fast and accurate de novo genome assembly from long uncorrected reads.

Genome Res. 2017 May;27(5):737-746. doi: 10.1101/gr.214270.116. Epub 2017 Jan 18.

Simulation-based comprehensive benchmarking of RNA-seq aligners.

Nat Methods. 2017 Feb;14(2):135-139. doi: 10.1038/nmeth.4106. Epub 2016 Dec 12.

Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads.

Bioinformatics. 2016 Sep 1;32(17):2582-9. doi: 10.1093/bioinformatics/btw237. Epub 2016 May 9.

GMAP and GSNAP for Genomic Sequence Alignment: Enhancements to Speed, Accuracy, and Functionality.

Methods Mol Biol. 2016;1418:283-334. doi: 10.1007/978-1-4939-3578-9_15.

Assessing the performance of the Oxford Nanopore Technologies MinION.

Biomol Detect Quantif. 2015 Mar;3:1-8. doi: 10.1016/j.bdq.2015.02.001.

HISAT: a fast spliced aligner with low memory requirements.

Nat Methods. 2015 Apr;12(4):357-60. doi: 10.1038/nmeth.3317. Epub 2015 Mar 9.

Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

Nucleic Acids Res. 2015 Mar 31;43(6):e37. doi: 10.1093/nar/gku1341. Epub 2015 Jan 13.

Characterization of the human ESC transcriptome by hybrid sequencing.

Proc Natl Acad Sci U S A. 2013 Dec 10;110(50):E4821-30. doi: 10.1073/pnas.1320101110. Epub 2013 Nov 26.

Systematic evaluation of spliced alignment programs for RNA-seq data.

Nat Methods. 2013 Dec;10(12):1185-91. doi: 10.1038/nmeth.2722. Epub 2013 Nov 3.
