Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM).

Affiliation

Penn Center for Bioinformatics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.

Publication information

Bioinformatics. 2011 Sep 15;27(18):2518-28. doi: 10.1093/bioinformatics/btr427. Epub 2011 Jul 19.

DOI:10.1093/bioinformatics/btr427

PMID:21775302

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3167048/

Abstract

MOTIVATION

A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously.

RESULTS

We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription-polymerase chain reaction (RT-PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.
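The abstract does not define how base-level and junction-level accuracy are computed. As a rough illustration only, and not the authors' actual metrics, the following Python sketch scores an aligner against simulated ground truth under two assumptions made up for this example: each alignment is a mapping from read base index to genome coordinate, and each junction is a `(chromosome, intron_start, intron_end)` tuple. The function names and data layout are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representations, chosen only for this sketch:
# an alignment maps each read base index to a 1-based genome coordinate,
# and a junction is (chromosome, intron_start, intron_end).
Alignment = Dict[int, int]
Junction = Tuple[str, int, int]


def base_level_accuracy(
    truth: Dict[str, Alignment], called: Dict[str, Alignment]
) -> Tuple[float, float]:
    """Return (fraction of true bases placed correctly,
    fraction of called bases placed incorrectly)."""
    total_true = sum(len(aln) for aln in truth.values())
    total_called = 0
    correct = 0
    for read_id, aln in called.items():
        true_aln = truth.get(read_id, {})
        for base_index, position in aln.items():
            total_called += 1
            if true_aln.get(base_index) == position:
                correct += 1
    recall = correct / total_true if total_true else 0.0
    error_rate = (total_called - correct) / total_called if total_called else 0.0
    return recall, error_rate


def junction_level_accuracy(
    truth: Set[Junction], called: Set[Junction]
) -> Tuple[float, float]:
    """Return (false-negative rate, false-positive rate) over junction sets."""
    true_positives = len(truth & called)
    fn_rate = (len(truth) - true_positives) / len(truth) if truth else 0.0
    fp_rate = (len(called) - true_positives) / len(called) if called else 0.0
    return fn_rate, fp_rate


# Toy data: read r1 truly spans an intron (bases 104..200 are skipped),
# but the aligner placed it contiguously, so its last three bases are wrong.
truth_alignments = {
    "r1": {0: 101, 1: 102, 2: 103, 3: 201, 4: 202, 5: 203},
    "r2": {0: 500, 1: 501, 2: 502, 3: 503},
}
called_alignments = {
    "r1": {0: 101, 1: 102, 2: 103, 3: 104, 4: 105, 5: 106},
    "r2": {0: 500, 1: 501, 2: 502, 3: 503},
}
true_junctions = {("chr1", 104, 200)}
called_junctions: Set[Junction] = set()

base_recall, base_error = base_level_accuracy(truth_alignments, called_alignments)
junction_fn, junction_fp = junction_level_accuracy(true_junctions, called_junctions)
print(base_recall, base_error, junction_fn, junction_fp)
```

In this toy case the unspliced placement of `r1` costs three of ten bases and misses the only true junction, which shows why the two levels are reported separately: an aligner can score reasonably on bases while still failing to detect splice junctions.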

AVAILABILITY

The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine (http://cbil.upenn.edu/RUM).

CONTACT

ggrant@pcbi.upenn.edu; epierce@mail.med.upenn.edu

SUPPLEMENTARY INFORMATION

The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.
