Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM).

Affiliation

Penn Center for Bioinformatics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.

Publication information

Bioinformatics. 2011 Sep 15;27(18):2518-28. doi: 10.1093/bioinformatics/btr427. Epub 2011 Jul 19.

Abstract

MOTIVATION

A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. Several RNA-Seq alignment algorithms are available, each claiming to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics, RNA-Seq data can be simulated accurately enough to enable meaningful benchmarking of alignment algorithms. A rigorous comparison of all viable published RNA-Seq alignment algorithms has not previously been performed.

RESULTS

We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription-polymerase chain reaction (RT-PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.
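The abstract does not define how base-level and junction-level accuracy are computed. As a rough illustration only, the Python sketch below assumes that the simulator records the true genomic coordinate of every read base and the true set of splice junctions, and that an aligner's output has been converted to the same form. The function names, data layout, and rate definitions are hypothetical and are not taken from RUM or from the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: a junction is (chromosome, intron start, intron end).
Junction = Tuple[str, int, int]


def base_level_accuracy(
    truth: Dict[str, List[int]],
    predicted: Dict[str, List[Optional[int]]],
) -> float:
    """Fraction of simulated bases placed at their true genomic coordinate.

    `truth` maps read id -> true coordinate of each base.
    `predicted` maps read id -> reported coordinate of each base,
    with None for bases the aligner left unaligned. Reads missing
    from `predicted` count as entirely unaligned.
    """
    total = 0
    correct = 0
    for read_id, true_positions in truth.items():
        total += len(true_positions)
        reported = predicted.get(read_id)
        if reported is None:
            continue
        correct += sum(
            1 for t, p in zip(true_positions, reported) if p is not None and p == t
        )
    return correct / total if total else 0.0


def junction_error_rates(
    true_junctions: Set[Junction],
    called_junctions: Set[Junction],
) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate) for junction calls.

    Assumed definitions: the false-positive rate is the share of called
    junctions that are not in the truth set; the false-negative rate is
    the share of true junctions that were not called.
    """
    true_positives = len(true_junctions & called_junctions)
    fp_rate = (
        (len(called_junctions) - true_positives) / len(called_junctions)
        if called_junctions
        else 0.0
    )
    fn_rate = (
        (len(true_junctions) - true_positives) / len(true_junctions)
        if true_junctions
        else 0.0
    )
    return fp_rate, fn_rate
```

Scoring each base separately, rather than each read as a whole, gives partial credit to an aligner that places one side of a spliced read correctly but fails to extend across the junction.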

AVAILABILITY

The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine (http://cbil.upenn.edu/RUM).

CONTACT

ggrant@pcbi.upenn.edu; epierce@mail.med.upenn.edu

SUPPLEMENTARY INFORMATION

The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.
