A robust method for transcript quantification with RNA-seq data.

Author information

Huang Yan, Hu Yin, Jones Corbin D, MacLeod James N, Chiang Derek Y, Liu Yufeng, Prins Jan F, Liu Jinze

Affiliation

Department of Computer Science, University of Kentucky, Lexington, KY 40506, USA.

Publication information

J Comput Biol. 2013 Mar;20(3):167-87. doi: 10.1089/cmb.2012.0230.

Abstract

The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation, or transcript quantification, of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique transcript solution even if every read maps uniquely to the reference genome. In this article, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to improve identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By solving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform. On simulated RNA-seq datasets, our method demonstrated better quantification accuracy and better inference of the dominant set of transcripts than existing methods. Applying our method to real data demonstrated experimentally that transcript quantification is effective for differential analysis of transcriptomes.
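
The idea of a sparse linear model for quantification can be illustrated with a toy sketch. The abstract does not give the paper's actual features, bias terms, or solver, so everything below is an assumption made for illustration: the three-exon gene, the isoform names T1 to T3, the 0/1 design matrix, the penalty value, and the plain coordinate-descent solver for a non-negative LASSO. Each row of the design is an observable feature (an exon or a splice junction), each column a candidate isoform, and the observed coverage is modeled as the design matrix times the unknown abundances.

```python
# Toy sketch of non-negative LASSO for isoform abundances.
# All names, numbers, and the solver are illustrative assumptions,
# not the method or data from the paper.

FEATURES = ["exon1", "exon2", "exon3", "junc1-2", "junc2-3", "junc1-3"]

# One 0/1 column per hypothetical candidate isoform: 1 if the isoform
# contains the feature, 0 otherwise (order follows FEATURES).
ISOFORMS = {
    "T1": [1, 1, 1, 1, 1, 0],  # exon1-exon2-exon3
    "T2": [1, 0, 1, 0, 0, 1],  # exon1-exon3 (skips exon2)
    "T3": [0, 1, 1, 0, 1, 0],  # exon2-exon3
}


def nonneg_lasso(columns, y, lam, sweeps=500):
    """Minimize 0.5*||y - A x||^2 + lam*sum(x) subject to x >= 0
    by cyclic coordinate descent; `columns` holds the columns of A."""
    x = [0.0] * len(columns)
    residual = list(y)  # always equals y - A x
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(a * a for a in col)
            # Correlation of column j with the residual that excludes x[j].
            rho = sum(a * (r + a * x[j]) for a, r in zip(col, residual))
            new_xj = max(0.0, (rho - lam) / norm_sq)
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - a * delta for a, r in zip(col, residual)]
                x[j] = new_xj
    return x


# Simulated coverage generated from abundances T1=10, T2=0, T3=5.
observed = [10.0, 15.0, 15.0, 10.0, 15.0, 0.0]
estimates = nonneg_lasso(list(ISOFORMS.values()), observed, lam=0.3)

for name, value in zip(ISOFORMS, estimates):
    print(f"{name}: {value:.3f}")
```

On this toy input the penalty drives the unexpressed candidate T2 to exactly zero, while T1 stays at about 10 and T3 is shrunk slightly to about 4.9. The junction rows are what separate the isoforms here: the exon-skipping junction has zero coverage, which rules out T2 even though its exons are covered.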
