Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq.

Author information

Wang Xi, Wu Zhengpeng, Zhang Xuegong

Affiliation

MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, P. R. China.

Publication information

J Bioinform Comput Biol. 2010 Dec;8 Suppl 1:177-92. doi: 10.1142/s0219720010005178.

DOI: 10.1142/s0219720010005178

PMID: 21155027

Abstract

Owing to its unprecedentedly high resolution and detailed information, RNA-seq technology based on next-generation high-throughput sequencing significantly boosts the ability to study transcriptomes. Estimating transcript abundance levels, or gene expression levels, has always been an important question in research on transcriptional regulation and gene function. Building on the concept of Reads Per Kilo-base per Million reads (RPKM), there are currently two strategies for estimating gene expression levels: taking the union-intersection genes (UI-based) and summing up inferred isoform abundances (isoform-based). The two strategies produce different estimates. In this paper, we made the first attempt to compare their performance through a series of simulation studies. Our results showed that the isoform-based method not only gives more accurate estimates but also has less uncertainty than the UI-based strategy. When the non-uniformity of read distribution is taken into account, the isoform-based method can further reduce estimation errors. We applied both strategies to real RNA-seq datasets of technical replicates and found that the isoform-based strategy also performs better there. For a more accurate estimation of gene expression levels from RNA-seq data, even if the abundance levels of isoforms are not of interest, it is still better to first infer the isoform abundances and then sum them to obtain the expression level of the gene as a whole.
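The two strategies named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy gene (three exons, two isoforms), the helper names `rpkm`, `ui_based_expression`, and `isoform_based_expression`, and the choice of a simple EM update under a uniform Poisson read model are all assumptions made for illustration. The UI gene model is assumed here to be the set of exons shared by every annotated isoform.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return 1e9 * read_count / (length_bp * total_mapped_reads)


def ui_based_expression(exon_counts, exon_lengths, isoforms, total_reads):
    """RPKM over exons present in every isoform (assumed UI gene model)."""
    n_exons = len(exon_lengths)
    shared = [e for e in range(n_exons) if all(iso[e] for iso in isoforms)]
    reads = sum(exon_counts[e] for e in shared)
    length = sum(exon_lengths[e] for e in shared)
    return rpkm(reads, length, total_reads)


def isoform_based_expression(exon_counts, exon_lengths, isoforms,
                             total_reads, n_iter=1000):
    """Infer per-isoform RPKM by EM (uniform Poisson model), then sum them."""
    n_exons = len(exon_lengths)
    # w[e]: expected reads on exon e per 1 RPKM of an isoform containing it.
    w = [total_reads * length / 1e9 for length in exon_lengths]
    theta = [1.0] * len(isoforms)  # initial abundance guess, in RPKM
    for _ in range(n_iter):
        # Total abundance covering each exon under the current estimates.
        cover = [sum(t * iso[e] for t, iso in zip(theta, isoforms))
                 for e in range(n_exons)]
        # E-step shares each exon's reads among isoforms in proportion to
        # abundance; M-step divides by each isoform's effective length.
        theta = [
            t * sum(exon_counts[e] / cover[e]
                    for e in range(n_exons) if iso[e] and cover[e] > 0)
            / sum(w[e] for e in range(n_exons) if iso[e])
            for t, iso in zip(theta, isoforms)
        ]
    return theta, sum(theta)


# Hypothetical toy gene: isoform A uses exons 1-3, isoform B skips exon 2.
exon_lengths = [1000, 500, 1000]
isoforms = [[1, 1, 1], [1, 0, 1]]
total_reads = 10_000_000
# Noise-free counts for true abundances A = 20 RPKM, B = 10 RPKM.
exon_counts = [300, 100, 300]

ui_estimate = ui_based_expression(exon_counts, exon_lengths, isoforms, total_reads)
isoform_estimates, gene_estimate = isoform_based_expression(
    exon_counts, exon_lengths, isoforms, total_reads)
print(round(ui_estimate, 3), [round(t, 3) for t in isoform_estimates],
      round(gene_estimate, 3))
```

On these noise-free toy counts both strategies arrive at a gene-level value of about 30 RPKM. The difference reported in the abstract concerns accuracy and uncertainty under simulated read sampling and non-uniform read distribution, which this sketch does not attempt to reproduce.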

Similar articles

Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq.

Bioinformatics. 2011 Feb 15;27(4):502-8. doi: 10.1093/bioinformatics/btq696. Epub 2010 Dec 17.

Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

BMC Bioinformatics. 2015 Oct 16;16:332. doi: 10.1186/s12859-015-0750-6.

A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

Genet Mol Res. 2016 Jun 3;15(2):gmr7670. doi: 10.4238/gmr.15027670.

Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data.

Bioinformatics. 2014 Feb 15;30(4):506-13. doi: 10.1093/bioinformatics/btt704. Epub 2013 Dec 3.

Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation.

Proc Natl Acad Sci U S A. 2011 Dec 13;108(50):19867-72. doi: 10.1073/pnas.1113972108. Epub 2011 Dec 1.

Blind spots of quantitative RNA-seq: the limits for assessing abundance, differential expression, and isoform switching.

BMC Bioinformatics. 2013 Dec 24;14:370. doi: 10.1186/1471-2105-14-370.

Accurate inference of isoforms from multiple sample RNA-Seq data.

BMC Genomics. 2015;16 Suppl 2(Suppl 2):S15. doi: 10.1186/1471-2164-16-S2-S15. Epub 2015 Jan 21.

Statistical inferences for isoform expression in RNA-Seq.

Bioinformatics. 2009 Apr 15;25(8):1026-32. doi: 10.1093/bioinformatics/btp113. Epub 2009 Feb 25.

TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference.

Bioinformatics. 2013 Sep 15;29(18):2292-9. doi: 10.1093/bioinformatics/btt381. Epub 2013 Jul 2.

Cited by

Multi-Organ Transcriptome Response of Lumpfish () to Subspecies Systemic Infection.

Microorganisms. 2022 Oct 26;10(11):2113. doi: 10.3390/microorganisms10112113.

Whole blood transcriptomic analysis of beef cattle at arrival identifies potential predictive molecules and mechanisms that indicate animals that naturally resist bovine respiratory disease.

PLoS One. 2020 Jan 13;15(1):e0227507. doi: 10.1371/journal.pone.0227507. eCollection 2020.

Modeling and analysis of RNA-seq data: a review from a statistical perspective.

Quant Biol. 2018 Sep;6(3):195-209. doi: 10.1007/s40484-018-0144-7. Epub 2018 Aug 10.

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs.

Life Sci Alliance. 2019 Jan 17;2(1). doi: 10.26508/lsa.201800175. Print 2019 Feb.

Transcriptome analysis in different rice cultivars provides novel insights into desiccation and salinity stress responses.

Sci Rep. 2016 Mar 31;6:23719. doi: 10.1038/srep23719.

Complementary Post Transcriptional Regulatory Information is Detected by PUNCH-P and Ribosome Profiling.

Sci Rep. 2016 Feb 22;6:21635. doi: 10.1038/srep21635.

Advanced Applications of RNA Sequencing and Challenges.

Bioinform Biol Insights. 2015 Nov 15;9(Suppl 1):29-46. doi: 10.4137/BBI.S28991. eCollection 2015.

Differential analysis of gene regulation at transcript resolution with RNA-seq.

Nat Biotechnol. 2013 Jan;31(1):46-53. doi: 10.1038/nbt.2450. Epub 2012 Dec 9.

Computational analysis of noncoding RNAs.

Wiley Interdiscip Rev RNA. 2012 Nov-Dec;3(6):759-78. doi: 10.1002/wrna.1134. Epub 2012 Sep 18.

Next generation quantitative genetics in plants.

Front Plant Sci. 2011 Nov 15;2:77. doi: 10.3389/fpls.2011.00077. eCollection 2011.
