PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution.

Author information

Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.

Publication information

Nucleic Acids Res. 2014 Feb;42(3):e20. doi: 10.1093/nar/gkt1304. Epub 2013 Dec 20.

DOI:10.1093/nar/gkt1304

PMID:24362841

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3919567/

Abstract

Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we give adequate weight to the underlying data by using a non-parametric approach. Our rationale is that regardless of which factors lead to non-uniformity, whether hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown reasons, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using both simulated data with known ground truth and two real Illumina RNA-Seq data sets, including one with quantitative real-time polymerase chain reaction measurements. Our results indicate superior performance of PennSeq over existing methods, particularly for isoforms demonstrating severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq.
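
The idea of letting the aligned data define each isoform's read distribution can be illustrated with a small sketch. This is not PennSeq's implementation: the function names, the pseudocount smoothing, and the plain EM update are assumptions chosen for illustration, and the abstract does not specify how the empirical distribution is actually estimated or updated.

```python
from collections import Counter


def empirical_position_weights(read_starts, length, pseudocount=1.0):
    """Estimate per-position sampling probabilities for one isoform.

    read_starts: 0-based start positions of reads assigned to the isoform
                 (hypothetical input, e.g. reads that map unambiguously).
    The pseudocount is an illustrative smoothing choice that keeps every
    position's probability strictly positive.
    """
    counts = Counter(read_starts)
    raw = [counts.get(pos, 0) + pseudocount for pos in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def em_isoform_proportions(reads, weights, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    reads:   one dict per read, mapping each compatible isoform name to the
             read's start position on that isoform.
    weights: isoform name -> list of per-position probabilities.
    """
    isoforms = list(weights)
    theta = {k: 1.0 / len(isoforms) for k in isoforms}
    for _ in range(n_iter):
        expected = {k: 0.0 for k in isoforms}
        for compat in reads:
            # E-step: responsibility of each compatible isoform for this read,
            # weighted by the isoform's own (non-uniform) position probability.
            lik = {k: theta[k] * weights[k][pos] for k, pos in compat.items()}
            norm = sum(lik.values())
            for k, v in lik.items():
                expected[k] += v / norm
        # M-step: renormalize expected read counts into proportions.
        total = sum(expected.values())
        theta = {k: expected[k] / total for k in isoforms}
    return theta


if __name__ == "__main__":
    # Toy data: isoform "A" is covered evenly, isoform "B" is biased toward
    # its last positions. All values are invented for the demonstration.
    weights = {
        "A": empirical_position_weights([0, 1, 2, 3], length=4),
        "B": empirical_position_weights([3, 3, 3, 2], length=4),
    }
    reads = [{"A": 0}, {"A": 1, "B": 1}, {"A": 3, "B": 3}, {"B": 3}]
    print(em_isoform_proportions(reads, weights))
```

In this sketch, a read shared by two isoforms is pulled toward the isoform whose empirical distribution places more mass at the read's position, which is the intuition behind modeling isoform-specific non-uniformity rather than assuming uniform coverage.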

Similar articles

Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

BMC Bioinformatics. 2015 Oct 16;16:332. doi: 10.1186/s12859-015-0750-6.

Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data.

Bioinformatics. 2014 Feb 15;30(4):506-13. doi: 10.1093/bioinformatics/btt704. Epub 2013 Dec 3.

NURD: an implementation of a new method to estimate isoform expression from non-uniform RNA-seq data.

BMC Bioinformatics. 2013 Jul 10;14:220. doi: 10.1186/1471-2105-14-220.

Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates.

PLoS Comput Biol. 2017 May 15;13(5):e1005515. doi: 10.1371/journal.pcbi.1005515. eCollection 2017 May.

Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq.

Bioinformatics. 2011 Feb 15;27(4):502-8. doi: 10.1093/bioinformatics/btq696. Epub 2010 Dec 17.

TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference.

Bioinformatics. 2013 Sep 15;29(18):2292-9. doi: 10.1093/bioinformatics/btt381. Epub 2013 Jul 2.

ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.

Bioinformatics. 2014 Mar 1;30(5):644-51. doi: 10.1093/bioinformatics/btt591. Epub 2013 Oct 15.

EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments.

Bioinformatics. 2013 Apr 15;29(8):1035-43. doi: 10.1093/bioinformatics/btt087. Epub 2013 Feb 21.

Blind spots of quantitative RNA-seq: the limits for assessing abundance, differential expression, and isoform switching.

BMC Bioinformatics. 2013 Dec 24;14:370. doi: 10.1186/1471-2105-14-370.

Cited by

Sources of non-uniform coverage in short-read RNA-Seq data.

bioRxiv. 2025 Feb 6:2025.01.30.634337. doi: 10.1101/2025.01.30.634337.

Application of Single-Cell RNA Sequencing in Ovarian Development.

Biomolecules. 2022 Dec 27;13(1):47. doi: 10.3390/biom13010047.

LIQA: long-read isoform quantification and analysis.

Genome Biol. 2021 Jun 17;22(1):182. doi: 10.1186/s13059-021-02399-8.

Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab148.

LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing.

BMC Genomics. 2020 Dec 29;21(Suppl 11):793. doi: 10.1186/s12864-020-07207-4.

Transcriptome-wide Interrogation of the Functional Intronome by Spliceosome Profiling.

Cell. 2018 May 3;173(4):1031-1044.e13. doi: 10.1016/j.cell.2018.03.062.

PennDiff: detecting differential alternative splicing and transcription by RNA sequencing.

Bioinformatics. 2018 Jul 15;34(14):2384-2391. doi: 10.1093/bioinformatics/bty097.

Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates.

PLoS Comput Biol. 2017 May 15;13(5):e1005515. doi: 10.1371/journal.pcbi.1005515. eCollection 2017 May.

Modeling Enzyme Processivity Reveals that RNA-Seq Libraries Are Biased in Characteristic and Correctable Ways.

Cell Syst. 2016 Nov 23;3(5):467-479.e12. doi: 10.1016/j.cels.2016.10.012. Epub 2016 Nov 10.

The exon quantification pipeline (EQP): a comprehensive approach to the quantification of gene, exon and junction expression from RNA-seq data.

Nucleic Acids Res. 2016 Sep 19;44(16):e132. doi: 10.1093/nar/gkw538. Epub 2016 Jun 14.
