Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

Author information

Liu Xuejun, Shi Xinxin, Chen Chunlin, Zhang Li

Affiliation

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Rd., Nanjing, 211106, China.

Publication information

BMC Bioinformatics. 2015 Oct 16;16:332. doi: 10.1186/s12859-015-0750-6.

DOI:10.1186/s12859-015-0750-6

PMID:26475308

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4609108/

Abstract

BACKGROUND

RNA-Seq, a high-throughput sequencing technology, has been widely used in recent years to quantify gene and isoform expression in transcriptome studies. Accurate expression measurement from the millions or billions of short reads generated is obstructed by two main difficulties. The first is the ambiguous mapping of reads to the reference transcriptome caused by alternative splicing, which increases the uncertainty in estimating isoform expression. The second is the non-uniformity of the read distribution along the reference transcriptome, which arises from positional, sequencing, mappability and other as yet undiscovered sources of bias. This non-uniformity violates the uniform read distribution assumed by many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties, and some employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods correct bias based on the surrounding sequence content and share a single bias model across all genes. They therefore cannot estimate the gene- and isoform-specific biases revealed by recent studies.
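
For readers unfamiliar with the direct RPKM calculation mentioned above, the following is a minimal sketch of the standard formula (reads per kilobase of transcript per million mapped reads). The function name and the numbers in the usage note are illustrative assumptions, not taken from the paper. The formula treats every base of the transcript as equally likely to yield a read, which is the uniformity assumption that the biases described above violate.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Hypothetical helper for illustration. It normalizes a raw read count
    by transcript length (in kilobases) and by library size (in millions
    of mapped reads), assuming reads are spread uniformly along the
    transcript.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
```

As a usage example with made-up numbers, `rpkm(500, 2000, 10_000_000)` normalizes 500 reads on a 2 kb transcript in a library of 10 million mapped reads and gives 25.0.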

RESULTS

We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method uses latent variables to model the unknown isoforms from which reads originate and the underlying proportions of the multiple spliced variants. Isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of the read distribution, and are identified using the replicate information from multiple lanes of a single library run. We use simulated and real data to assess the accuracy of our method in calculating gene and isoform expression. The results show that NLDMseq produces gene and isoform expression estimates that are competitive with those of popular alternatives. Finally, we apply the proposed method to the detection of differential expression (DE) to show its usefulness in downstream analysis.
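
To illustrate the general idea of treating a read's isoform of origin as a latent variable, the sketch below runs a plain expectation-maximization (EM) loop over exon-level read counts. It is not the NLDMseq model: it assumes reads are uniform along each isoform, whereas NLDMseq additionally models isoform- and exon-specific sequencing rates using multiple lanes. The function name, arguments, and example values are all hypothetical.

```python
def estimate_isoform_read_fractions(exon_counts, exon_lengths, isoform_exons, n_iter=200):
    """Estimate the fraction of reads originating from each isoform by EM.

    exon_counts   : reads observed in each exon
    exon_lengths  : length of each exon in bases
    isoform_exons : one collection of exon indices per isoform
    Returns read fractions, not length-normalized transcript abundances.
    """
    n_exons = len(exon_lengths)
    n_iso = len(isoform_exons)
    # P(read lands in exon j | read came from isoform k), uniform along the isoform
    cond = []
    for exons in isoform_exons:
        iso_len = sum(exon_lengths[j] for j in exons)
        cond.append([exon_lengths[j] / iso_len if j in exons else 0.0
                     for j in range(n_exons)])
    theta = [1.0 / n_iso] * n_iso  # start from equal fractions
    for _ in range(n_iter):
        expected = [0.0] * n_iso
        for j, n_j in enumerate(exon_counts):
            # E-step: posterior probability that a read in exon j came from isoform k
            weights = [theta[k] * cond[k][j] for k in range(n_iso)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # exon not covered by any isoform with non-zero fraction
            for k in range(n_iso):
                expected[k] += n_j * weights[k] / norm
        assigned = sum(expected)
        if assigned == 0.0:
            raise ValueError("no reads could be assigned to any isoform")
        # M-step: new fractions are the expected share of assigned reads
        theta = [e / assigned for e in expected]
    return theta
```

For example, with three 100-base exons, one isoform containing all three and a second isoform skipping the middle exon, the call `estimate_isoform_read_fractions([1200, 600, 1200], [100, 100, 100], [{0, 1, 2}, {0, 2}])` converges to fractions close to 0.6 and 0.4. Reads in the shared exons are ambiguous, and the EM loop resolves them using the reads in the middle exon, which only the first isoform can explain.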

CONCLUSIONS

The proposed NLDMseq method accurately estimates gene and isoform expression from RNA-Seq data by modeling isoform- and exon-specific read sequencing biases. It uses a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well on both simulated and real datasets, and that its performance is competitive with popular methods. The method has been implemented as freely available software, which can be found at https://github.com/PUGEA/NLDMseq.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e7f/4609108/52e302830af9/12859_2015_750_Fig1_HTML.jpg
