Estimation of alternative splicing isoform frequencies from RNA-Seq data.

Authors

Nicolae Marius, Mangul Serghei, Măndoiu Ion I, Zelikovsky Alex

Affiliation

Department of Computer Science & Engineering, University of Connecticut, 371 Fairfield Rd, Unit 2155, Storrs, CT 06269-2155, USA.

Publication information

Algorithms Mol Biol. 2011 Apr 19;6(1):9. doi: 10.1186/1748-7188-6-9.

DOI:10.1186/1748-7188-6-9

PMID:21504602

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3107792/

Abstract

BACKGROUND

Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, because current sequencing technologies deliver short reads, estimating expression levels for alternatively spliced gene isoforms remains challenging.

RESULTS

In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/.
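To make the expectation-maximization idea concrete, the following is a minimal sketch of a generic mixture-model EM for isoform read fractions. It is not the IsoEM implementation. The function name `em_isoform_fractions` and the `weights` matrix are hypothetical: `weights[r][i]` is assumed to hold the likelihood of read `r` under isoform `i` (zero if the read is incompatible with that isoform). In a fuller model, such weights could encode insert size, base quality, strand, and pairing information.

```python
def em_isoform_fractions(weights, n_iter=100):
    """Estimate the fraction of reads originating from each isoform.

    weights[r][i] is an assumed, precomputed likelihood of read r given
    isoform i (0.0 if the read cannot come from that isoform).
    """
    n_iso = len(weights[0])
    theta = [1.0 / n_iso] * n_iso  # start from uniform fractions

    for _ in range(n_iter):
        counts = [0.0] * n_iso
        # E-step: split each read fractionally among compatible isoforms
        for w in weights:
            total = sum(t * x for t, x in zip(theta, w))
            if total == 0.0:
                continue  # read compatible with no isoform; skip it
            for i in range(n_iso):
                counts[i] += theta[i] * w[i] / total
        # M-step: re-estimate fractions from the expected read counts
        assigned = sum(counts)
        if assigned == 0.0:
            break
        theta = [c / assigned for c in counts]

    return theta


# Toy input: 3 reads unique to isoform A, 1 unique to isoform B,
# and 4 reads equally compatible with both.
toy_weights = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] + [[1.0, 1.0]] * 4
fractions = em_isoform_fractions(toy_weights)
```

In this toy input the ambiguous reads are apportioned according to the evidence from the uniquely mapped reads. Converting read fractions into isoform frequencies would additionally require normalizing by isoform length, which this sketch leaves out.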

CONCLUSIONS

Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
