Computational approaches for isoform detection and estimation: good and bad news.

Affiliation

Istituto per le Applicazioni del Calcolo, CNR, Naples, Italy.

Publication information

BMC Bioinformatics. 2014 May 9;15:135. doi: 10.1186/1471-2105-15-135.

DOI:10.1186/1471-2105-15-135

PMID:24885830

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4098781/

Abstract

BACKGROUND

The main goal of whole-transcriptome analysis is to correctly identify all transcripts expressed within a specific cell or tissue, at a particular stage and condition, to determine their structures, and to measure their abundances. RNA-seq data promise to allow identification and quantification of the transcriptome at an unprecedented level of resolution and accuracy, and at low cost. Several computational methods have been proposed for these purposes. However, it is still not clear which promises have already been met and which challenges remain open and require further methodological development.

RESULTS

We carried out a simulation study to assess the performance of five widely used tools: CEM, Cufflinks, iReckon, RSEM, and SLIDE. All of them were run with default parameters. We considered three annotation scenarios: complete annotation, incomplete annotation, and no annotation at all. We also compared the methods in three modes of action. In the first mode, the methods were restricted to the isoforms present in the annotation; in the second, they were allowed to detect novel isoforms using the annotation as a guide; in the third, they operated in a fully data-driven way (although with the support of the alignment to the reference genome). In the third mode, precision and recall are quite poor. By contrast, results are better with the support of the annotation, even when it is incomplete. Finally, the abundance estimation error often shows a very skewed distribution. Performance depends strongly on the true abundance of the isoforms: lowly (and sometimes also moderately) expressed isoforms are poorly detected and estimated. In particular, lowly expressed isoforms are identified mainly when they are provided in the original annotation as potential isoforms.
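The detection and estimation measures mentioned above can be illustrated with a minimal sketch. It assumes that a predicted isoform counts as correct only when its identifier matches a simulated one exactly, and that estimation error is measured as relative error; the abstract does not state the study's actual matching criteria or error measure, so both are assumptions. The function names, identifiers, and values are hypothetical.

```python
from statistics import median


def precision_recall(true_isoforms, predicted_isoforms):
    """Return (precision, recall) for two collections of isoform identifiers."""
    true_set = set(true_isoforms)
    pred_set = set(predicted_isoforms)
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


def relative_errors(true_abundance, estimated_abundance):
    """Relative error |estimate - truth| / truth for each truly expressed isoform.

    Isoforms that the tool did not report are treated as estimated at 0
    (an assumption made for this sketch).
    """
    return {
        iso: abs(estimated_abundance.get(iso, 0.0) - truth) / truth
        for iso, truth in true_abundance.items()
        if truth > 0
    }


# Toy data (hypothetical identifiers and values, not taken from the paper).
truth = {"tx1": 100.0, "tx2": 10.0, "tx3": 0.5}
estimate = {"tx1": 90.0, "tx2": 14.0, "tx4": 2.0}

precision, recall = precision_recall(truth, estimate)
errors = relative_errors(truth, estimate)
print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"median relative error={median(errors.values()):.2f}")
```

In this toy case the lowly expressed isoform tx3 is missed entirely, so its relative error is far larger than that of the highly expressed tx1. This is the kind of abundance-dependent, skewed error pattern the abstract describes.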

CONCLUSIONS

Both detection and quantification of all isoforms from RNA-seq data are still hard problems, and they are affected by many factors. Overall, performance varies considerably, depending on the mode of action and on the type of annotation available. Methods using complete or partial annotation detect most of the expressed isoforms, although the number of false positives is often high. Fully data-driven approaches require more attention, at least for complex eukaryotic genomes. Improvements are desirable especially for isoform quantification and for the detection of low-abundance isoforms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68be/4098781/8e6a1d4a9231/1471-2105-15-135-1.jpg
