Comparative evaluation of full-length isoform quantification from RNA-Seq.

Affiliations

Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.

National Institute on Aging, National Institutes of Health, Baltimore, MD, USA.

Publication information

BMC Bioinformatics. 2021 May 25;22(1):266. doi: 10.1186/s12859-021-04198-1.

Abstract

BACKGROUND

Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the early days of RNA-Seq. The fundamental difficulty is that RNA transcripts are long, while RNA-Seq reads are short.

RESULTS

Here we use simulated benchmarking data that reflect many properties of real data, including polymorphisms, intron signal, and non-uniform coverage, allowing systematic comparative analysis of isoform quantification accuracy and its impact on differential expression analysis. Genome-based, transcriptome-based, and pseudo-alignment-based methods are included, along with a simple approach that serves as a baseline control.

CONCLUSIONS

Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We find that the structural parameters with the greatest impact on quantification accuracy are length and sequence compression complexity, with the number of isoforms mattering less. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods diverge from the truth enough to suggest that full-length isoform quantification and isoform-level differential expression (DE) analysis should still be employed selectively.
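
The abstract does not define "sequence compression complexity". As a rough illustration only, one common way to capture the idea is the ratio of a sequence's compressed size to its raw size, where a lower ratio indicates a more repetitive sequence. The sketch below assumes that definition and uses Python's standard `zlib` module; the function name `compression_complexity` and the toy sequences are hypothetical and are not taken from the paper.

```python
import random
import zlib


def compression_complexity(seq: str) -> float:
    """Ratio of zlib-compressed size to raw size (assumed definition).

    Lower values mean the sequence is more repetitive and compresses better.
    """
    raw = seq.upper().encode("ascii")
    if not raw:
        raise ValueError("sequence must be non-empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical toy sequences of equal length, for illustration only.
rng = random.Random(0)
repetitive = "ACGT" * 250
varied = "".join(rng.choice("ACGT") for _ in range(1000))

print(f"repetitive: {compression_complexity(repetitive):.3f}")
print(f"varied:     {compression_complexity(varied):.3f}")
```

Under this assumed measure, the repetitive sequence scores much lower than the varied one, which matches the intuition that low-complexity transcripts give short reads fewer distinguishing features.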

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b30d/8145802/47f9e74d4b18/12859_2021_4198_Fig1_HTML.jpg
