How tool combinations in different pipeline versions affect the outcome in RNA-seq analysis.

Author information

Perelo Louisa Wessels, Gabernet Gisela, Straub Daniel, Nahnsen Sven

Affiliations

Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, Germany.

M3 Research Center, Faculty of Medicine, University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, Germany.

Publication information

NAR Genom Bioinform. 2024 Mar 7;6(1):lqae020. doi: 10.1093/nargab/lqae020. eCollection 2024 Mar.

Abstract

Data analysis tools are continuously changed and improved over time. To test how these changes influence the comparability between analyses, the outputs of different workflow options of the nf-core/rnaseq pipeline were compared. Five pipeline settings (STAR+Salmon, STAR+RSEM, STAR+featureCounts, HISAT2+featureCounts, and the pseudoaligner Salmon) were run on three datasets (human, Arabidopsis, zebrafish) containing spike-ins of the External RNA Control Consortium (ERCC). Fold change ratios and differential expression of genes and spike-ins were used to compare the different tool and version settings of the pipeline. An 85% overlap in differential gene classification between pipelines was shown. Genes interpreted with a bias were mostly those present at lower concentrations. The number of isoforms and exons per gene were also determinants. Previous pipeline versions using featureCounts showed a higher sensitivity for detecting single-isoform genes such as the ERCC spike-ins. To ensure data comparability in long-term analysis series, it is advisable either to stay with the pipeline version the series was initialized with, or to run both versions during a transition period so that the target genes are addressed the same way.
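The two comparison measures named in the abstract (agreement of differential-expression classification between pipelines, and observed fold changes of ERCC spike-ins against their known input ratios) can be illustrated with a minimal Python sketch. The function names, the three-class labeling ("up", "down", "ns"), the cutoffs (|log2FC| ≥ 1, adjusted p < 0.05), and the example gene IDs and ratios are all hypothetical assumptions for illustration; they are not the authors' published thresholds or code.

```python
from __future__ import annotations

import math


def classify(log2fc: float, padj: float, fc_cut: float = 1.0, alpha: float = 0.05) -> str:
    """Label one gene as "up", "down" or "ns" (not significant).

    The default cutoffs are hypothetical, chosen only for illustration.
    """
    if padj < alpha and log2fc >= fc_cut:
        return "up"
    if padj < alpha and log2fc <= -fc_cut:
        return "down"
    return "ns"


def de_overlap(calls_a: dict[str, str], calls_b: dict[str, str]) -> float:
    """Fraction of genes shared by two pipelines that receive the same label."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two pipelines share no genes")
    agree = sum(1 for gene in shared if calls_a[gene] == calls_b[gene])
    return agree / len(shared)


def spike_in_deviation(observed_log2fc: dict[str, float],
                       expected_ratio: dict[str, float]) -> dict[str, float]:
    """Observed log2 fold change minus log2 of the known spike-in input ratio.

    Values near 0 mean the pipeline recovered the designed ratio.
    """
    return {
        spike: observed_log2fc[spike] - math.log2(expected_ratio[spike])
        for spike in observed_log2fc.keys() & expected_ratio.keys()
    }


if __name__ == "__main__":
    # Hypothetical calls from two pipeline settings for the same four genes.
    star_salmon = {"g1": "up", "g2": "ns", "g3": "down", "g4": "ns"}
    hisat2_fc = {"g1": "up", "g2": "up", "g3": "down", "g4": "ns"}
    print(de_overlap(star_salmon, hisat2_fc))  # 0.75
    # Hypothetical spike-in with a designed 4:1 ratio between two conditions.
    print(spike_in_deviation({"ERCC-00002": 1.9}, {"ERCC-00002": 4.0}))
```

Comparing labels rather than raw counts mirrors the abstract's focus on whether a gene is classified the same way, which is what matters for downstream interpretation when a long-running analysis series switches pipeline versions.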

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0c3/10919883/9aa9ef5f7f57/lqae020fig1.jpg
