nf-rnaSeqCount: A Nextflow pipeline for obtaining raw read counts from RNA-seq data.

Authors

Mpangase Phelelani T, Frost Jacqueline, Tikly Mohammed, Ramsay Michèle, Hazelhurst Scott

Affiliations

Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg.

Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg.

Publication information

S Afr Comput J. 2021 Dec;33(2). doi: 10.18489/sacj.v33i2.830. Epub 2021 Dec 20.

DOI:10.18489/sacj.v33i2.830

PMID:35574063

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9097006/

Abstract

The rate of raw sequence production through Next-Generation Sequencing (NGS) has been growing exponentially due to improved technology and reduced costs. This has enabled researchers to answer many biological questions through "multi-omics" data analyses. Although such data promise new insights into how biological systems function and into disease mechanisms, computational analyses of such large datasets come with their own challenges and potential pitfalls. The aim of this study was to develop a robust, portable, and reproducible bioinformatic pipeline for automating RNA sequencing (RNA-seq) data analyses. Using Nextflow as a workflow management system and Singularity for application containerisation, the nf-rnaSeqCount pipeline was developed for mapping raw RNA-seq reads to a reference genome and quantifying the abundance of identified genomic features for differential gene expression analyses. The pipeline provides a quick and efficient way to obtain a matrix of read counts that can be used with tools such as DESeq2 and edgeR for differential expression analysis. Robust and flexible bioinformatic and computational pipelines for RNA-seq data analysis, from QC to sequence alignment and comparative analyses, will reduce analysis time and increase the accuracy and reproducibility of findings, thereby promoting transcriptome research.
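To make the "matrix of read counts" concrete, here is a minimal Python sketch of how per-sample gene counts might be merged into one genes-by-samples table of the kind DESeq2 and edgeR accept. It is an illustration only, not the pipeline's actual implementation: the function names `build_count_matrix` and `to_tsv`, the sample and gene identifiers, and the choice to fill missing genes with zero are all assumptions made for this example.

```python
def build_count_matrix(per_sample_counts):
    """Merge per-sample {gene: count} dicts into a genes-by-samples matrix.

    Hypothetical helper for illustration; not part of nf-rnaSeqCount.
    Genes absent from a sample are assumed to have a count of 0.
    """
    samples = list(per_sample_counts)  # keep the input sample order
    genes = sorted({g for counts in per_sample_counts.values() for g in counts})
    rows = [[per_sample_counts[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


def to_tsv(genes, samples, rows):
    """Render the matrix as tab-separated text with a header row."""
    lines = ["gene_id\t" + "\t".join(samples)]
    for gene, row in zip(genes, rows):
        lines.append(gene + "\t" + "\t".join(str(count) for count in row))
    return "\n".join(lines)


# Made-up counts for two hypothetical samples.
example = {
    "sample_A": {"geneX": 10, "geneY": 0},
    "sample_B": {"geneX": 7, "geneZ": 3},
}

genes, samples, rows = build_count_matrix(example)
print(to_tsv(genes, samples, rows))
```

Each row is one genomic feature and each column one sample, with raw (unnormalised) integer counts. That is the input shape count-based differential expression tools generally expect.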

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8a/9097006/52fec817f3cc/nihms-1799173-f0001.jpg
