Pipeliner: A Nextflow-Based Framework for the Definition of Sequencing Data Processing Pipelines.

Authors

Federico Anthony, Karagiannis Tanya, Karri Kritika, Kishore Dileep, Koga Yusuke, Campbell Joshua D, Monti Stefano

Affiliations

Bioinformatics Program, Boston University, Boston, MA, United States.

Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States.

Publication information

Front Genet. 2019 Jun 28;10:614. doi: 10.3389/fgene.2019.00614. eCollection 2019.

DOI:10.3389/fgene.2019.00614

PMID:31316552

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6609566/

Abstract

The advent of high-throughput sequencing technologies has led to the need for flexible and user-friendly data preprocessing platforms. The Pipeliner framework provides an out-of-the-box solution for processing various types of sequencing data. It combines the Nextflow scripting language and the Anaconda package manager to generate modular computational workflows. We have used Pipeliner to create several pipelines for sequencing data processing, including bulk RNA-sequencing (RNA-seq), single-cell RNA-seq, and digital gene expression data. This report highlights the design methodology behind Pipeliner, which enables the development of highly flexible and reproducible pipelines that are easy to extend and maintain on multiple computing environments. We also provide a quick start user guide demonstrating how to set up and execute the available pipelines with toy datasets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b98/6609566/00de4f089e1e/fgene-10-00614-g001.jpg
