Geniac: Automatic Configuration GENerator and Installer for nextflow pipelines.

Authors

Allain Fabrice, Roméjon Julien, La Rosa Philippe, Jarlier Frédéric, Servant Nicolas, Hupé Philippe

Affiliations

Mines Paris Tech, Fontainebleau, F-77305, France.

Institut Curie, Paris, F-75005, France.

Publication

Open Res Eur. 2022 Feb 21;1:76. doi: 10.12688/openreseurope.13861.2. eCollection 2021.

DOI:10.12688/openreseurope.13861.2
PMID:37645091
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10445886/
Abstract

With the advent of high-throughput biotechnological platforms and their ever-growing capacity, life science has turned into a digitized, computational and data-intensive discipline. As a consequence, standard analysis with a bioinformatics pipeline in the context of routine production has become a challenge: the data must be processed in real time and delivered to the end users as fast as possible. The usage of workflow management systems along with packaging systems and containerization technologies offers an opportunity to tackle this challenge. While very powerful, these tools can be used and combined in many different ways, which may differ from one developer to another. Therefore, promoting the homogeneity of workflow implementations requires guidelines and protocols which detail how the source code of the bioinformatics pipeline should be written and organized to ensure its usability, maintainability, interoperability, sustainability, portability, reproducibility, scalability and efficiency. Capitalizing on Nextflow, Conda, Docker, Singularity and the nf-core initiative, we propose a set of best practices along the development life cycle of the bioinformatics pipeline and its deployment for production operations, which target different expert communities including i) the bioinformaticians and statisticians, ii) the software engineers, and iii) the data managers and core facility engineers. We implemented Geniac (Automatic Configuration GENerator and Installer for nextflow pipelines), which consists of a toolbox with three components: i) a technical documentation available at https://geniac.readthedocs.io to detail coding guidelines for the bioinformatics pipeline with Nextflow, ii) a command line interface with a linter to check that the code respects the guidelines, and iii) an add-on to generate configuration files, build the containers and deploy the pipeline.
The Geniac toolbox aims to harmonize development practices across developers and to automate the generation of configuration files and containers by parsing the source code of the Nextflow pipeline.
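
Both automated steps described above start by parsing the pipeline source. One is a linter that checks the code against the guidelines. The other generates configuration files from what the code declares. The Python sketch below illustrates that idea only. It is not Geniac's implementation or API. Several pieces are hypothetical:

- the guideline it checks, which requires every `process` to carry a `label` naming a tool declared in a registry;
- the `TOOLS` registry and its conda specifications;
- every function name.

The regular expressions are also a crude stand-in for a real Nextflow parser. Only the emitted `process { withLabel: ... { conda = ... } }` scope follows standard Nextflow configuration syntax.

```python
import re
from typing import Dict, List

# Crude stand-in for a Nextflow parser (assumption): a process header is a
# line of the form "process NAME {".
PROCESS_RE = re.compile(r"^\s*process\s+(\w+)\s*\{", re.MULTILINE)
# A label directive such as: label 'fastqc'  or  label "fastqc"
LABEL_RE = re.compile(r"^\s*label\s+[\"']([\w-]+)[\"']", re.MULTILINE)


def extract_process_labels(source: str) -> Dict[str, List[str]]:
    """Map each process name to the labels declared in its body.

    A body is approximated as the text between one process header and the
    next (or the end of the file), which is good enough for this sketch.
    """
    headers = list(PROCESS_RE.finditer(source))
    labels: Dict[str, List[str]] = {}
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(source)
        labels[header.group(1)] = LABEL_RE.findall(source[header.end():end])
    return labels


def lint_labels(labels: Dict[str, List[str]], tools: Dict[str, str]) -> List[str]:
    """Hypothetical guideline: every process needs a label naming a declared tool."""
    return [
        f"process '{name}': no label matches a declared tool"
        for name, process_labels in labels.items()
        if not any(label in tools for label in process_labels)
    ]


def render_conda_config(labels: Dict[str, List[str]], tools: Dict[str, str]) -> str:
    """Emit a Nextflow config scope with one withLabel selector per tool in use."""
    used = sorted({label for ls in labels.values() for label in ls if label in tools})
    lines = ["process {"]
    for label in used:
        lines.append(f"  withLabel: '{label}' {{")
        lines.append(f'    conda = "{tools[label]}"')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical tool registry: label -> conda specification.
TOOLS = {
    "fastqc": "bioconda::fastqc=0.11.9",
    "multiqc": "bioconda::multiqc=1.11",
}

# Toy pipeline: the 'multiqc' process deliberately lacks a tool label.
PIPELINE = '''
process fastqc {
  label 'fastqc'
  label 'lowMem'
  input:
  path reads
  script:
  """
  fastqc $reads
  """
}

process multiqc {
  input:
  path reports
  script:
  """
  multiqc .
  """
}
'''

if __name__ == "__main__":
    found = extract_process_labels(PIPELINE)
    for message in lint_labels(found, TOOLS):
        print("LINT:", message)
    print(render_conda_config(found, TOOLS))
```

The generated configuration is keyed on labels rather than on process names. That way, one registry entry can serve every process that uses the same tool. The same label-to-tool mapping could presumably also drive container builds, but that step is not sketched here.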
