Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

Author information

Kim Baekdoo, Ali Thahmina, Lijeron Carlos, Afgan Enis, Krampis Konstantinos

Affiliations

Center for Translational and Basic Research and Belfer Research Building, Hunter College of The City University of New York, 413 E 69th St, New York, NY 10021.

Johns Hopkins University, Department of Biology, 3400 N Charles St, Mudd Hall 144, Baltimore, MD 21218.

Publication information

Gigascience. 2017 Aug 1;6(8):1-7. doi: 10.1093/gigascience/gix048.

DOI:10.1093/gigascience/gix048

PMID:28854616

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5569920/

Abstract

Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. To address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed two pipelines, one for RNA sequencing and one for chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, and from a user perspective, running the pipelines is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.
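The "meta-script" idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script: the image name, the in-container port (80), the mount point (`/data`), the history name, and all function names below are hypothetical and chosen only for illustration. The sketch assumes the container runs a Galaxy server with the workflow already installed, and it uses BioBlend calls (`GalaxyInstance`, `histories.create_history`, `tools.upload_file`, `workflows.get_workflows`, `workflows.invoke_workflow`) to drive it.

```python
"""Sketch of a Bio-Docklet-style meta-script (illustrative only)."""

# Hypothetical values, assumed for this sketch and not taken from the paper.
CONTAINER_PORT = 80           # assumed port Galaxy listens on inside the container
CONTAINER_DATA_DIR = "/data"  # assumed mount point for input and output data


def docker_run_command(image, host_data_dir, host_port):
    """Build (without executing) a `docker run` command for one pipeline container."""
    return [
        "docker", "run", "--detach",
        "--publish", f"{host_port}:{CONTAINER_PORT}",
        "--volume", f"{host_data_dir}:{CONTAINER_DATA_DIR}",
        image,
    ]


def commands_for_datasets(image, host_data_dirs, first_port=8080):
    """Build one command per dataset, each on its own host port, for concurrent runs."""
    return [
        docker_run_command(image, data_dir, first_port + offset)
        for offset, data_dir in enumerate(host_data_dirs)
    ]


def workflow_inputs(dataset_id, step_index="0"):
    """Map a workflow input step to an uploaded history dataset ("hda")."""
    return {step_index: {"src": "hda", "id": dataset_id}}


def run_pipeline(galaxy_url, api_key, workflow_name, input_path):
    """Upload one file and invoke a named workflow through the Galaxy API."""
    # Imported lazily so the helpers above work without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    history = gi.histories.create_history(name="bio-docklet-run")
    upload = gi.tools.upload_file(input_path, history["id"])
    dataset_id = upload["outputs"][0]["id"]
    # Assumes a workflow with this name exists in the container's Galaxy instance.
    workflow = gi.workflows.get_workflows(name=workflow_name)[0]
    return gi.workflows.invoke_workflow(
        workflow["id"],
        inputs=workflow_inputs(dataset_id),
        history_id=history["id"],
    )
```

In use, each command returned by `commands_for_datasets` could be passed to `subprocess.run(cmd, check=True)` to start a container, after which `run_pipeline` would be called against the matching URL (for example `http://localhost:8080`) once Galaxy has finished starting. Giving each container its own host port is one simple way to run many pipeline instances side by side; the published meta-script may handle this differently.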
