DolphinNext: a distributed data processing platform for high throughput genomics.

Affiliations

Bioinformatics Core, University of Massachusetts Medical School, Worcester, MA, 01605, USA.

RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, MA, 01605, USA.

Publication information

BMC Genomics. 2020 Apr 19;21(1):310. doi: 10.1186/s12864-020-6714-x.

Abstract

BACKGROUND

The emergence of high-throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS), is transforming biological research. The dramatic increase in the volume of data, together with the variety and continuous change of data processing tools, algorithms, and databases, makes analysis the main bottleneck for scientific discovery. The processing of high-throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, current platforms lack the combination of parallelism, portability, flexibility, and/or reproducibility required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify the creation of robust and reproducible workflows for non-technical users and will provide a robust platform for maintaining pipelines in large organizations.

RESULTS

To simplify the development, maintenance, and execution of complex pipelines, we created DolphinNext. DolphinNext facilitates building and deploying complex pipelines using a modular approach, implemented in a graphical interface that relies on the Nextflow workflow framework. It provides:
1. A drag-and-drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages.
2. Modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or the cloud.
3. Reproducible pipelines with version tracking, and stand-alone versions that can be run independently.
4. Modular process design with process revisioning support to increase reusability and pipeline development efficiency.
5. Pipeline sharing with GitHub and automated testing.
6. Extensive reports with R Markdown and Shiny support for interactive data visualization and analysis.

CONCLUSION

DolphinNext is a flexible, intuitive, web-based data processing and analysis platform. It enables users to create, deploy, share, and execute complex Nextflow pipelines, with extensive revisioning and interactive reporting that enhance the reproducibility of results.
