Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany.
Bioinformatics. 2017 May 15;33(10):1565-1567. doi: 10.1093/bioinformatics/btx003.
Analysis of next-generation sequencing (NGS) data requires processing large datasets by chaining various tools with complex input and output formats. To automate data analysis, we propose standardizing NGS tasks into modular workflows. This simplifies the reliable handling and processing of NGS data, and the resulting solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that can be combined into workflows for a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase reliability and reduce manual intervention when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox, we use the workflow management software KNIME.
See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license).
robert.kueffner@helmholtz-muenchen.de.
Supplementary data are available at Bioinformatics online.
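The idea of chaining modular processing steps under an executor that reduces manual intervention can be illustrated with a minimal Python sketch. This is not the KNIME4NGS implementation or its API: the names `run_with_retries`, `run_workflow`, `make_flaky`, `trim` and `align` are hypothetical, and the assumption that an executor retries a failed step a fixed number of times is ours, made only to show the general pattern.

```python
from typing import Callable, List

# A "module" is modeled as a function mapping one dataset label to another.
# Assumption: real modules wrap external tools; plain strings stand in here.
Step = Callable[[str], str]


def run_with_retries(step: Step, data: str, max_attempts: int = 3) -> str:
    """Run one step, retrying on RuntimeError up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return step(data)
        except RuntimeError as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error


def run_workflow(steps: List[Step], data: str, max_attempts: int = 3) -> str:
    """Chain steps so that each step's output is the next step's input."""
    for step in steps:
        data = run_with_retries(step, data, max_attempts)
    return data


def make_flaky(step: Step, failures: int) -> Step:
    """Wrap a step so that its first `failures` calls raise (for illustration)."""
    remaining = {"n": failures}

    def wrapped(data: str) -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("transient failure")
        return step(data)

    return wrapped


def trim(reads: str) -> str:
    # Hypothetical stand-in for a read-trimming module.
    return reads + ".trimmed"


def align(reads: str) -> str:
    # Hypothetical stand-in for an alignment module.
    return reads + ".aligned"


if __name__ == "__main__":
    # The alignment step fails twice, then succeeds on the third attempt.
    workflow = [trim, make_flaky(align, failures=2)]
    print(run_workflow(workflow, "sample"))
```

Keeping the retry logic in the executor rather than in each module means individual steps stay simple and the failure policy can be changed in one place.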