BioQueue: a novel pipeline framework to accelerate bioinformatics analysis.

Author information

College of Life Science, Northeast Forestry University, Harbin 150040, China.

School of Life Science and Technology, Shanghai Tech University, Shanghai 200031, China.

Publication information

Bioinformatics. 2017 Oct 15;33(20):3286-3288. doi: 10.1093/bioinformatics/btx403.

DOI:10.1093/bioinformatics/btx403

PMID:28633441

Abstract

MOTIVATION

With the rapid development of Next-Generation Sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, the presence of many pipeline frameworks makes it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and dispatch jobs based on users' experience about the resources needed by the execution of a certain step in a protocol. As a result, it is difficult for these tools to maximize the potential of computing resources, and avoid errors caused by overload, such as memory overflow.

RESULTS

Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue possesses a shell command-like syntax instead of implementing a new script language, which means most biologists without computer programming background can access the efficient queue system with ease.
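The checkpoint idea described above can be illustrated with a minimal sketch. This is not BioQueue's actual code: the `Resources`, `Step` and `checkpoint` names, the example commands and the greedy "dispatch whatever fits" policy are all assumptions made for illustration. The abstract does not say how BioQueue derives its estimates, so here they are supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Hypothetical resource triple: CPU cores, memory (GiB), disk (GiB)."""
    cpu: float
    memory: float
    disk: float

    def fits_in(self, free: "Resources") -> bool:
        # A step fits only if every resource it needs is currently free.
        return (self.cpu <= free.cpu
                and self.memory <= free.memory
                and self.disk <= free.disk)


@dataclass
class Step:
    command: str         # shell-command-like step (placeholder text)
    estimate: Resources  # assumed to be known before the step runs


def checkpoint(queue, free):
    """Split ready steps into those dispatched now and those left waiting.

    Assumption: steps in `queue` are independent, so a later step that
    fits may run while an earlier, larger step waits.
    """
    dispatched, waiting = [], []
    for step in queue:
        if step.estimate.fits_in(free):
            # Reserve the estimated resources so later steps see less.
            free = Resources(free.cpu - step.estimate.cpu,
                             free.memory - step.estimate.memory,
                             free.disk - step.estimate.disk)
            dispatched.append(step)
        else:
            waiting.append(step)
    return dispatched, waiting


if __name__ == "__main__":
    free = Resources(cpu=4, memory=16, disk=100)
    queue = [
        Step("align_reads sample1.fq", Resources(2, 8, 20)),
        Step("assemble sample1.bam", Resources(2, 12, 30)),
        Step("count_features sample1.bam", Resources(1, 4, 5)),
    ]
    run_now, held = checkpoint(queue, free)
    print("dispatch:", [s.command for s in run_now])
    print("wait:", [s.command for s in held])
```

In this sketch the second step is held back because its estimated memory exceeds what remains after the first step is reserved, which is the kind of overload the checkpoint is meant to prevent.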

AVAILABILITY AND IMPLEMENTATION

BioQueue is freely available at https://github.com/liyao001/BioQueue. The extensive documentation can be found at http://bioqueue.readthedocs.io.

CONTACT

li_yao@outlook.com or gcsui@nefu.edu.cn.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

BioQueue: a novel pipeline framework to accelerate bioinformatics analysis.

Bioinformatics. 2017 Oct 15;33(20):3286-3288. doi: 10.1093/bioinformatics/btx403.

BigDataScript: a scripting language for data pipelines.

Bioinformatics. 2015 Jan 1;31(1):10-6. doi: 10.1093/bioinformatics/btu595. Epub 2014 Sep 3.

BPP: a sequence-based algorithm for branch point prediction.

Bioinformatics. 2017 Oct 15;33(20):3166-3172. doi: 10.1093/bioinformatics/btx401.

piPipes: a set of pipelines for piRNA and transposon analysis via small RNA-seq, RNA-seq, degradome- and CAGE-seq, ChIP-seq and genomic DNA sequencing.

Bioinformatics. 2015 Feb 15;31(4):593-5. doi: 10.1093/bioinformatics/btu647. Epub 2014 Oct 17.

CPSS 2.0: a computational platform update for the analysis of small RNA sequencing data.

Bioinformatics. 2017 Oct 15;33(20):3289-3291. doi: 10.1093/bioinformatics/btx066.

Cloud-based introduction to BASH programming for biologists.

Brief Bioinform. 2024 Jul 23;25(Supplement_1). doi: 10.1093/bib/bbae244.

CRISPRcloud: a secure cloud-based pipeline for CRISPR pooled screen deconvolution.

Bioinformatics. 2017 Sep 15;33(18):2963-2965. doi: 10.1093/bioinformatics/btx335.

Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.

Bioinformatics. 2011 Dec 15;27(24):3333-40. doi: 10.1093/bioinformatics/btr570. Epub 2011 Oct 12.

Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

Gigascience. 2017 Aug 1;6(8):1-7. doi: 10.1093/gigascience/gix048.

rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

Bioinformatics. 2015 Jul 1;31(13):2222-4. doi: 10.1093/bioinformatics/btv119. Epub 2015 Feb 24.

Cited by

Glioma-derived ANXA1 suppresses the immune response to TLR3 ligands by promoting an anti-inflammatory tumor microenvironment.

Cell Mol Immunol. 2024 Jan;21(1):47-59. doi: 10.1038/s41423-023-01110-0. Epub 2023 Dec 4.

Genomic variation in the genus Beta based on 656 sequenced beet genomes.

Sci Rep. 2023 May 27;13(1):8654. doi: 10.1038/s41598-023-35691-7.

A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers.

Nat Biotechnol. 2022 Jul;40(7):1056-1065. doi: 10.1038/s41587-022-01211-7. Epub 2022 Feb 17.

CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing.

Int J Mol Sci. 2020 May 28;21(11):3828. doi: 10.3390/ijms21113828.
