College of Life Science, Northeast Forestry University, Harbin 150040, China.
School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China.
Bioinformatics. 2017 Oct 15;33(20):3286-3288. doi: 10.1093/bioinformatics/btx403.
With the rapid development of next-generation sequencing, a large amount of data is now available for bioinformatics research, and many pipeline frameworks exist to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and they dispatch jobs according to the user's own estimate of the resources that each step of a protocol will need. As a result, they struggle both to make full use of the available computing resources and to avoid overload errors such as running out of memory.
Here, we have developed BioQueue, a web-based framework that places a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) the step needs, and then dispatches jobs accordingly. BioQueue uses a shell command-like syntax rather than introducing a new scripting language, so most biologists without a programming background can use its efficient queue system with ease.
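The checkpoint idea described above can be illustrated with a minimal sketch. This is not BioQueue's actual implementation: the `Resources` and `Step` classes, the linear scaling of memory and disk with input size, and the placeholder commands are all assumptions made for illustration. The sketch only shows the general pattern of estimating a step's needs first and dispatching it only if the estimate fits within what is currently free.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Resources:
    """A bundle of CPU cores, memory (GB) and disk (GB)."""
    cpu: float
    memory: float
    disk: float

    def fits_within(self, available: "Resources") -> bool:
        # A step fits only if every resource it needs is available.
        return (self.cpu <= available.cpu
                and self.memory <= available.memory
                and self.disk <= available.disk)

    def minus(self, used: "Resources") -> "Resources":
        # Resources left over after reserving `used`.
        return Resources(self.cpu - used.cpu,
                         self.memory - used.memory,
                         self.disk - used.disk)


@dataclass(frozen=True)
class Step:
    """A hypothetical protocol step with assumed scaling coefficients."""
    command: str          # shell-like command (placeholder text)
    input_size_gb: float  # size of the step's input
    cpu: float            # cores the step is assumed to use
    mem_base_gb: float    # assumed fixed memory cost
    mem_per_gb: float     # assumed memory per GB of input
    disk_per_gb: float    # assumed output disk per GB of input


def estimate_resources(step: Step) -> Resources:
    # Assumption: memory and disk grow linearly with input size.
    return Resources(
        cpu=step.cpu,
        memory=step.mem_base_gb + step.mem_per_gb * step.input_size_gb,
        disk=step.disk_per_gb * step.input_size_gb,
    )


def dispatch(steps: List[Step],
             available: Resources) -> Tuple[List[str], List[str]]:
    """Run the checkpoint before each step; start it only if it fits."""
    started: List[str] = []
    waiting: List[str] = []
    for step in steps:
        needed = estimate_resources(step)
        if needed.fits_within(available):
            available = available.minus(needed)  # reserve resources
            started.append(step.command)
        else:
            waiting.append(step.command)  # defer until resources free up
    return started, waiting


if __name__ == "__main__":
    free = Resources(cpu=8, memory=32, disk=500)
    queue = [
        Step("align reads.fq", 10, cpu=4, mem_base_gb=2, mem_per_gb=1, disk_per_gb=3),
        Step("assemble reads.fq", 10, cpu=4, mem_base_gb=4, mem_per_gb=2, disk_per_gb=1),
        Step("count aligned.bam", 10, cpu=1, mem_base_gb=1, mem_per_gb=0.5, disk_per_gb=1),
    ]
    started, waiting = dispatch(queue, free)
    print("started:", started)
    print("waiting:", waiting)
```

In this sketch the second step is held back because its estimated memory exceeds what remains after the first step is reserved, while the smaller third step still starts. A real system would presumably refine its estimates from previous runs and re-check waiting steps as running ones finish.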
BioQueue is freely available at https://github.com/liyao001/BioQueue. Extensive documentation is available at http://bioqueue.readthedocs.io.
Contact: li_yao@outlook.com or gcsui@nefu.edu.cn.
Supplementary data are available at Bioinformatics online.