IEEE Trans Nanobioscience. 2018 Jul;17(3):199-208. doi: 10.1109/TNB.2018.2837122. Epub 2018 May 16.
Bioinformatics research continues to grow in scale, driven by techniques such as next-generation sequencing and by tools that automate bioinformatics processes. With this growth, biological data is accumulating at an unprecedented rate, demanding high-performance and high-throughput computing technologies to process it. Hardware accelerators such as graphics processing units (GPUs), together with distributed computing, accelerate big-data processing in high-performance computing environments: they enable higher degrees of parallelism and thereby increase throughput. In this paper, we introduce BioWorkflow, an interactive workflow management system that automates bioinformatics analyses and can schedule parallel tasks using GPU-accelerated and distributed computing. We describe a case study that evaluates the performance of a complex, branching workflow executed by BioWorkflow. The results indicate a speed-up of $2.89\times$ from utilizing GPUs and an average speed-up of $2.832\times$ (over $n = 5$ scenarios) from parallel execution of graph nodes during multiple sequence alignment calculations. A combined speed-up of $1.71\times$ is achieved for complex workflows. This confirms the expectation that parallelism through GPU acceleration and concurrent execution of workflow nodes yields higher speed-ups than mainstream sequential workflow execution. The tool also provides a comprehensive user interface with improved interactivity for managing complex workflows; a System Usability Scale score of 82.9 confirms the high usability of the system.
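The idea of running independent graph nodes concurrently can be illustrated with a small sketch. This is not BioWorkflow's scheduler, which the abstract does not describe. It is a minimal illustration using only the Python standard library. The function `run_workflow`, the node names, and the toy tasks are all hypothetical. Threads are used on the assumption that real nodes would mostly wait on external tools or GPU kernels.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(tasks, deps, max_workers=4):
    """Run a workflow DAG, starting each node as soon as its inputs are ready.

    tasks: dict mapping node name -> callable taking a dict of upstream results
    deps:  dict mapping node name -> set of upstream node names
    (Both structures are illustrative assumptions, not BioWorkflow's format.)
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    running = {}  # future -> node name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every node whose predecessors have all finished.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, set())}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running node completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()  # re-raises task exceptions
                sorter.done(name)  # may unlock downstream nodes
    return results


# Toy branching workflow: "lowercase" and "gc" both depend only on "load",
# so they can run at the same time; "report" joins the two branches.
tasks = {
    "load": lambda inp: ["ACGT", "ACGA", "TCGA"],
    "lowercase": lambda inp: [s.lower() for s in inp["load"]],
    "gc": lambda inp: [(s.count("G") + s.count("C")) / len(s) for s in inp["load"]],
    "report": lambda inp: {
        "n_sequences": len(inp["lowercase"]),
        "mean_gc": sum(inp["gc"]) / len(inp["gc"]),
    },
}
deps = {"lowercase": {"load"}, "gc": {"load"}, "report": {"lowercase", "gc"}}

results = run_workflow(tasks, deps)
print(results["report"])
```

In a sequential executor the two middle nodes would run one after the other. Here they are submitted together, which is the kind of node-level concurrency that the reported speed-ups refer to.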