Xu Bo, Li Changlong, Zhuang Hang, Wang Jiali, Wang Qingfeng, Wang Chao, Zhou Xuehai
School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, China.
BMC Med Genomics. 2018 Nov 20;11(Suppl 5):100. doi: 10.1186/s12920-018-0415-1.
Clinical decision support systems can help overcome the limits of an individual doctor's knowledge and reduce the likelihood of misdiagnosis, thereby improving health care. Traditional genetic data storage and analysis methods that run in a stand-alone environment scale poorly, so they struggle to meet the computational demands of rapidly growing genetic data.
In this paper, we propose a distributed gene clinical decision support system named GCDSS and implement a prototype based on cloud computing technology. We also present CloudBWA, a novel distributed read mapping algorithm that uses a batch processing strategy to map reads on Apache Spark.
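The batch processing idea can be illustrated with a minimal sketch. It is not the CloudBWA implementation: it runs in plain Python without Spark, and the names `batches`, `map_in_batches`, and `toy_align_batch`, along with the tiny reference string, are assumptions made up for illustration. The sketch groups reads into fixed-size batches and passes each batch to the aligner in one call, so that any per-call setup cost is paid once per batch rather than once per read.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Read = Tuple[str, str]       # (read_id, sequence)
Hit = Tuple[str, int]        # (read_id, position in reference, -1 if unmapped)


def batches(reads: Iterable[Read], batch_size: int) -> Iterator[List[Read]]:
    """Yield consecutive lists holding at most batch_size reads each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(reads)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def map_in_batches(
    reads: Iterable[Read],
    batch_size: int,
    align_batch: Callable[[List[Read]], List[Hit]],
) -> List[Hit]:
    """Call the aligner once per batch instead of once per read."""
    results: List[Hit] = []
    for batch in batches(reads, batch_size):
        results.extend(align_batch(batch))
    return results


def toy_align_batch(batch: List[Read]) -> List[Hit]:
    """Hypothetical stand-in for a real aligner: exact substring search."""
    reference = "ACGTACGTTAGC"  # made-up reference sequence
    return [(read_id, reference.find(seq)) for read_id, seq in batch]


if __name__ == "__main__":
    reads = [("r1", "ACGT"), ("r2", "TTAG"), ("r3", "GGGG")]
    print(map_in_batches(reads, batch_size=2, align_batch=toy_align_batch))
```

On a Spark cluster, a comparable effect is usually obtained by processing a whole partition of reads per function call rather than one record at a time. How CloudBWA sizes and schedules its batches is not described in this abstract.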
Experiments show that both GCDSS and CloudBWA achieve outstanding performance and excellent scalability. Against state-of-the-art distributed algorithms, CloudBWA achieves up to a 2.63-fold speedup over SparkBWA. Against stand-alone algorithms, CloudBWA on 16 cores achieves up to an 11.59-fold speedup over BWA-MEM on 1 core.
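To put the 16-core figure in context, a speedup can be divided by the number of cores to give a parallel efficiency. The short calculation below is an illustrative reading of the reported number, not a result from the paper. The helper name `parallel_efficiency` is an assumption, and the baseline is BWA-MEM on 1 core rather than CloudBWA on 1 core, so the value mixes algorithmic and parallel effects.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core; 1.0 would mean perfectly linear scaling."""
    if cores < 1:
        raise ValueError("cores must be positive")
    return speedup / cores


if __name__ == "__main__":
    # Reported figure: up to 11.59x on 16 cores versus BWA-MEM on 1 core.
    efficiency = parallel_efficiency(11.59, 16)
    print(f"{efficiency:.1%}")  # prints 72.4%
```

By this measure the best reported case reaches roughly 72% of ideal linear scaling relative to the single-core BWA-MEM baseline.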
GCDSS is a distributed gene clinical decision support system based on cloud computing techniques. In particular, it incorporates a distributed genetic data analysis pipeline framework. To speed up data processing in GCDSS, we propose CloudBWA, a novel distributed read mapping algorithm that applies batch processing in the mapping stage on the Apache Spark platform.