

Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop.

Author information

Hua Guan-Jie, Hung Che-Lun, Tang Chuan Yi

Affiliations

Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.

Department of Computer Science and Communication Engineering, Providence University, Taichung, Taiwan.

Publication information

Comb Chem High Throughput Screen. 2018;21(2):84-92. doi: 10.2174/1386207321666180102120641.

Abstract

AIM AND OBJECTIVE

Over the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) plays an important role in analysis and prediction during drug development, making the process more economical and efficient. However, computing over large data sets, such as ZINC with more than 60 million compounds and GDB-13 with more than 930 million small molecules, is notably time-consuming. We therefore propose a novel heterogeneous high-performance computing method, Hadoop-MCC, which integrates Hadoop and GPUs to cope with big chemical structure data efficiently.

MATERIALS AND METHODS

Hadoop-MCC inherits high availability and fault tolerance from Hadoop, which is used to scatter input data to GPU devices and gather the results from them. The Hadoop framework adopts the mapper/reducer computation model. In the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on a GPU; reducers then collect all comparison results produced by the mappers. Owing to Hadoop's high availability, all LINGO computation jobs assigned to mappers can be completed even if some of the mappers encounter problems.
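
The mapper/reducer division of work described above can be illustrated with a minimal, CPU-only Python sketch. It is not the authors' implementation: the abstract does not give the similarity formula, the LINGO length, or the key/value layout, so the sketch assumes LINGOs of length 4, a multiset Tanimoto coefficient over LINGO counts, and omits the SMILES preprocessing (such as ring-closure digit normalization) that LINGO implementations often apply. The names `lingos`, `lingo_similarity`, `mapper`, `reducer`, and `run_job` are hypothetical, and `run_job` is only an in-process stand-in for the Hadoop runtime, with plain Python in place of the GPU kernel.

```python
from collections import Counter
from itertools import groupby

Q = 4  # assumed LINGO length; the abstract does not state the value used


def lingos(smiles, q=Q):
    """Multiset of overlapping length-q substrings of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_similarity(a, b, q=Q):
    """Multiset Tanimoto coefficient between two LINGO profiles (assumed form)."""
    la, lb = lingos(a, q), lingos(b, q)
    shared = sum((la & lb).values())  # per-LINGO minimum counts
    total = sum((la | lb).values())   # per-LINGO maximum counts
    return shared / total if total else 0.0


def mapper(segment, queries):
    """Compare every query against one segment of SMILES records.

    In Hadoop-MCC this comparison runs on a GPU; here it is plain Python.
    Emits (query_id, (compound_id, similarity)) pairs.
    """
    for db_id, db_smiles in segment:
        for q_id, q_smiles in queries:
            yield q_id, (db_id, lingo_similarity(q_smiles, db_smiles))


def reducer(q_id, hits):
    """Collect all comparison results for one query, best match first."""
    return q_id, sorted(hits, key=lambda hit: hit[1], reverse=True)


def run_job(database, queries, n_segments=2):
    """In-process stand-in for Hadoop: split, map, group by key, reduce."""
    segments = [database[i::n_segments] for i in range(n_segments)]
    emitted = [pair for seg in segments for pair in mapper(seg, queries)]
    emitted.sort(key=lambda pair: pair[0])  # mimics the shuffle/sort phase
    return dict(
        reducer(q_id, [value for _, value in group])
        for q_id, group in groupby(emitted, key=lambda pair: pair[0])
    )


# Toy data: (identifier, SMILES) records chosen only for illustration.
database = [
    ("phenol", "c1ccccc1O"),
    ("aniline", "c1ccccc1N"),
    ("octane", "CCCCCCCC"),
]
queries = [("q1", "c1ccccc1O")]
results = run_job(database, queries)
```

Because each mapper call depends only on its own segment and the query list, a failed segment can be rerun in isolation without affecting the others, which is the property the fault-tolerance claim relies on.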

RESULTS

LINGO comparisons are performed on each GPU device in parallel. According to the experimental results, the proposed method running on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device.

CONCLUSION

Hadoop-MCC achieves the scalability, high availability, and fault tolerance provided by Hadoop, as well as high performance, by integrating the computational power of both Hadoop and GPUs. The results show that a heterogeneous architecture such as Hadoop-MCC can deliver better computational performance than a single GPU device.

