Efficiently predicting large-scale protein-protein interactions using MapReduce.

Author information

Hu Lun, Yuan Xiaohui, Hu Pengwei, Chan Keith C C

Affiliation

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China.

Publication information

Comput Biol Chem. 2017 Aug;69:202-206. doi: 10.1016/j.compbiolchem.2017.03.009. Epub 2017 Apr 1.

DOI:10.1016/j.compbiolchem.2017.03.009

PMID:28396055

Abstract

With the rapid development of high-throughput genomic technologies, a vast amount of protein-protein interaction (PPI) data has been generated for different species. However, the set of PPIs identified so far is still rather small compared with all possible PPIs. Hence, computational algorithms designed specifically for large-scale PPI prediction are needed. In response to this need, we propose a parallel algorithm, pVLASPD, that performs the prediction task in a distributed manner. In particular, pVLASPD is a modification of the VLASPD algorithm that aims to improve its efficiency while maintaining comparable effectiveness. To do so, we first analyzed VLASPD step by step to identify the steps that caused efficiency bottlenecks. pVLASPD was then developed by parallelizing those inefficient steps with the MapReduce framework. Extensive experimental results demonstrate the promising performance of pVLASPD when applied to the prediction of large-scale PPIs.
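The abstract does not describe the internal steps of VLASPD or pVLASPD, so the following is only a generic sketch of the MapReduce pattern it refers to, not the authors' algorithm. It assumes a placeholder scoring rule (the number of interaction partners two proteins share) and simulates the map, shuffle, and reduce phases in a single Python process. The function names and the toy network are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(protein, partners):
    # Any two partners of the same protein share that protein as a neighbour,
    # so emit one count for each such pair. (Placeholder rule, not VLASPD.)
    for a, b in combinations(sorted(partners), 2):
        yield (a, b), 1


def shuffle(mapped):
    # Group emitted values by key, as a MapReduce framework does between phases.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(pair, counts):
    # Sum the counts for one protein pair: its number of shared partners.
    return pair, sum(counts)


def score_candidate_pairs(adjacency):
    mapped = (
        kv
        for protein, partners in adjacency.items()
        for kv in map_phase(protein, partners)
    )
    grouped = shuffle(mapped)
    scores = dict(reduce_phase(pair, counts) for pair, counts in grouped.items())
    # Keep only pairs that are not already known interactions.
    return {
        pair: score
        for pair, score in scores.items()
        if pair[1] not in adjacency.get(pair[0], ())
    }


# Hypothetical toy network: each protein maps to its known partners.
toy_network = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P2", "P3"},
}
candidate_scores = score_candidate_pairs(toy_network)
print(candidate_scores)
```

Because each call to `map_phase` depends on a single protein's partner list and each call to `reduce_phase` on a single pair's counts, both phases can be distributed across workers, which is the property a MapReduce-based parallelization relies on.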