Department of Computer Science, Bharathiar University, Tamilnadu, India.
Sci Rep. 2019 Jul 31;9(1):11106. doi: 10.1038/s41598-019-47468-y.
The availability of large amounts of protein-protein interaction (PPI) data has enabled research on biological networks that reveals the structure of protein complexes, pathways, and their cellular organization. A key task in computational biology is to recognize the modular structure of such networks. Detecting protein complexes in a PPI network is one of the most challenging and significant problems of the post-genomic era. In bioinformatics, Markov Clustering (MCL) is a frequently used approach for clustering networks. Most studies of protein complex detection have been carried out on static PPI networks, which suffer from several drawbacks. To address this problem, this paper proposes an approach that detects dynamic protein complexes through Markov Clustering based on the Elephant Herd Optimization approach (DMCL-EHO). The proposed method first divides the PPI network into a set of dynamic subnetworks at various time points by incorporating gene expression data, and then performs clustering analysis on every subnetwork using MCL together with Elephant Herd Optimization. Experiments were carried out on different PPI network datasets, and the proposed method surpasses various existing approaches in terms of accuracy measures. The paper also identifies the common protein complexes that are significantly enriched in gold-standard datasets, as well as the pathway annotations of the detected protein complexes using the KEGG database.
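The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Elephant Herd Optimization step is omitted entirely, the fixed expression threshold used to decide whether a protein is active at a time point is an assumption, and all function names and the toy data are hypothetical. The clustering step is plain MCL (alternating expansion and inflation) written in pure Python.

```python
def active_subnetwork(edges, expression, t, threshold):
    """Keep interactions whose two proteins are both expressed at time point t.

    The fixed threshold is an assumption made for this sketch.
    """
    return [(u, v) for u, v in edges
            if expression[u][t] >= threshold and expression[v][t] >= threshold]


def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(nodes, edges, inflation=2.0, iterations=30, eps=1e-6):
    """Plain Markov Clustering on an unweighted, undirected graph."""
    index = {p: i for i, p in enumerate(nodes)}
    n = len(nodes)
    # Adjacency matrix with self-loops on the diagonal.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[index[u]][index[v]] = m[index[v]][index[u]] = 1.0
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walk).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each non-empty row lists the members attracted to that row's node.
    clusters = set()
    for row in m:
        members = frozenset(nodes[j] for j, x in enumerate(row) if x > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


def dynamic_complexes(edges, expression, threshold):
    """Cluster the active subnetwork at every time point."""
    n_times = len(next(iter(expression.values())))
    result = {}
    for t in range(n_times):
        sub_edges = active_subnetwork(edges, expression, t, threshold)
        nodes = sorted({p for edge in sub_edges for p in edge})
        result[t] = mcl(nodes, sub_edges)
    return result


# Toy data (hypothetical): two triangles joined by one bridging interaction.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
# Expression levels at two time points; A-C are active first, D-F second.
expression = {
    "A": [0.9, 0.1], "B": [0.8, 0.2], "C": [0.7, 0.1],
    "D": [0.1, 0.9], "E": [0.2, 0.8], "F": [0.1, 0.7],
}
complexes = dynamic_complexes(edges, expression, threshold=0.5)
print(complexes)
```

On this toy input the static network is a single connected graph, whereas each time-specific subnetwork contains only the triangle whose proteins are expressed at that time point, so the candidate complex differs between the two time points.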