School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi, China.
Department of Computer Science and Engineering, State University of New York at Buffalo, NY, 14260-2000, USA.
BMC Med Genomics. 2017 Dec 28;10(Suppl 5):80. doi: 10.1186/s12920-017-0314-x.
Identifying protein complexes plays an important role in understanding cellular organization and functional mechanisms. Because ample evidence indicates that dense sub-networks in a dynamic protein-protein interaction network (DPIN) usually correspond to protein complexes, identifying protein complexes can be formulated as a density-based clustering problem.
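A dense sub-network is usually quantified by its edge density: the fraction of possible interactions among its proteins that are actually observed. The helper below is a hypothetical illustration of that measure on a toy network, not a function from the paper.

```python
def subgraph_density(nodes, edges):
    """Edge density 2|E_S| / (|S| (|S| - 1)) of the sub-network induced by `nodes`.

    `edges` is an iterable of (u, v) interaction pairs; reversed or repeated
    pairs count once, and self-loops are ignored.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Keep only interactions with both endpoints inside the candidate sub-network.
    inside = {frozenset((u, v)) for u, v in edges if u != v and u in nodes and v in nodes}
    return 2 * len(inside) / (n * (n - 1))


ppi = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(subgraph_density({"A", "B", "C", "D"}, ppi))       # 1.0: a complete 4-clique
print(subgraph_density({"A", "B", "C", "D", "E"}, ppi))  # 0.7: adding the pendant E dilutes it
```

A candidate complex is then a set of proteins whose density is high relative to the rest of the network.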
In this paper, we develop a new approach named iOPTICS-GSO, which improves the Ordering Points To Identify the Clustering Structure (OPTICS) algorithm by using the Glowworm Swarm Optimization (GSO) algorithm to optimize the OPTICS parameters when searching for dense sub-networks. In iOPTICS-GSO, the concept of a core node is redefined, the Euclidean distance used in OPTICS is replaced with an improved similarity between nodes in the PPI network based on their interaction strength, and the dense sub-networks found are regarded as protein complexes.
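OPTICS does not output one flat clustering; it orders the nodes so that members of a dense region appear consecutively with low reachability distance. The sketch below computes that ordering on a PPI graph under three assumptions that are not taken from the paper: candidate neighbours are restricted to direct interaction partners; distance is 1 minus the Jaccard similarity of closed neighbourhoods, a hypothetical stand-in for the paper's interaction-strength similarity; and the standard OPTICS core-node test is used, because the abstract does not spell out the redefined one. All function names are illustrative.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) interaction pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def jaccard_distance(adj, u, v):
    """Hypothetical stand-in for the paper's similarity:
    1 - Jaccard similarity of the closed neighbourhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)


def optics_order(edges, eps, min_pts):
    """Return the OPTICS ordering as (protein, reachability distance) pairs."""
    adj = build_adjacency(edges)
    reach = {p: math.inf for p in adj}
    processed, order = set(), []

    def eps_neighbors(p):
        # Candidate neighbours are direct interaction partners within eps.
        dists = {q: jaccard_distance(adj, p, q) for q in adj[p]}
        return {q: d for q, d in dists.items() if d <= eps}

    def core_distance(neigh):
        # Standard OPTICS core test, counting the node itself at distance 0;
        # returns None when the node is not a core node.
        if len(neigh) + 1 < min_pts:
            return None
        return sorted([0.0, *neigh.values()])[min_pts - 1]

    def update(neigh, cd, seeds):
        # Lower the reachability of unprocessed neighbours of a core node.
        for o, d in neigh.items():
            if o not in processed:
                new_reach = max(cd, d)
                if new_reach < reach[o]:
                    reach[o] = new_reach
                    seeds[o] = new_reach

    for p in sorted(adj):
        if p in processed:
            continue
        processed.add(p)
        order.append(p)
        neigh = eps_neighbors(p)
        cd = core_distance(neigh)
        if cd is None:
            continue
        seeds = {}
        update(neigh, cd, seeds)
        while seeds:
            # Expand from the seed with the smallest reachability (ties by name).
            q = min(seeds, key=lambda k: (seeds[k], k))
            del seeds[q]
            processed.add(q)
            order.append(q)
            nq = eps_neighbors(q)
            cq = core_distance(nq)
            if cq is not None:
                update(nq, cq, seeds)
    return [(p, reach[p]) for p in order]


# Two 4-cliques joined by a single bridge edge D-E.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
             ("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H"),
             ("D", "E")]
for protein, r in optics_order(toy_edges, eps=0.5, min_pts=3):
    print(protein, round(r, 2))  # A and E open new dense regions (reachability inf)
```

The two low-reachability runs (A to D and E to H) are the dense sub-networks. Extracting clusters from this ordering requires a reachability threshold, which is one of the parameters a search such as GSO could tune.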
The experimental results show that iOPTICS-GSO outperforms algorithms such as DBSCAN, CFinder, MCODE, CMC, COACH, ClusterOne, MCL and OPTICS_PSO in terms of F-measure and p-value on four DPINs constructed from the DIP, Krogan, MIPS and Gavin datasets. In addition, our predicted protein complexes have small p-values and are therefore highly likely to be true protein complexes.
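The abstract does not define the two evaluation measures. In the protein-complex literature, a predicted complex p is commonly matched to a reference complex b when their neighborhood affinity NA = |p ∩ b|² / (|p| · |b|) reaches a threshold (often 0.2), and the p-value of a complex is commonly the hypergeometric probability of observing at least k proteins sharing a functional annotation by chance. The sketch below implements those conventions as an assumption; the threshold `omega`, the function names and the toy sets are hypothetical, and the paper's exact definitions may differ.

```python
from math import comb


def neighborhood_affinity(pred, ref):
    """NA = |pred ∩ ref|^2 / (|pred| * |ref|) for two non-empty protein sets."""
    overlap = len(pred & ref)
    return overlap * overlap / (len(pred) * len(ref))


def f_measure(predicted, reference, omega=0.2):
    """Precision, recall and F-measure, counting a match when NA >= omega."""
    # Predicted complexes that match at least one reference complex.
    hit_pred = sum(any(neighborhood_affinity(p, b) >= omega for b in reference) for p in predicted)
    # Reference complexes recovered by at least one predicted complex.
    hit_ref = sum(any(neighborhood_affinity(p, b) >= omega for p in predicted) for b in reference)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_ref / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def hypergeometric_p_value(n_total, n_annotated, complex_size, k_hits):
    """P(X >= k_hits) when drawing complex_size proteins from n_total, of which
    n_annotated carry a given functional annotation (e.g. a GO term)."""
    tail = sum(comb(n_annotated, i) * comb(n_total - n_annotated, complex_size - i)
               for i in range(k_hits, min(complex_size, n_annotated) + 1))
    return tail / comb(n_total, complex_size)


predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y", "Z"}]
reference = [{"A", "B", "C", "D"}, {"E", "F", "G"}]
print(f_measure(predicted, reference))      # precision 1/3, recall 1/2, F-measure ≈ 0.4
print(hypergeometric_p_value(10, 5, 3, 3))  # 10/120 ≈ 0.083
```

Summing the exact integer tail before a single division keeps the p-value accurate even when the binomial coefficients are very large.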
The proposed iOPTICS-GSO achieves optimal clustering results by adopting the GSO algorithm to optimize the parameters of OPTICS, and the results on four datasets show superior performance. Moreover, the results provide clues for biologists to verify and discover new protein complexes.
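The abstract does not say which fitness GSO maximizes or how the OPTICS parameters are encoded. The sketch below implements the standard glowworm swarm optimization loop (luciferin update, probabilistic move toward a brighter neighbour, adaptive decision range) over a box of continuous parameters. The default constants are values often quoted for GSO, the sensor-range rule is an assumption of this sketch, and `toy_fitness` is a hypothetical stand-in: in iOPTICS-GSO the fitness would instead score the clusters produced by a candidate parameter setting, and an integer parameter such as MinPts would need rounding.

```python
import math
import random


def gso_maximize(fitness, bounds, n_worms=50, iters=200, seed=0,
                 rho=0.4, gamma=0.6, beta=0.08, n_t=5, step=0.03, l0=5.0):
    """Glowworm swarm optimization sketch: returns (best position, best fitness)."""
    rng = random.Random(seed)
    # Sensor range: half the diagonal of the search box (an assumption of this sketch).
    r_s = 0.5 * math.dist([lo for lo, _ in bounds], [hi for _, hi in bounds])
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_worms)]
    luciferin = [l0] * n_worms
    r_d = [r_s] * n_worms
    best = list(max(pos, key=fitness))
    best_fit = fitness(best)
    for _ in range(iters):
        # 1. Luciferin update: decay plus a reward proportional to fitness.
        luciferin = [(1 - rho) * lu + gamma * fitness(x) for lu, x in zip(luciferin, pos)]
        new_pos = [list(x) for x in pos]
        for i, xi in enumerate(pos):
            # 2. Neighbours: brighter glowworms inside i's local decision range.
            nbrs = [j for j, xj in enumerate(pos)
                    if luciferin[j] > luciferin[i] and math.dist(xi, xj) < r_d[i]]
            if nbrs:
                # 3. Move a fixed step toward a brighter neighbour chosen with
                #    probability proportional to the luciferin difference.
                j = rng.choices(nbrs, weights=[luciferin[k] - luciferin[i] for k in nbrs])[0]
                d = math.dist(xi, pos[j])
                if d > 0:
                    new_pos[i] = [min(max(a + step * (b - a) / d, lo), hi)
                                  for a, b, (lo, hi) in zip(xi, pos[j], bounds)]
            # 4. Grow or shrink the decision range toward n_t neighbours.
            r_d[i] = min(r_s, max(0.0, r_d[i] + beta * (n_t - len(nbrs))))
        pos = new_pos
        for x in pos:  # keep the best position seen so far
            f = fitness(x)
            if f > best_fit:
                best, best_fit = list(x), f
    return best, best_fit


def toy_fitness(x):
    # Hypothetical objective with a single peak at (0.3, 0.7).
    return -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)


best, best_fit = gso_maximize(toy_fitness, [(0.0, 1.0), (0.0, 1.0)], seed=1)
print(best, best_fit)
```

Because each glowworm follows only brighter neighbours within its own adaptive range, the swarm can split across several local optima, which is the property that distinguishes GSO from a single global-best swarm method.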