Omranian Sara, Angeleska Angela, Nikoloski Zoran
Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany.
Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany.
Comput Struct Biotechnol J. 2021 Sep 20;19:5255-5263. doi: 10.1016/j.csbj.2021.09.014. eCollection 2021.
Identification of protein complexes from protein-protein interaction (PPI) networks is a key problem in PPI mining. It is typically addressed by parameter-dependent approaches that suffer from low recall. Here we introduce GCC-v, a family of efficient, parameter-free algorithms that accurately predict protein complexes using the (weighted) clustering coefficient of proteins in PPI networks. Through comparative analyses with gold standards and PPI networks from , , and , we demonstrate that GCC-v outperforms twelve state-of-the-art approaches for identification of protein complexes with respect to twelve performance measures in at least 85.71% of scenarios. We also show that GCC-v exactly recovers ∼35% of the protein complexes in a pan-plant PPI network, and we discover 144 new protein complexes in , with high support from Gene Ontology (GO) semantic similarity. Our results indicate that findings from GCC-v are robust to network perturbations, which has direct implications for assessing the impact of PPI network quality on the predicted protein complexes.
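The abstract states that GCC-v is built on the clustering coefficient of proteins, but it does not describe how GCC-v turns these coefficients into complexes or which weighted variant it uses. The following is a minimal sketch, under those limits, of only the underlying quantity: the unweighted local clustering coefficient C(v) = 2·T(v) / (k(v)·(k(v) − 1)). Here k(v) is the number of interaction partners of protein v, and T(v) is the number of interactions among those partners. The function name `local_clustering` and the toy network are hypothetical and are not part of GCC-v.

```python
from itertools import combinations


def local_clustering(edges):
    """Unweighted local clustering coefficient of every protein in an undirected PPI network.

    Hypothetical helper for illustration only; this is NOT the GCC-v algorithm.
    edges: iterable of (protein_a, protein_b) pairs; self-loops are skipped and
    duplicate pairs collapse because neighbours are stored in sets.
    """
    # Build an adjacency set for each protein.
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    coeff = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # Undefined for fewer than two partners; reported as 0 by convention here.
            coeff[v] = 0.0
            continue
        # Count interactions among v's partners; each one closes a triangle through v.
        links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        coeff[v] = 2.0 * links / (k * (k - 1))
    return coeff


if __name__ == "__main__":
    # Toy network: triangle A-B-C plus a pendant protein D attached to C.
    toy = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
    for protein, c in sorted(local_clustering(toy).items()):
        print(protein, round(c, 3))
```

In the toy network, A and B sit in a fully connected neighbourhood (C = 1). C has three partners but only one interaction among them (C = 1/3), and D has a single partner (C = 0). Densely interconnected neighbourhoods score high, which is the intuition behind using this coefficient to locate complexes.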