Jeong Hoyeon, Kim Yoonbee, Jung Yi-Sue, Kang Dae Ryong, Cho Young-Rae
Department of Biostatistics, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea.
National Health Big Data Clinical Research Institute, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea.
Entropy (Basel). 2021 Sep 28;23(10):1271. doi: 10.3390/e23101271.
Functional modules can be predicted from genome-wide protein-protein interactions (PPIs) using a systems-level approach. Various graph clustering algorithms have been applied to PPI networks for this task. In particular, the detection of overlapping clusters is necessary because a protein can be involved in multiple functions under different conditions. Graph entropy (GE) is a novel metric to assess the quality of clusters in a large, complex network. In this study, the unweighted and weighted GE algorithms are evaluated to validate their use in predicting functional modules. To measure clustering accuracy, the clustering results are compared against protein complexes and Gene Ontology (GO) annotations as references. We demonstrate that the GE algorithm detects overlapping clusters more accurately than competing methods. Moreover, we confirm the biological feasibility of the proteins that occur most frequently in the set of identified clusters. Finally, we identify novel proteins as candidates for additional GO term annotation.
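The abstract does not state how graph entropy is computed. As a minimal sketch, assuming the vertex-entropy formulation commonly used for unweighted graph-entropy clustering (each vertex contributes the binary entropy of the fraction of its neighbors that fall inside the cluster, and GE is the sum over all vertices), the metric could look like the following. The function names and the toy network are hypothetical and are not taken from the paper.

```python
import math

def vertex_entropy(graph, cluster, v):
    """Binary entropy of the share of v's neighbors that lie inside `cluster`.

    Assumption: unweighted graph stored as {vertex: set_of_neighbors}.
    """
    neighbors = graph[v]
    if not neighbors:
        return 0.0
    p_in = sum(1 for u in neighbors if u in cluster) / len(neighbors)
    p_out = 1.0 - p_in
    # Terms with probability 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in (p_in, p_out) if p > 0)

def graph_entropy(graph, cluster):
    """Sum of vertex entropies over every vertex in the graph."""
    return sum(vertex_entropy(graph, cluster, v) for v in graph)

# Toy network (illustrative only): a triangle a-b-c with d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

ge_triangle = graph_entropy(toy_graph, {"a", "b", "c"})
ge_pair = graph_entropy(toy_graph, {"a", "b"})
```

Under this assumed definition, a lower value indicates a better-separated cluster: the full triangle scores lower than the pair `{a, b}`, because fewer vertices have neighbors split between the inside and the outside of the cluster.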