School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China.
College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, China.
BMC Bioinformatics. 2023 May 17;24(1):203. doi: 10.1186/s12859-023-05315-y.
A major current focus in the analysis of protein-protein interaction (PPI) data is the identification of essential proteins. Because massive amounts of PPI data are now available, efficient computational methods for identifying essential proteins are needed. Previous studies have achieved considerable performance. However, because PPI data are highly noisy and structurally complex, further improving the performance of identification methods remains a challenge.
This paper proposes an identification method, named CTF, which identifies essential proteins based on edge features, namely h-quasi-cliques and uv-triangle graphs, combined with the fusion of multi-source information. We first design an edge-weight function, named EWCT, for computing the topological scores of proteins based on quasi-cliques and triangle graphs. Then, we generate an edge-weighted PPI network using EWCT and dynamic PPI data. Finally, we compute the essentiality of proteins by fusing their topological scores with three scores derived from biological information.
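The abstract does not give the EWCT formula, the definition of an h-quasi-clique, how dynamic PPI data are incorporated, or how the four scores are fused. The sketch below only illustrates the general pipeline shape (edge weights from triangles, per-protein topological scores, score fusion) under loudly stated assumptions: the edge weight is a hypothetical stand-in, the edge clustering coefficient (triangles through edge (u, v), i.e. common neighbours of u and v, normalised by the maximum possible), not EWCT; quasi-cliques and the dynamic network step are omitted; fusion is assumed to be a weighted sum of max-normalised scores with hypothetical weights. All function names are illustrative, not taken from the paper.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_weight(adj, u, v):
    """Hypothetical stand-in for EWCT: the edge clustering coefficient.

    Counts triangles through edge (u, v) (the common neighbours of u and v)
    and divides by the largest number of triangles the edge could join.
    """
    triangles = len(adj[u] & adj[v])
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / possible if possible > 0 else 0.0


def topological_scores(adj):
    """Assumed topological score: sum of a protein's incident edge weights."""
    return {u: sum(edge_weight(adj, u, v) for v in adj[u]) for u in adj}


def normalise(scores):
    """Scale scores into [0, 1] by dividing by the maximum score."""
    top = max(scores.values(), default=0.0)
    return {k: (s / top if top > 0 else 0.0) for k, s in scores.items()}


def fuse(topo, bio_scores, weights):
    """Assumed fusion rule: weighted sum of normalised scores.

    bio_scores: list of dicts (protein -> score), e.g. three biological sources.
    weights: one weight for the topological score, then one per biological source.
    Proteins missing from a biological source contribute 0 for that source.
    """
    sources = [normalise(topo)] + [normalise(b) for b in bio_scores]
    return {
        p: sum(w * src.get(p, 0.0) for w, src in zip(weights, sources))
        for p in topo
    }


if __name__ == "__main__":
    # Toy network: triangle A-B-C plus a pendant protein D attached to C.
    adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
    topo = topological_scores(adj)
    # Three made-up biological score sources (illustrative values only).
    bio = [{"A": 1.0, "B": 0.5}, {"C": 2.0}, {"D": 4.0}]
    final = fuse(topo, bio, [0.4, 0.2, 0.2, 0.2])
    # Rank proteins from most to least likely essential.
    print(sorted(final, key=final.get, reverse=True))
```

In this toy network the pendant protein D lies on no triangle, so its topological score is 0 and only its biological score contributes. Proteins would then be ranked by the fused score, with the top-ranked ones predicted as essential.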
We evaluated the performance of CTF by comparing it with 16 other methods, such as MON, PeC, TEGS, and LBCC. Experimental results on three Saccharomyces cerevisiae datasets show that CTF outperforms the state-of-the-art methods. Moreover, our results indicate that fusing additional biological information helps improve identification accuracy.