College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China.
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China.
BMC Genomics. 2019 Aug 7;20(1):637. doi: 10.1186/s12864-019-5956-y.
Detecting protein complexes is of great significance for studying the mechanisms underlying complex diseases and for developing new drugs, and various computational algorithms have been proposed for this task. However, most of these methods rely only on topological information and are sensitive to the reliability of interactions, so their performance is degraded by false-positive interactions in protein-protein interaction networks (PPINs). Moreover, although these methods consider density and modularity, they overlook protein complexes whose densities and modularities vary.
To address these challenges, we propose SE-DMTG, a Seed-Extended algorithm based on Density and Modularity with Topological structure and Gene Ontology (GO) annotations, which is designed to improve the accuracy of protein complex detection in PPINs. First, we use common neighbors and GO annotations to construct a weighted PPIN. Second, we define a new seed selection strategy to select seed nodes. Third, we design a new fitness function to detect protein complexes with various densities and modularities. We compare the performance of SE-DMTG with that of thirteen state-of-the-art algorithms on several real datasets.
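The first step, weighting a PPIN from common neighbors and GO annotations, can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the function names `jaccard` and `weight_ppin`, the use of Jaccard similarity for both the topological and the GO component, the equal 0.5/0.5 mix, and the toy proteins and term labels. It is not the SE-DMTG definition.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_ppin(edges, go_terms):
    """Assign each interaction a weight in [0, 1].

    ASSUMPTION: the weight is the mean of a neighborhood-overlap score and
    a GO-annotation-overlap score. The paper's actual formula may differ.
    """
    # Build the adjacency sets of the unweighted network.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    weights = {}
    for u, v in edges:
        # Closed neighborhoods (each node plus its neighbors), so the
        # interacting pair itself contributes to the overlap.
        topo = jaccard(neighbors[u] | {u}, neighbors[v] | {v})
        # Proteins without annotations get an empty term set.
        func = jaccard(go_terms.get(u, set()), go_terms.get(v, set()))
        weights[frozenset((u, v))] = 0.5 * (topo + func)
    return weights


# Toy network and toy annotation labels (hypothetical, not real data).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
go_terms = {
    "A": {"term1", "term2"},
    "B": {"term1", "term2"},
    "C": {"term1"},
    "D": {"term9"},
}
weights = weight_ppin(edges, go_terms)
```

In this toy network the interaction C-D, which shares few neighbors and no annotations, receives a much lower weight than A-B. Down-weighting interactions that lack both topological and functional support in this way is one means of reducing the influence of likely false positives.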
The experimental results show that SE-DMTG not only outperforms some classical algorithms on yeast PPINs in terms of the F-measure and Jaccard but also performs well in terms of functional enrichment. Furthermore, we apply SE-DMTG to PPINs of several other species and demonstrate outstanding accuracy and matching ratio in detecting protein complexes compared with other algorithms.
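For readers unfamiliar with how predicted complexes are scored against a reference set, the following sketch shows one common style of evaluation. It assumes that a predicted complex matches a reference complex when their Jaccard similarity reaches a threshold, and it derives precision, recall, and the F-measure from those matches. The function names, the threshold value of 0.5, and the toy complexes are hypothetical, and the exact metric definitions used in the paper may differ.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_scores(predicted, reference, threshold=0.5):
    """Return (precision, recall, f_measure) for predicted complexes.

    ASSUMPTION: two complexes match when their Jaccard similarity is at
    least `threshold`; the threshold value is illustrative only.
    """
    # Predicted complexes that match at least one reference complex.
    matched_pred = sum(
        any(jaccard(p, r) >= threshold for r in reference) for p in predicted
    )
    # Reference complexes that are matched by at least one prediction.
    matched_ref = sum(
        any(jaccard(p, r) >= threshold for p in predicted) for r in reference
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_ref / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy complexes (hypothetical protein labels, not real data).
predicted = [{"A", "B", "C"}, {"X", "Y"}]
reference = [{"A", "B", "C", "D"}, {"P", "Q"}]
precision, recall, f_measure = match_scores(predicted, reference)
```

Here only the first predicted complex overlaps a reference complex strongly enough to count as a match, so precision, recall, and the F-measure all come out at 0.5.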