School of Computer Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia.
Department of Information Technology and Communications, Azarbaijan Shahid Madani University, Tabriz, Iran.
Comput Biol Med. 2021 Nov;138:104933. doi: 10.1016/j.compbiomed.2021.104933. Epub 2021 Oct 8.
Identifying protein complexes in protein-protein interaction (PPI) networks is one of the most fundamental problems in revealing the underlying mechanisms of biological processes. However, most existing protein complex identification methods consider only a network's topological structure and therefore forgo the information carried by node features. In PPI networks, both topological structure and node features are essential for characterizing protein complexes. Spectral clustering uses the spectrum (eigenvalues and corresponding eigenvectors) of the data's affinity matrix to map the data into a low-dimensional space. In recent years it has attracted much attention as one of the most efficient algorithms based on dimensionality reduction. This paper proposes a new variant of spectral clustering for attributed networks, named text-associated DeepWalk-Spectral Clustering (TADW-SC), which identifies protein complexes that have both structural cohesiveness and attribute homogeneity. Because the performance of spectral clustering depends heavily on the quality of the affinity matrix, the proposed method uses text-associated DeepWalk (TADW) to compute embedding vectors for the proteins. The affinity matrix is then computed from the cosine similarity between pairs of these low-dimensional vectors, which is intended to improve its accuracy considerably. Experimental results show that the method performs unexpectedly well compared with existing state-of-the-art methods on both real protein network datasets and synthetic networks.
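The second half of the pipeline described in the abstract (embedding vectors, then a cosine-similarity affinity matrix, then spectral clustering) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The TADW step is not reproduced: the `embeddings` array is a hypothetical stand-in for its output. Clipping negative similarities to zero, zeroing the diagonal, using the symmetric normalized Laplacian, and the small deterministic k-means are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Pairwise cosine similarity between embedding rows.

    Assumption: negative similarities are clipped to 0 and the diagonal
    is zeroed so the result can be used as non-negative edge weights.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    affinity = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)
    return affinity


def spectral_embedding(affinity, k):
    """Rows of the k eigenvectors with the smallest eigenvalues of the
    symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(affinity)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    rows = vectors[:, :k]
    # Normalize each row to unit length before clustering.
    return rows / np.maximum(np.linalg.norm(rows, axis=1, keepdims=True), 1e-12)


def kmeans(points, k, iterations=50):
    """Small k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = np.argmin(distances, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_embeddings(embeddings, k):
    """Cosine affinity, then spectral embedding, then k-means."""
    affinity = cosine_affinity(np.asarray(embeddings, dtype=float))
    return kmeans(spectral_embedding(affinity, k), k)


# Hypothetical toy embeddings: two groups pointing in different directions.
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
])
labels = cluster_embeddings(embeddings, k=2)
print(labels)
```

In this toy input the first three vectors and the last three vectors point in nearly orthogonal directions, so their cosine similarities form two dense blocks and the clustering assigns each block its own label. With real data, the number of clusters `k` and the treatment of negative similarities would need to be chosen to match the method actually used.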