Lubovac Zelmina, Gamalielsson Jonas, Olsson Björn
School of Humanities and Informatics, University of Skövde, Skövde, Sweden.
Proteins. 2006 Sep 1;64(4):948-59. doi: 10.1002/prot.21071.
Advances in large-scale technologies in proteomics, such as yeast two-hybrid screening and mass spectrometry, have made it possible to generate large Protein Interaction Networks (PINs). Recent methods for identifying dense sub-graphs in such networks have been based solely on graph-theoretic properties. There is therefore a need for an approach that combines domain-specific knowledge with topological properties to generate functionally relevant sub-graphs from large networks. This article describes two alternative network measures for the analysis of PINs, which combine functional information with topological properties of the networks. These measures, called weighted clustering coefficient and weighted average nearest-neighbors degree, use weights representing the strengths of interactions between the proteins, calculated according to their semantic similarity, which is based on the Gene Ontology terms of the proteins. We perform a global analysis of the yeast PIN by systematically comparing the weighted measures with their topological counterparts. To show the usefulness of the weighted measures, we develop an algorithm for identification of functional modules, called SWEMODE (Semantic WEights for MODule Elucidation), that identifies dense sub-graphs containing functionally similar proteins. The proposed method is based on the ranking of nodes, i.e., proteins, according to their weighted neighborhood cohesiveness. The highest-ranked nodes are considered as seeds for candidate modules. The algorithm then iterates through the neighborhood of each seed protein to identify densely connected proteins with high functional similarity, according to the chosen parameters. Using a yeast two-hybrid data set of experimentally determined protein-protein interactions, we demonstrate that SWEMODE is able to identify dense clusters containing proteins that are functionally similar. Many of the identified modules correspond to known complexes or subunits of these complexes.
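
The abstract names the two weighted measures but does not give their formulas. The sketch below is one possible reading, assuming the definitions commonly used for weighted networks: for a node i with degree k_i and strength s_i (the sum of its edge weights), the weighted clustering coefficient sums (w_ij + w_ih)/2 over connected neighbor pairs and divides by s_i(k_i - 1), and the weighted average nearest-neighbors degree is the weight-averaged degree of the neighbors. The function names, the toy protein labels, and the edge weights are hypothetical; in the article the weights would come from Gene Ontology-based semantic similarity. Using the weighted clustering coefficient to rank seed candidates is likewise an assumption standing in for the "weighted neighborhood cohesiveness" mentioned above, not a reproduction of SWEMODE.

```python
from itertools import combinations


def build_graph(weighted_edges):
    """Build an undirected adjacency map {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def weighted_clustering(graph, node):
    """Weighted clustering coefficient of `node` (assumed definition).

    Each connected neighbor pair (j, h) contributes w_ij + w_ih, which equals
    the ordered-pair sum of (w_ij + w_ih) / 2 over both orderings.
    """
    nbrs = graph[node]
    k = len(nbrs)
    s = sum(nbrs.values())
    if k < 2 or s == 0:
        return 0.0
    total = 0.0
    for j, h in combinations(nbrs, 2):
        if h in graph[j]:  # j and h interact, closing a triangle with node
            total += nbrs[j] + nbrs[h]
    return total / (s * (k - 1))


def weighted_nn_degree(graph, node):
    """Weighted average nearest-neighbors degree of `node` (assumed definition)."""
    nbrs = graph[node]
    s = sum(nbrs.values())
    if s == 0:
        return 0.0
    return sum(w * len(graph[j]) for j, w in nbrs.items()) / s


def rank_seeds(graph):
    """Rank nodes by weighted clustering coefficient, highest first.

    Hypothetical stand-in for a seed-ranking step; not the published algorithm.
    """
    return sorted(graph, key=lambda n: weighted_clustering(graph, n), reverse=True)


# Toy network: hypothetical proteins and made-up similarity weights.
toy_edges = [
    ("A", "B", 1.0),
    ("A", "C", 1.0),
    ("B", "C", 1.0),
    ("A", "D", 0.5),
]
toy_graph = build_graph(toy_edges)
toy_ranking = rank_seeds(toy_graph)
```

In this toy network, node A has an unweighted clustering coefficient of 1/3 (one triangle out of three neighbor pairs), while the weighted value is higher because the two edges that take part in the triangle carry more weight than the edge to D. That is the kind of difference a comparison between weighted measures and their topological counterparts would surface.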