Yu Yang, Liu Jie, Feng Nuan, Song Bo, Zheng Zeyu
Software College, Shenyang Normal University, Shenyang 110034, PR China; Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, PR China.
Software College, Shenyang Normal University, Shenyang 110034, PR China.
J Theor Biol. 2017 Jan 7;412:107-112. doi: 10.1016/j.jtbi.2016.10.010. Epub 2016 Oct 29.
Studies of protein modules in a Protein-Protein Interaction (PPI) network contribute greatly to the understanding of biological mechanisms. With the development of computing science, computational approaches have come to play an important role in locating protein modules. In this paper, a new approach combining Gene Ontology and amino acid background frequency is introduced to detect protein modules in weighted PPI networks. The proposed approach consists of three main parts: feature extraction, weighted graph construction, and protein complex detection. First, topological and sequence information is used to represent the features of protein complexes. Second, six types of weighted graph are constructed by combining the PPI network with Gene Ontology information. Finally, a protein complex detection algorithm is applied to the weighted graph; it locates clusters based on three conditions: density, network diameter, and the included angle cosine. Experiments were conducted on two protein complex benchmark sets for yeast, and the results show that the approach is more effective than five typical algorithms in terms of F-measure and precision. Combining the protein interaction network with sequence and Gene Ontology data helps improve performance and provides an alternative method for protein module detection.
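The three cluster conditions named in the abstract (density, network diameter, and included angle cosine) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the formulas, thresholds, or feature definitions, so everything below is an assumption made for illustration. The graph is a dict of dicts (`adj[u][v]` is an edge weight), density is the plain edge density of the induced subgraph, diameter is measured in hops, and the cosine is taken between the mean feature vector of a candidate cluster and a hypothetical reference vector. The function names and the default thresholds in `accept_cluster` are likewise hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def density(nodes, adj):
    """Edge density of the induced subgraph: edges present / edges possible."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(sorted(nodes), 2)
                if v in adj.get(u, {}))
    return 2.0 * edges / (n * (n - 1))


def diameter(nodes, adj):
    """Longest shortest path in hops inside the induced subgraph.

    Returns infinity when the induced subgraph is disconnected.
    """
    nodes = set(nodes)
    longest = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search restricted to `nodes`
            u = queue.popleft()
            for v in adj.get(u, {}):
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return float("inf")
        longest = max(longest, max(dist.values()))
    return longest


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def accept_cluster(nodes, adj, features, reference,
                   min_density=0.6, max_diameter=2, min_cosine=0.8):
    """Check a candidate cluster against all three conditions.

    `features` maps each protein to a feature vector; `reference` is a
    hypothetical feature vector of a known complex. Thresholds are
    illustrative assumptions, not values from the paper.
    """
    members = sorted(set(nodes))
    if not members:
        return False
    dim = len(reference)
    mean = [sum(features[p][i] for p in members) / len(members)
            for i in range(dim)]
    return (density(members, adj) >= min_density
            and diameter(members, adj) <= max_diameter
            and cosine(mean, reference) >= min_cosine)


# Toy network: a triangle A-B-C with a weakly attached protein D.
adj = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 0.5},
    "D": {"C": 0.5},
}
features = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 0.0], "D": [0.0, 1.0]}
triangle_ok = accept_cluster(["A", "B", "C"], adj, features, [1.0, 0.0])
```

In this toy network the triangle passes all three checks, while a pair such as A and D fails because the induced subgraph has no edge. Edge weights are stored but not used by the checks here; a weighted density would be an equally plausible reading of the abstract.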