Peng Wei, Wang Jianxin, Cai Juan, Chen Lu, Li Min, Wu Fang-Xiang
School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China.
BMC Syst Biol. 2014 Mar 24;8:35. doi: 10.1186/1752-0509-8-35.
Characterizing unknown proteins through computational approaches is one of the most challenging problems in in silico biology and has attracted worldwide interest and considerable effort. Several computational methods have been proposed to address this problem; they are based either on homology mapping or on the context of protein interaction networks.
In this paper, two algorithms are proposed that integrate the protein-protein interaction (PPI) network, protein domain information, and protein complexes. The first is domain combination similarity (DCS), which combines the domain compositions of both proteins and their neighbors. The second is domain combination similarity in context of protein complexes (DSCP), which extends the protein functional similarity definition of DCS by combining the domain compositions of both proteins and the complexes that include them. The new algorithms are tested on networks of the model species Saccharomyces cerevisiae to predict the functions of unknown proteins using cross-validation. Compared with several existing algorithms, the results demonstrate the effectiveness of the proposed methods in protein function prediction. Furthermore, DSCP, when using experimentally determined complex data, is robust when a large percentage of the proteins in the network are unknown, and it outperforms DCS and several other existing algorithms.
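The abstract does not give the DCS formula, so the following is only a rough sketch of the general idea under stated assumptions: each protein's "context" is taken as the union of its own domains and those of its direct PPI neighbors, and two contexts are compared with a Jaccard index. The Jaccard measure, the function names (`context_domains`, `jaccard`, `predict_functions`), the scoring rule, and the toy data are all hypothetical illustrations, not the authors' definitions or data.

```python
from typing import Dict, List, Set

def context_domains(protein: str,
                    ppi: Dict[str, Set[str]],
                    domains: Dict[str, Set[str]]) -> Set[str]:
    """Union of a protein's own domains and its direct neighbors' domains."""
    combined = set(domains.get(protein, set()))
    for neighbor in ppi.get(protein, set()):
        combined |= domains.get(neighbor, set())
    return combined

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets (assumed here; not the paper's measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def predict_functions(query: str,
                      ppi: Dict[str, Set[str]],
                      domains: Dict[str, Set[str]],
                      annotations: Dict[str, Set[str]],
                      top_k: int = 1) -> List[str]:
    """Rank candidate functions by summed similarity to annotated proteins."""
    query_ctx = context_domains(query, ppi, domains)
    scores: Dict[str, float] = {}
    for other, functions in annotations.items():
        if other == query:
            continue
        sim = jaccard(query_ctx, context_domains(other, ppi, domains))
        for function in functions:
            scores[function] = scores.get(function, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:top_k]

# Toy data (entirely made up): P1 is the unannotated query protein.
ppi = {"P1": {"P2"}, "P2": {"P1"}, "P3": {"P4"}, "P4": {"P3"}}
domains = {"P1": {"domA"}, "P2": {"domA", "domB"},
           "P3": {"domB"}, "P4": {"domC"}}
annotations = {"P2": {"kinase activity"},
               "P3": {"transport"},
               "P4": {"transport"}}

print(predict_functions("P1", ppi, domains, annotations))
```

A DSCP-style variant could, under the same assumptions, build the context from the domains of every member of the complexes containing the protein, in place of or in addition to its PPI neighbors.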
The accuracy of protein function prediction can be improved by integrating the protein-protein interaction (PPI) network, protein domain information, and protein complexes.