College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China.
Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China.
BMC Bioinformatics. 2022 May 30;23(1):199. doi: 10.1186/s12859-022-04747-2.
The accurate characterization of protein functions is critical to understanding life at the molecular level and has a huge impact on biomedicine and pharmaceuticals. Computational prediction of protein function has been studied for decades. Because protein-protein interaction (PPI) networks are plagued by noise and errors, researchers have in recent years turned to fusing multi-omics data. A data model that appropriately integrates network topologies with biological data while preserving their intrinsic characteristics remains both a bottleneck and an aspirational goal for protein function prediction.
In this paper, we propose RWRT (Random Walks with Restart on Tensor), a method that predicts protein function by applying bi-random walks on a tensor. RWRT first constructs a functional similarity tensor by combining protein interaction networks with multi-omics data derived from domain annotation and protein complex information. RWRT then extends the bi-random walk algorithm from a two-dimensional matrix to the tensor in order to score functional similarity between proteins. Finally, RWRT filters out possible pretenders based on the concept of a cohesiveness coefficient and annotates target proteins with the functions of the remaining functional partners. Experimental results indicate that RWRT performs significantly better than state-of-the-art methods and improves the area under the receiver operating characteristic curve (AUROC) by no less than 18%.
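To make the idea of a random walk with restart over a stack of networks concrete, the following is a minimal sketch in plain Python. It is not the RWRT algorithm itself: the abstract does not specify the bi-random walk update or the cohesiveness filter, so the sketch substitutes a generic multiplex random walk with restart. The function names (`column_normalize`, `rwr_on_tensor`), the parameters `restart` and `jump`, the self-loop treatment of isolated nodes, and the toy three-layer tensor are all assumptions made for illustration.

```python
def column_normalize(layer):
    """Turn one N x N similarity layer into a column-stochastic matrix.

    Assumption: a node with no edges in this layer keeps its mass via a self-loop.
    """
    n = len(layer)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(layer[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = layer[i][j] / total
        else:
            out[j][j] = 1.0
    return out


def rwr_on_tensor(tensor, seed, restart=0.3, jump=0.2, tol=1e-10, max_iter=1000):
    """Generic multiplex random walk with restart (illustrative, not RWRT).

    tensor: list of K layers, each an N x N non-negative similarity matrix.
    seed:   index of the target protein.
    Returns one score per protein, summed over layers.
    """
    k, n = len(tensor), len(tensor[0])
    trans = [column_normalize(layer) for layer in tensor]
    # The walker starts (and restarts) at the seed, spread evenly over layers.
    p0 = [[(1.0 / k if i == seed else 0.0) for i in range(n)] for _ in range(k)]
    p = [row[:] for row in p0]
    for _ in range(max_iter):
        # Step 1: move along edges inside each layer.
        walked = [[sum(trans[l][i][j] * p[l][j] for j in range(n))
                   for i in range(n)] for l in range(k)]
        new_p = []
        for l in range(k):
            row = []
            for i in range(n):
                if k > 1:
                    # Step 2: with probability `jump`, arrive from the same
                    # node in one of the other layers (chosen uniformly).
                    others = sum(walked[m][i] for m in range(k) if m != l) / (k - 1)
                    moved = (1 - jump) * walked[l][i] + jump * others
                else:
                    moved = walked[l][i]
                # Step 3: with probability `restart`, return to the seed.
                row.append((1 - restart) * moved + restart * p0[l][i])
            new_p.append(row)
        diff = sum(abs(new_p[l][i] - p[l][i]) for l in range(k) for i in range(n))
        p = new_p
        if diff < tol:
            break
    return [sum(p[l][i] for l in range(k)) for i in range(n)]


# Toy tensor with 4 proteins and 3 hypothetical layers (PPI, shared domains,
# shared complexes). Protein 3 has no edges in any layer.
ppi = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
domain = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
complexes = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

scores = rwr_on_tensor([ppi, domain, complexes], seed=0)
print(scores)
```

In this toy example, protein 1 is linked to the seed in all three layers and protein 2 in only one, so protein 1 receives the higher score, while the unconnected protein 3 receives none. This illustrates why evidence that recurs across layers can outweigh a single, possibly erroneous, interaction.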
The functional similarity tensor offers an alternative data model: it is a collection of networks that share the same nodes but whose edges belong to different categories or represent interactions of a different nature. We demonstrate that the tensor-based random walk model can not only discover more partners with similar functions but also effectively escape the constraints imposed by errors in protein interaction networks. We believe that the performance of function prediction depends greatly on whether proper functional similarity information on protein correlations can be extracted and exploited.