Sengupta Kaustav, Saha Sovan, Halder Anup Kumar, Chatterjee Piyali, Nasipuri Mita, Basu Subhadip, Plewczynski Dariusz
Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw, Warsaw, Poland.
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India.
Front Genet. 2022 Sep 29;13:969915. doi: 10.3389/fgene.2022.969915. eCollection 2022.
Protein function prediction is emerging as an essential field in biological and computational studies. Although computational approaches are now well established, information gathered from multiple sources has been observed to have a greater influence on prediction than information derived from a single source. We therefore propose PFP-GO, a methodology in which heterogeneous sources (protein sequence, protein domain, and protein-protein interaction network) are processed separately to rank each individual functional GO term. Based on this ranking, GO terms are propagated to the target proteins. During GO ranking, protein sequence contributes sequence-based information, while protein domains and protein-protein interaction networks contribute structural/functional and topological information, respectively. The performance of PFP-GO is evaluated using Precision, Recall, and F-Score, and it performs reasonably better than other existing state-of-the-art methods. PFP-GO achieves an overall Precision, Recall, and F-Score of 0.67, 0.58, and 0.62, respectively. Furthermore, we examine some of the top-ranked GO terms predicted by PFP-GO through multilayer network propagation that affect the 3D structure of the genome. The complete source code of PFP-GO is freely available at https://sites.google.com/view/pfp-go/.
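As a rough illustration of the evaluation metrics named in the abstract, the following Python sketch computes Precision, Recall, and F-Score for one protein from a set of predicted GO terms and a set of annotated GO terms. The function names and the example GO term sets are hypothetical, and the abstract does not state whether PFP-GO averages these metrics per protein or pools predictions, so this should be read as the standard definitions only, not as the authors' evaluation code.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_go_prediction(predicted_terms, annotated_terms):
    """Return (precision, recall, f_score) for one protein.

    predicted_terms: GO terms assigned by a predictor (hypothetical input)
    annotated_terms: GO terms in the reference annotation (hypothetical input)
    """
    predicted = set(predicted_terms)
    annotated = set(annotated_terms)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall, f_score(precision, recall)


# Illustrative GO term sets, not taken from the paper.
predicted = {"GO:0005515", "GO:0003677", "GO:0006355"}
annotated = {"GO:0005515", "GO:0006355", "GO:0005634", "GO:0046872"}
p, r, f = evaluate_go_prediction(predicted, annotated)
print(f"precision={p:.2f} recall={r:.2f} f_score={f:.2f}")

# Consistency check on the overall values reported in the abstract.
print(round(f_score(0.67, 0.58), 2))
```

Applying the harmonic-mean formula to the reported Precision of 0.67 and Recall of 0.58 gives approximately 0.62, which agrees with the reported F-Score.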