Hu Sai, Zhao Bihai
School of Mathematics, Changsha University, Changsha, 410022, Hunan, China.
School of Computer Science and Engineering, Changsha University, Changsha, 410022, Hunan, China.
Sci Rep. 2025 May 31;15(1):19131. doi: 10.1038/s41598-025-04933-1.
Protein function prediction is a cornerstone of bioinformatics, providing critical insights into biological processes and disease mechanisms. Despite significant advances, challenges persist due to data sparsity and functional ambiguity. We introduce GOHPro (GO Similarity-based Heterogeneous Network Propagation), a novel method that constructs a heterogeneous network by integrating protein functional similarity (derived from domain profiles and modular complexes) with GO semantic relationships. The method then applies a network propagation algorithm to prioritize annotations based on multi-omics context. When evaluated on yeast and human datasets, GOHPro outperformed six state-of-the-art methods. Specifically, it achieved Fmax improvements ranging from 6.8% to 47.5% over methods such as exp2GO across the Biological Process (BP), Molecular Function (MF), and Cellular Component (CC) ontologies in both yeast and human. Case studies on proteins with shared domains, such as AAA+ ATPases, demonstrated GOHPro's ability to resolve functional ambiguity by leveraging contextual interactions and modular complexes. Further validation on the CAFA3 benchmark confirmed its generalizability, with Fmax gains exceeding 62% over baseline approaches in human. Our analysis revealed that homology and network connectivity critically influence prediction robustness, with the modular similarity network compensating for evolutionary gaps in dark proteins. The framework's extensibility to de novo structural predictions highlights its potential to bridge the annotation gap in uncharacterized proteomes.
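The abstract does not specify the propagation algorithm, so the following is only a minimal sketch of the general idea, assuming a random walk with restart over a combined protein and GO-term network. The function names, the toy similarity values, the restart probability, and the way the three edge types are combined are all hypothetical choices for illustration and are not taken from GOHPro.

```python
def column_normalize(matrix):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    size = len(matrix)
    sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    return [[matrix[i][j] / sums[j] if sums[j] > 0 else 0.0
             for j in range(size)] for i in range(size)]


def build_heterogeneous_matrix(protein_sim, go_sim, annotations):
    """Combine protein-protein, GO-GO and protein-GO edges into one matrix.

    Proteins occupy indices 0..n-1 and GO terms n..n+m-1 (assumed layout).
    """
    n, m = len(protein_sim), len(go_sim)
    w = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            if i != j:  # drop self-loops
                w[i][j] = protein_sim[i][j]
    for a in range(m):
        for b in range(m):
            if a != b:
                w[n + a][n + b] = go_sim[a][b]
    for i in range(n):
        for a in range(m):
            # known annotations link the two layers in both directions
            w[i][n + a] = w[n + a][i] = float(annotations[i][a])
    return column_normalize(w)


def propagate(transition, seed, restart=0.5, tol=1e-9, max_iter=1000):
    """Iterate p = (1 - restart) * W p + restart * seed until convergence."""
    size = len(seed)
    scores = list(seed)
    for _ in range(max_iter):
        updated = [(1 - restart) * sum(transition[i][j] * scores[j]
                                       for j in range(size))
                   + restart * seed[i] for i in range(size)]
        if sum(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


def rank_go_terms(protein_sim, go_sim, annotations, query, go_ids, restart=0.5):
    """Rank GO terms for one query protein by their propagated score."""
    n = len(protein_sim)
    transition = build_heterogeneous_matrix(protein_sim, go_sim, annotations)
    seed = [0.0] * (n + len(go_sim))
    seed[query] = 1.0  # the walk restarts at the query protein
    scores = propagate(transition, seed, restart)
    return sorted(zip(go_ids, scores[n:]), key=lambda pair: pair[1], reverse=True)


# Toy data (made up): protein 0 is unannotated and most similar to protein 1.
protein_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
go_sim = [[1.0, 0.2], [0.2, 1.0]]
annotations = [[0, 0], [1, 0], [0, 1]]
ranked = rank_go_terms(protein_sim, go_sim, annotations, 0, ["GO:A", "GO:B"])
print(ranked)
```

In this toy network the unannotated query protein inherits its top-ranked term from its most similar neighbour, which is the kind of prioritization the abstract describes. A real pipeline would derive the similarity matrices from domain profiles, complexes, and GO semantic similarity rather than from hand-set values.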