Integrating ESM‑2 and Graph Neural Networks with AlphaFold‑2 Structures for Enhanced Protein Function Prediction.

Authors

Nguyen Thi-Tuyen, Jiang Zhuocheng, Nguyen Van-Nui, Le Nguyen Quoc Khanh, Chua Matthew Chin Heng

Affiliations

University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen 25000, Viet Nam.

Department of Biomedical Informatics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore.

Publication information

ACS Omega. 2025 Aug 16;10(33):38103-38111. doi: 10.1021/acsomega.5c05484. eCollection 2025 Aug 26.

Abstract

Protein function prediction is essential for elucidating biological processes and accelerating drug discovery. However, the vast number of unannotated protein sequences and the limited availability of experimentally validated functional data remain major challenges. Although deep learning models based on protein sequences or protein-protein interaction networks have shown promise, their performance is still restricted, particularly for proteins without interaction data. Furthermore, many existing approaches treat sequence and structural information separately, potentially resulting in suboptimal feature representations. To address these limitations, we propose an improved graph-based framework that integrates two key innovations: (i) ESM-2, a state-of-the-art protein language model, to generate semantically rich sequence embeddings; and (ii) a hybrid pooling mechanism within graph convolutional blocks to better capture both global and local structural features from AlphaFold2-predicted structures. Experiments on the human proteome demonstrate that our model consistently outperforms existing methods in predicting molecular function, cellular component, and biological process annotations. These findings highlight the advantages of combining advanced sequence representations with enhanced structural learning for accurate and generalizable protein function prediction.
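
The abstract does not say how the hybrid pooling mechanism is constructed. As a rough illustration only, one common way to combine a global summary with the strongest local signals is to concatenate mean pooling and max pooling over the per-residue node features of a protein graph. The sketch below assumes that form; the function names (`mean_pool`, `max_pool`, `hybrid_pool`) and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def mean_pool(node_features: List[Vector]) -> Vector:
    """Average each feature dimension over all nodes (a global summary)."""
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(row[j] for row in node_features) / n for j in range(dim)]


def max_pool(node_features: List[Vector]) -> Vector:
    """Take the maximum of each feature dimension (the strongest local signal)."""
    dim = len(node_features[0])
    return [max(row[j] for row in node_features) for j in range(dim)]


def hybrid_pool(node_features: List[Vector]) -> Vector:
    """Assumed hybrid readout: concatenate mean-pooled and max-pooled vectors.

    This is an illustrative choice, not the authors' confirmed design.
    """
    if not node_features:
        raise ValueError("the graph must contain at least one node")
    return mean_pool(node_features) + max_pool(node_features)


# Toy example: three residues (nodes), each with a 2-dimensional feature vector.
# In the setting described above, these would be learned node features derived
# from ESM-2 embeddings and a graph built on an AlphaFold2-predicted structure.
residues = [[1.0, 0.0], [3.0, 2.0], [2.0, -2.0]]
graph_vector = hybrid_pool(residues)
print(graph_vector)
```

Under this assumption the graph-level vector has twice the node feature dimension, and a downstream classifier would map it to Gene Ontology terms.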

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/346c/12391975/1783c7415cae/ao5c05484_0001.jpg
