Mao Yu, Xu WenHui, Shun Yue, Chai LongXin, Xue Lei, Yang Yong, Li Mei
State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan, 430062, Hubei, China.
Sci Rep. 2025 Mar 26;15(1):10465. doi: 10.1038/s41598-025-94612-y.
Protein function, which is determined by sequence, structure, and other characteristics, plays a crucial role in an organism's performance. Existing protein function prediction methods rely mainly on sequence data and often ignore structural properties that are crucial for accurate prediction. Protein structure provides richer spatial and functional information, which can significantly improve prediction accuracy. In this work, we propose a multi-modal protein function prediction model (MMPFP) that integrates protein sequence and structure information using graph convolutional network (GCN), convolutional neural network (CNN), and Transformer modules. We validate the model on the PDBest dataset and show that MMPFP outperforms traditional single-modal models in the molecular function (MF), biological process (BP), and cellular component (CC) prediction tasks. Specifically, on the MF, BP, and CC tasks respectively, MMPFP achieved AUPR scores of 0.693, 0.355, and 0.478; Fmax scores of 0.752, 0.629, and 0.691; and Smin scores of 0.336, 0.488, and 0.459, a 3-5% improvement over single-modal models. Additionally, ablation studies confirm the effectiveness of the Transformer module within the GCN branch, further supporting MMPFP's advantage over existing methods. This multi-modal approach offers a more accurate and comprehensive framework for protein function prediction and addresses key limitations of current models.
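The AUPR scores reported in the abstract summarize how well predicted term scores rank true annotations above false ones. The abstract does not say which AUPR variant was used (micro or macro averaging, interpolation scheme), so the following is only a minimal sketch under stated assumptions: a step-wise (average-precision) estimate, micro-averaged over a protein-by-term matrix. The function names `average_precision` and `micro_aupr` and the toy data are illustrative and do not come from the paper.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for binary labels.

    Assumption: tied scores are ranked in input order rather than grouped
    into a single threshold, so results may differ slightly when ties occur.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank predictions from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # Each positive raises recall by 1/total_pos; weight that step
            # by the precision observed at this rank.
            area += (true_pos / rank) / total_pos
    return area


def micro_aupr(label_matrix: Sequence[Sequence[int]],
               score_matrix: Sequence[Sequence[float]]) -> float:
    """Micro-average: pool every (protein, term) pair into one ranking."""
    flat_labels = [y for row in label_matrix for y in row]
    flat_scores = [s for row in score_matrix for s in row]
    return average_precision(flat_labels, flat_scores)


if __name__ == "__main__":
    # Hypothetical toy data: 2 proteins x 3 function terms.
    y_true = [[1, 0, 1], [0, 1, 0]]
    y_score = [[0.9, 0.8, 0.3], [0.1, 0.7, 0.2]]
    print(f"micro-AUPR = {micro_aupr(y_true, y_score):.3f}")
```

Micro-averaging pools all protein-term pairs, so frequent terms dominate the score; a macro average would instead call `average_precision` once per term and take the mean.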