Multi-stage attention-based extraction and fusion of protein sequence and structural features for protein function prediction.

Author information

Liu Meiling, Wang Shuangshuang, Luo Zeyu, Wang Guohua, Zhao Yuming

Affiliations

College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China.

Publication information

Bioinformatics. 2025 Jun 26. doi: 10.1093/bioinformatics/btaf374.

DOI:10.1093/bioinformatics/btaf374

PMID:40569190

Abstract

MOTIVATION

Protein function prediction is important for drug development and disease treatment. Recently, deep learning methods have leveraged protein sequence and structural information, achieving remarkable progress in the field of protein function prediction. However, existing methods ignore the complex multimodal interaction information between sequence and structural features. Since protein sequence and structural information reveal the functional characteristics of proteins from different perspectives, it is challenging to effectively fuse the information from these two modalities to portray protein functions more comprehensively. In addition, current methods have difficulty in effectively capturing long-range dependencies and global contextual information in protein sequences during feature extraction, thus limiting the ability of the model to recognize critical functional residues.

RESULTS

In this study, we propose MAEF-GO, a novel framework based on a multi-stage attention mechanism for predicting protein function. MAEF-GO integrates a Graph Convolutional Network (GCN) and a Graph Attention Network (GAT) to extract protein structural features. To model long-range dependencies within protein sequences, we introduce a frequency-domain attention mechanism capable of extracting global contextual relationships. In addition, a cross-attention module performs interactive fusion between the protein sequence and structural modalities. Experimental evaluations show that MAEF-GO outperforms several state-of-the-art baseline models on standard benchmarks. Furthermore, analysis of the cross-attention weight distributions demonstrates MAEF-GO's interpretability: the model can effectively identify critical functional residues of proteins.
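To make the two sequence-side ideas above concrete, here is a minimal NumPy sketch of global mixing in the frequency domain followed by cross-attention between sequence and structure features. It is an illustration under stated assumptions, not the MAEF-GO implementation: the function names, the feature dimensions, the random inputs, and the fixed low-pass filter (standing in for a learned frequency-domain attention) are all hypothetical. The authors' repository has the actual architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def frequency_mixing(seq_feats, keep_ratio=0.5):
    """Mix information across all residues by filtering in the frequency domain.

    Assumption: a fixed low-pass mask replaces the learned frequency-domain
    attention described in the abstract. Every frequency bin depends on every
    residue, so the filtered output carries global context.
    """
    n_res = seq_feats.shape[0]
    spectrum = np.fft.rfft(seq_feats, axis=0)          # (n_res // 2 + 1, d)
    cutoff = max(1, int(spectrum.shape[0] * keep_ratio))
    mask = np.zeros((spectrum.shape[0], 1))
    mask[:cutoff] = 1.0                                # keep the lowest frequencies
    return np.fft.irfft(spectrum * mask, n=n_res, axis=0)


def cross_attention(seq_feats, struct_feats, w_q, w_k, w_v):
    """Sequence features query structure features (single head, no bias)."""
    q = seq_feats @ w_q
    k = struct_feats @ w_k
    v = struct_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_res, n_res)
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ v, weights


# Hypothetical toy inputs: 8 residues, 16-dimensional features per modality.
rng = np.random.default_rng(0)
n_res, d = 8, 16
seq_feats = rng.normal(size=(n_res, d))
struct_feats = rng.normal(size=(n_res, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

mixed = frequency_mixing(seq_feats)
fused, weights = cross_attention(mixed, struct_feats, w_q, w_k, w_v)

# Total attention each structural position receives: one simple way to read
# the weights when looking for candidate functional residues.
residue_importance = weights.sum(axis=0)
```

Summing the attention weights column-wise gives a per-residue score. That is one simple way to inspect which positions the fusion step attends to, and it may differ from the analysis performed in the paper.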

AVAILABILITY

The MAEF-GO source code is available at https://github.com/nebstudio/MAEF-GO. An archived snapshot of the code used in this study is also available via Zenodo at https://doi.org/10.5281/zenodo.15422392.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
