GGN-GO: geometric graph networks for predicting protein function by multi-scale structure features.

Affiliations

The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing.

The College of Life Science and Technology, Beijing University of Chemical Technology, Beijing.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae559.

DOI:10.1093/bib/bbae559

PMID:39487084

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11530295/

Abstract

Recent advances in high-throughput sequencing have led to an explosion of genomic and transcriptomic data, offering a wealth of protein sequence information. However, the functions of most proteins remain unannotated. Traditional experimental methods for annotating protein function are costly and time-consuming. Current deep learning methods typically rely on Graph Convolutional Networks to propagate features between protein residues. However, these methods fail to capture fine atomic-level geometric structural features and cannot directly compute or propagate structural features (such as distances, directions, and angles) when transmitting features, often simplifying them to scalars. Additionally, difficulties in capturing long-range dependencies limit the model's ability to identify key nodes (residues). To address these challenges, we propose a geometric graph network (GGN-GO) for predicting protein function that enriches feature extraction by capturing multi-scale geometric structural features at the atomic and residue levels. We use a geometric vector perceptron to convert these features into vector representations and aggregate them with node features for better understanding and propagation in the network. Moreover, we introduce a graph attention pooling layer that captures key node information by adaptively aggregating local functional motifs, while contrastive learning enhances the discriminability of graph representations through random noise and different views. The experimental results show that GGN-GO outperforms six competing methods in the tasks with the most labels, for both experimentally validated and predicted protein structures. Furthermore, GGN-GO identifies functional residues that correspond to those confirmed experimentally, showcasing its interpretability and its ability to pinpoint key protein regions. The code and data are available at: https://github.com/MiJia-ID/GGN-GO.
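To make the structural features named in the abstract (distances, directions, and angles) concrete, the following minimal sketch computes them from residue coordinates using only the Python standard library. It is an illustration under stated assumptions, not the GGN-GO implementation: the helper names (`edge_features`, `bond_angle`) and the three C-alpha coordinates are hypothetical, and the actual feature set in the repository may differ.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    """Unit vector along v; zero vector if v has zero length."""
    n = norm(v)
    return tuple(x / n for x in v) if n > 0 else (0.0, 0.0, 0.0)

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def edge_features(coords, i, j):
    """Hypothetical helper: scalar distance and unit direction from residue i to j."""
    d = sub(coords[j], coords[i])
    return norm(d), unit(d)

def bond_angle(coords, i):
    """Hypothetical helper: angle in radians at residue i between residues i-1 and i+1."""
    u = unit(sub(coords[i - 1], coords[i]))
    v = unit(sub(coords[i + 1], coords[i]))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot(u, v))))

# Assumed C-alpha coordinates (angstroms) for three consecutive residues.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

dist, direction = edge_features(ca, 0, 1)  # scalar feature and vector feature
angle = bond_angle(ca, 1)                  # scalar feature derived from two directions
```

The distance and angle do not change when the whole structure is rotated, whereas the direction vector rotates with it. Reducing every feature to a scalar therefore discards orientation information, which is the limitation the abstract attributes to earlier methods and the reason vector-valued features are kept alongside scalar ones.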
