MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation.

Authors

Wang Beibei, Cui Boyue, Chen Shiqu, Wang Xuan, Wang Yadong, Li Junyi

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.

Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.

Publication information

Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf285.

Abstract

MOTIVATION

In recent years, protein function prediction has moved beyond the bottleneck of sequence-only features: high-precision protein structures predicted by AlphaFold2 have significantly improved prediction accuracy. Single-species protein function prediction methods have achieved remarkable success, but multi-species approaches still struggle to integrate multi-source data and to transfer knowledge between distantly related species. Integrating large-scale data and providing effective cross-species label propagation for species with sparse protein annotations remains a critical, unresolved challenge. To address this problem, we propose MSNGO (Multi-species protein Structures and Network to predict GO terms), a model that integrates structural features with network propagation. Our validation shows that using structural features can significantly improve the accuracy of multi-species protein function prediction.

RESULTS

We use graph representation learning to extract amino acid representations from protein structure contact maps, and we train a structural model with a graph convolution pooling module to derive protein-level structural features. After incorporating sequence features from ESM-2, we apply a network propagation algorithm that aggregates information and updates node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and protein-protein networks.
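The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the MSNGO implementation (see the repository for that). The 10 Å contact threshold, the single untrained graph-convolution layer, mean pooling, the restart-style propagation rule, and all function names (`contact_map`, `gcn_layer`, `protein_embedding`, `propagate`) are assumptions chosen for illustration, and the heterogeneous network is simplified to a single adjacency matrix.

```python
import numpy as np

def contact_map(coords, threshold=10.0):
    """Binary residue contact map from C-alpha coordinates (n x 3).
    The 10 Angstrom threshold is an assumed, commonly used value."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return (dist < threshold).astype(float)  # diagonal is 1 (self-contacts)

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 A D^-1/2; isolated nodes get zero rows."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def gcn_layer(adj_norm, feats, weight):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ feats @ weight, 0.0)

def protein_embedding(coords, residue_feats, weight):
    """Residue-level graph convolution followed by mean pooling
    (a stand-in for the paper's graph convolution pooling module)."""
    adj_norm = normalize_adjacency(contact_map(coords))
    hidden = gcn_layer(adj_norm, residue_feats, weight)
    return hidden.mean(axis=0)

def propagate(adj, feats, alpha=0.5, steps=10):
    """Illustrative propagation with restart over a protein network:
    H <- (1 - alpha) * A_norm @ H + alpha * H0."""
    adj_norm = normalize_adjacency(adj)
    h = feats.copy()
    for _ in range(steps):
        h = (1.0 - alpha) * adj_norm @ h + alpha * feats
    return h

# Toy usage with random data: 4 proteins, 30 residues each.
rng = np.random.default_rng(0)
weight = rng.normal(size=(16, 8))  # untrained weights, illustration only
embeddings = np.stack([
    protein_embedding(rng.normal(scale=8.0, size=(30, 3)),
                      rng.normal(size=(30, 16)), weight)
    for _ in range(4)
])
# Hypothetical protein-protein edges (symmetric, no self-loops).
network = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
updated = propagate(network, embeddings)
print(updated.shape)
```

In this sketch the restart weight `alpha` controls how much of each protein's own features survive propagation, which is one simple way to let sparsely annotated proteins borrow information from their network neighbours without being overwritten by them.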

AVAILABILITY AND IMPLEMENTATION

https://github.com/blingbell/MSNGO.
