PANDA2: protein function prediction using graph neural networks.

Authors

Zhao Chenguang, Liu Tong, Wang Zheng

Affiliation

Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124, USA.

Publication information

NAR Genom Bioinform. 2022 Feb 2;4(1):lqac004. doi: 10.1093/nargab/lqac004. eCollection 2022 Mar.

DOI:10.1093/nargab/lqac004

PMID:35118378

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8808544/

Abstract

High-throughput sequencing technologies have generated massive numbers of protein sequences, but the annotation of these sequences relies heavily on low-throughput and expensive biological experiments. Therefore, accurate and fast computational alternatives are needed to infer functional knowledge from protein sequences. The gene ontology (GO) directed acyclic graph (DAG) contains the hierarchical relationships between GO terms but is hard to integrate into machine learning algorithms for functional prediction. We developed a deep learning system named PANDA2 to predict protein functions, which used a cutting-edge graph neural network to model the topology of the GO DAG and integrated the features generated by transformer protein language models. Compared with the top 10 methods in CAFA3, PANDA2 ranked first in cellular component ontology (CCO), tied for first in biological process ontology (BPO) but with a higher coverage rate, and ranked second in molecular function ontology (MFO). Compared with the other recently developed cutting-edge predictors DeepGOPlus, GOLabeler, and DeepText2GO, and benchmarked on another independent dataset, PANDA2 ranked first in CCO, first in BPO, and second in MFO. PANDA2 can be freely accessed from http://dna.cs.miami.edu/PANDA2/.
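
To make the idea of a graph neural network operating on the GO DAG more concrete, the following is a minimal sketch of one generic message-passing step over a toy ontology fragment. It is not PANDA2's architecture, which the abstract does not specify. The term names (root, A, B, C), the two-dimensional feature vectors, the identity weight matrix W, and the function names are all hypothetical and chosen only for illustration. In a trained model W would be learned, and the per-term features might be derived from protein language model embeddings.

```python
# Toy ontology fragment: child term -> list of parent terms.
# The term names are placeholders, not real GO identifiers.
GO_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # a DAG node may have more than one parent
}

# Hypothetical per-term feature vectors for a single protein.
FEATURES = {
    "root": [1.0, 0.0],
    "A": [0.0, 1.0],
    "B": [0.0, 0.0],
    "C": [1.0, 1.0],
}

# Assumed weight matrix; identity here so the aggregation is easy to follow.
W = [[1.0, 0.0], [0.0, 1.0]]


def neighbors(parents):
    """Build an undirected neighbor map with self-loops from child -> parents edges."""
    nbrs = {term: {term} for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            nbrs[child].add(parent)
            nbrs[parent].add(child)
    return nbrs


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gcn_layer(features, parents, weights):
    """One step: average each term's neighborhood, apply a linear map, then ReLU."""
    updated = {}
    for term, nbr_set in neighbors(parents).items():
        dim = len(features[term])
        mean = [sum(features[n][i] for n in nbr_set) / len(nbr_set) for i in range(dim)]
        updated[term] = [max(0.0, v) for v in matvec(weights, mean)]
    return updated


updated = gcn_layer(FEATURES, GO_PARENTS, W)
```

After this step each term's vector mixes information from its parents and children, which is how the ontology topology can influence per-term predictions. Stacking several such layers would let information travel across longer paths in the DAG.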

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35b9/8808544/85312229e125/lqac004fig1.jpg
