
HNetGO: protein function prediction via heterogeneous network transformer.

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.

General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin 150086, China.

Publication information

Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbab556.

DOI: 10.1093/bib/bbab556
PMID: 37861172
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10588005/
Abstract

Protein function annotation is one of the most important research topics for revealing the essence of life at the molecular level in the post-genome era. Current research shows that integrating multisource data can effectively improve the performance of protein function prediction models. However, heavy reliance on complex feature engineering and model integration limits the development of existing methods. Moreover, deep learning models use only the labeled data in a given dataset to extract sequence features, and so ignore the large amount of existing unlabeled sequence data. Here, we propose an end-to-end protein function annotation model named HNetGO, which uses a heterogeneous network to integrate protein sequence similarity and protein-protein interaction network information, and combines this with a pretrained model to extract the semantic features of protein sequences. In addition, we design an attention-based graph neural network model that can effectively extract node-level features from heterogeneous networks and predict protein function by measuring the similarity between protein nodes and Gene Ontology term nodes. Comparative experiments on the human dataset show that HNetGO achieves state-of-the-art performance on the cellular component and molecular function branches.
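The final prediction step described in the abstract, scoring a protein against Gene Ontology term nodes by similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which similarity measure HNetGO uses, so the dot product followed by a sigmoid is an assumption here, and the function names (`score_terms`, `predict`), the 0.5 threshold, and the three-dimensional toy embeddings are all hypothetical. In the actual model the embeddings would come from the attention-based graph neural network rather than being written by hand.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    """Map a raw similarity to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_terms(protein_vec, term_vecs):
    """Score one protein embedding against every GO term embedding.

    Assumption: similarity is a dot product squashed by a sigmoid.
    """
    return {term: sigmoid(dot(protein_vec, vec)) for term, vec in term_vecs.items()}


def predict(protein_vec, term_vecs, threshold=0.5):
    """Return the GO terms whose score reaches the (hypothetical) threshold."""
    scores = score_terms(protein_vec, term_vecs)
    return sorted(term for term, score in scores.items() if score >= threshold)


# Toy embeddings, made up purely for illustration.
protein = [0.9, -0.2, 0.4]
terms = {
    "GO:0005634": [1.0, 0.0, 0.5],     # nucleus
    "GO:0005739": [-1.0, 0.5, -0.5],   # mitochondrion
}

print(predict(protein, terms))  # prints ['GO:0005634']
```

Treating annotation as a similarity lookup between two kinds of node means every GO term is scored by the same function, so a single protein embedding yields a score for each term in the ontology at once.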

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8484/10588005/eaa02049dcbc/bbab556f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8484/10588005/73fb53affad9/bbab556f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8484/10588005/f7d9fac35a9b/bbab556f3.jpg

Similar articles

1
HNetGO: protein function prediction via heterogeneous network transformer.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbab556.
2
Predicting functions of maize proteins using graph convolutional network.
BMC Bioinformatics. 2020 Dec 16;21(Suppl 16):420. doi: 10.1186/s12859-020-03745-6.
3
DeepciRGO: functional prediction of circular RNAs through hierarchical deep neural networks using heterogeneous network features.
BMC Bioinformatics. 2020 Nov 12;21(1):519. doi: 10.1186/s12859-020-03748-3.
4
Long-distance dependency combined multi-hop graph neural networks for protein-protein interactions prediction.
BMC Bioinformatics. 2022 Dec 5;23(1):521. doi: 10.1186/s12859-022-05062-6.
5
Implementing link prediction in protein networks via feature fusion models based on graph neural networks.
Comput Biol Chem. 2024 Feb;108:107980. doi: 10.1016/j.compbiolchem.2023.107980. Epub 2023 Nov 5.
6
Pre-training graph neural networks for link prediction in biomedical networks.
Bioinformatics. 2022 Apr 12;38(8):2254-2262. doi: 10.1093/bioinformatics/btac100.
7
Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction.
PLoS Comput Biol. 2022 Dec 22;18(12):e1010793. doi: 10.1371/journal.pcbi.1010793. eCollection 2022 Dec.
8
Graph2GO: a multi-modal attributed network embedding method for inferring protein functions.
Gigascience. 2020 Aug 1;9(8). doi: 10.1093/gigascience/giaa081.
9
HN-PPISP: a hybrid network based on MLP-Mixer for protein-protein interaction site prediction.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac480.
10
Multi-type neighbors enhanced global topology and pairwise attribute learning for drug-protein interaction prediction.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac120.

Cited by

1
POSA-GO: Fusion of Hierarchical Gene Ontology and Protein Language Models for Protein Function Prediction.
Int J Mol Sci. 2025 Jul 1;26(13):6362. doi: 10.3390/ijms26136362.
2
Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.
Database (Oxford). 2025 May 30;2025. doi: 10.1093/database/baaf027.
3
GTPLM-GO: Enhancing Protein Function Prediction Through Dual-Branch Graph Transformer and Protein Language Model Fusing Sequence and Local-Global PPI Information.
Int J Mol Sci. 2025 Apr 25;26(9):4088. doi: 10.3390/ijms26094088.
4
DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models.
Front Med (Lausanne). 2025 Apr 8;12:1503229. doi: 10.3389/fmed.2025.1503229. eCollection 2025.
5
SEGT-GO: a graph transformer method based on PPI serialization and explanatory artificial intelligence for protein function prediction.
BMC Bioinformatics. 2025 Feb 10;26(1):46. doi: 10.1186/s12859-025-06059-7.
6
Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review.
Front Bioeng Biotechnol. 2025 Jan 21;13:1506508. doi: 10.3389/fbioe.2025.1506508. eCollection 2025.
7
Transitioning from wet lab to artificial intelligence: a systematic review of AI predictors in CRISPR.
J Transl Med. 2025 Feb 4;23(1):153. doi: 10.1186/s12967-024-06013-w.
8
An experimental analysis of graph representation learning for Gene Ontology based protein function prediction.
PeerJ. 2024 Nov 14;12:e18509. doi: 10.7717/peerj.18509. eCollection 2024.
9
Advances in Protein-Ligand Binding Affinity Prediction via Deep Learning: A Comprehensive Study of Datasets, Data Preprocessing Techniques, and Model Architectures.
Curr Drug Targets. 2024;25(15):1041-1065. doi: 10.2174/0113894501330963240905083020.
10
A comprehensive review and comparison of existing computational methods for protein function prediction.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae289.

References

1
Editorial: Feature Representation and Learning Methods With Applications in Protein Secondary Structure.
Front Bioeng Biotechnol. 2021 Sep 9;9:748722. doi: 10.3389/fbioe.2021.748722. eCollection 2021.
2
Integration of Multiple-Omics Data to Analyze the Population-Specific Differences for Coronary Artery Disease.
Comput Math Methods Med. 2021 Aug 17;2021:7036592. doi: 10.1155/2021/7036592. eCollection 2021.
3
NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab167.
4
StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab172.
5
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.
Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2016239118.
6
Sensitive protein alignments at tree-of-life scale using DIAMOND.
Nat Methods. 2021 Apr;18(4):366-368. doi: 10.1038/s41592-021-01101-x. Epub 2021 Apr 7.
7
TALE: Transformer-based protein function Annotation with joint sequence-Label Embedding.
Bioinformatics. 2021 Sep 29;37(18):2825-2833. doi: 10.1093/bioinformatics/btab198.
8
BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides.
Bioinformatics. 2021 Sep 9;37(17):2556-2562. doi: 10.1093/bioinformatics/btab133.
9
Anticancer peptides prediction with deep representation learning features.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab008.
10
Embeddings from deep learning transfer GO annotations beyond homology.
Sci Rep. 2021 Jan 13;11(1):1160. doi: 10.1038/s41598-020-80786-0.