ProtFun: A Protein Function Prediction Model Using Graph Attention Networks with a Protein Large Language Model.

Authors

Talo Muhammed, Bozdag Serdar

Affiliations

Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA.

BioDiscovery Institute, University of North Texas, Denton, TX 76207, USA.

Publication information

bioRxiv. 2025 May 17:2025.05.13.653854. doi: 10.1101/2025.05.13.653854.

DOI:10.1101/2025.05.13.653854
PMID:40463264
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12132266/
Abstract

Understanding protein functions facilitates the identification of the underlying causes of many diseases and guides the search for new therapeutic targets and medications. With the advancement of high-throughput technologies, obtaining novel protein sequences has become routine. However, determining protein functions experimentally is cost- and labor-prohibitive. Therefore, it is crucial to develop computational methods for automatic protein function prediction. In this study, we propose a multi-modal deep learning architecture called ProtFun to predict protein functions. ProtFun integrates protein large language model (LLM) embeddings as node features in a protein family network. By applying graph attention networks (GAT) to this protein family network, ProtFun learns protein embeddings, which are integrated with protein signature representations from InterPro to train a protein function prediction model. We evaluated our architecture on three benchmark datasets. Our results showed that our proposed approach outperformed current state-of-the-art methods in most cases. An ablation study also highlighted the importance of the different components of ProtFun. The data and source code of ProtFun are available at https://github.com/bozdaglab/ProtFun under the Creative Commons Attribution Non Commercial 4.0 International Public License.
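The abstract outlines a data flow: protein LLM embeddings become node features of a protein family network, a graph attention network turns them into protein embeddings, and these are combined with InterPro signature representations to predict functions. The NumPy sketch below illustrates that flow under several assumptions that are not taken from the paper: a single-head GAT layer in the style of Veličković et al. (2018), an edge between two proteins when they share a family, a binary presence/absence vector of InterPro signatures per protein, fusion by plain concatenation, and an independent sigmoid output per function label. Weights are random rather than trained, and all function names, parameter names, and sizes are hypothetical; the authors' actual implementation is in the GitHub repository linked above.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def elu(x: np.ndarray) -> np.ndarray:
    # np.minimum keeps exp() from overflowing on large positive inputs
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer (assumed variant, not ProtFun's code).

    h:   (N, F_in) node features, here protein LLM embeddings
    adj: (N, N) boolean adjacency of the assumed protein family network
    W:   (F_in, F_out) shared projection; a: (2 * F_out,) attention vector
    Returns the updated node embeddings and the attention matrix alpha.
    """
    z = h @ W
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into a source and a target term
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    mask = adj | np.eye(h.shape[0], dtype=bool)  # neighbours plus a self-loop
    e = np.where(mask, e, -np.inf)               # non-neighbours get zero weight
    e = e - e.max(axis=1, keepdims=True)         # stable softmax over each row
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return elu(alpha @ z), alpha


def predict_function_scores(llm_emb, adj, interpro, params):
    """Hypothetical fusion: GAT embeddings + InterPro vector -> label probabilities."""
    node_emb, _ = gat_layer(llm_emb, adj, params["W"], params["a"])
    fused = np.concatenate([node_emb, interpro], axis=1)
    # Independent sigmoid per label, since a protein can have several functions
    return sigmoid(fused @ params["W_out"] + params["b_out"])


# Toy data: 5 proteins, random (untrained) weights, hypothetical sizes
rng = np.random.default_rng(0)
n_prot, llm_dim, gat_dim, n_sig, n_labels = 5, 16, 8, 4, 3
llm_emb = rng.normal(size=(n_prot, llm_dim))
adj = np.zeros((n_prot, n_prot), dtype=bool)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:  # two hypothetical families
    adj[i, j] = adj[j, i] = True
interpro = rng.integers(0, 2, size=(n_prot, n_sig)).astype(float)
params = {
    "W": rng.normal(scale=0.1, size=(llm_dim, gat_dim)),
    "a": rng.normal(scale=0.1, size=2 * gat_dim),
    "W_out": rng.normal(scale=0.1, size=(gat_dim + n_sig, n_labels)),
    "b_out": np.zeros(n_labels),
}
scores = predict_function_scores(llm_emb, adj, interpro, params)
print(scores.shape)  # (5, 3)
```

The masking step is what ties attention to the family network: each protein aggregates projected embeddings only from proteins it is linked to (and itself). In a trained model, `W`, `a`, `W_out`, and `b_out` would be learned, for example with a binary cross-entropy loss over the labels, and the GAT part might use several heads or layers; the sketch fixes one head purely for readability.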


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1772/12132266/53315cb28a53/nihpp-2025.05.13.653854v1-f0001.jpg

Similar articles

1. ProtFun: A Protein Function Prediction Model Using Graph Attention Networks with a Protein Large Language Model.
   bioRxiv. 2025 May 17:2025.05.13.653854. doi: 10.1101/2025.05.13.653854.
2. Fusing multiplex heterogeneous networks using graph attention-aware fusion networks.
   Sci Rep. 2024 Nov 24;14(1):29119. doi: 10.1038/s41598-024-78555-4.
3. Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction.
   bioRxiv. 2025 Feb 8:2025.02.07.637146. doi: 10.1101/2025.02.07.637146.
4. Predicting miRNA-disease association via graph attention learning and multiplex adaptive modality fusion.
   Comput Biol Med. 2024 Feb;169:107904. doi: 10.1016/j.compbiomed.2023.107904. Epub 2023 Dec 28.
5. MMGCN: Multi-modal multi-view graph convolutional networks for cancer prognosis prediction.
   Comput Methods Programs Biomed. 2024 Dec;257:108400. doi: 10.1016/j.cmpb.2024.108400. Epub 2024 Sep 6.
6. Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function.
   Bioinformatics. 2023 Jun 30;39(Suppl 1):i318-i325. doi: 10.1093/bioinformatics/btad208.
7. GATLGEMF: A graph attention model with line graph embedding multi-complex features for ncRNA-protein interactions prediction.
   Comput Biol Chem. 2024 Feb;108:108000. doi: 10.1016/j.compbiolchem.2023.108000. Epub 2023 Dec 6.
8. Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function.
   bioRxiv. 2023 Jan 20:2023.01.17.524477. doi: 10.1101/2023.01.17.524477.
9. AttentionMGT-DTA: A multi-modal drug-target affinity prediction using graph transformer and attention mechanism.
   Neural Netw. 2024 Jan;169:623-636. doi: 10.1016/j.neunet.2023.11.018. Epub 2023 Nov 11.
10. ToxDL 2.0: Protein toxicity prediction using a pretrained language model and graph neural networks.
   Comput Struct Biotechnol J. 2025 Apr 2;27:1538-1549. doi: 10.1016/j.csbj.2025.04.002. eCollection 2025.
