
MKFGO: integrating multi-source knowledge fusion with pretrained language model for high-accuracy protein function prediction.

Authors

Zhu Yi-Heng, Zhu Shuxin, Yu Xuan, Yan He, Liu Yan, Xie Xiaojun, Yu Dong-Jun, Ye Rui

Affiliations

College of Artificial Intelligence, Nanjing Agricultural University, 666 Binjiang Avenue, Jiangbei New District, Nanjing, Jiangsu Province, 211800, China.

Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong SAR (HKG), 999077, China.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf420.

DOI: 10.1093/bib/bbaf420
PMID: 40814232
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12354956/

Abstract

Accurately identifying protein functions is essential to understanding life mechanisms and thus to advancing drug discovery. Although biochemical experiments are the gold standard for determining protein functions, they are often time-consuming and labor-intensive. Here, we propose a novel composite deep-learning method, Multi-source Knowledge Fusion for Gene Ontology prediction (MKFGO), which infers Gene Ontology (GO) attributes by integrating five complementary pipelines built on multi-source biological data. MKFGO was rigorously benchmarked on 1522 nonredundant proteins and outperformed 12 state-of-the-art function prediction methods. Comprehensive data analyses revealed that the major advantage of MKFGO lies in its two deep-learning components, handcrafted feature representation-based GO prediction (HFRGO) and protein large language model (PLM)-based GO prediction (PLMGO). These derive handcrafted features and PLM-based features, respectively, from protein sequences in different biological views, and their knowledge is fused effectively at the decision level. HFRGO leverages a long short-term memory (LSTM)-attention network embedded with handcrafted features, in which a triplet loss-based guilt-by-association strategy is designed to enhance the correlation between feature similarity and function similarity. PLMGO employs the PLM to capture feature embeddings with discriminative functional patterns from sequences. The other three components provide complementary insights that further improve prediction accuracy; they are driven by protein-protein interaction, GO term probability, and protein-coding gene sequence, respectively. The source code and models of MKFGO are freely available at https://github.com/yiheng-zhu/MKFGO.
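
The abstract names two ideas without giving formulas: a triplet loss that ties feature similarity to function similarity, and decision-level fusion of the component predictions. The sketch below illustrates both in plain Python under stated assumptions. It uses the standard triplet margin loss with Euclidean distance and a simple weighted average of per-term scores. The function names, the margin, the weights, and the example scores are all hypothetical and are not taken from the MKFGO paper or its repository, which may define these steps differently.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not the paper's exact loss).

    The loss is zero once the anchor is closer to the functionally similar
    protein (positive) than to the dissimilar one (negative) by at least
    `margin`; otherwise it grows with the violation.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def fuse_scores(component_scores, weights):
    """Decision-level fusion as a weighted average of per-GO-term scores.

    component_scores maps a component name to {GO term: confidence}.
    weights maps a component name to a non-negative weight (hypothetical).
    A term that a component does not predict contributes a score of 0.
    """
    total = sum(weights[name] for name in component_scores)
    fused = {}
    for name, scores in component_scores.items():
        w = weights[name] / total
        for term, score in scores.items():
            fused[term] = fused.get(term, 0.0) + w * score
    return fused


if __name__ == "__main__":
    # Illustrative scores only; the numbers are made up for this example.
    scores = {
        "HFRGO": {"GO:0003674": 0.8},
        "PLMGO": {"GO:0003674": 0.6, "GO:0005515": 0.4},
    }
    print(fuse_scores(scores, {"HFRGO": 1.0, "PLMGO": 1.0}))
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

Averaging at the decision level keeps each component independent, so a pipeline can be retrained or dropped without touching the others; only the weights need to be revisited.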



