
Improving protein function prediction by learning and integrating representations of protein sequences and function labels.

Authors

Boadu Frimpong, Cheng Jianlin

Affiliations

Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States.

Publication

Bioinform Adv. 2024 Aug 17;4(1):vbae120. doi: 10.1093/bioadv/vbae120. eCollection 2024.

DOI:10.1093/bioadv/vbae120
PMID:39233898
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11374024/
Abstract

MOTIVATION

Fewer than 1% of proteins have experimentally determined function information, so computational prediction of protein function is critical for obtaining functional information for most proteins, and it remains a major challenge in protein bioinformatics. Despite significant progress by the community over the last decade, the overall accuracy of protein function prediction is still not high, particularly for rare function terms that are associated with few proteins in annotation databases such as UniProt.

RESULTS

We introduce TransFew, a new transformer model that learns representations of both protein sequences and function labels [Gene Ontology (GO) terms] to predict protein function. TransFew uses a large pre-trained protein language model (ESM2-t48) to learn function-relevant representations of proteins from raw protein sequences, and uses a biological natural language model (BioBERT) together with a graph convolutional neural network-based autoencoder to generate semantic representations of GO terms from their textual definitions and hierarchical relationships. The two representations are combined via cross-attention to predict protein function. Integrating the protein sequence and label representations not only improves overall function prediction accuracy but also delivers robust performance on rare function terms with limited annotations by facilitating annotation transfer between GO terms.
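
The idea of combining sequence and label representations through cross-attention can be illustrated with a minimal, dependency-free sketch. This is not the TransFew implementation. The random vectors are stand-ins for the sequence embeddings (ESM2 in the paper) and for the GO-term embeddings (BioBERT plus a graph autoencoder in the paper). Learned projection matrices are omitted, the choice of GO terms as queries is an assumption made for illustration, and the function names `cross_attention` and `predict_go_terms` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(label_vecs, residue_vecs):
    """Each GO-term vector (query) attends over residue vectors (keys/values).

    Returns one context vector and one attention distribution per GO term.
    Assumption: identity projections instead of learned weight matrices.
    """
    dim = len(label_vecs[0])
    contexts, weights = [], []
    for query in label_vecs:
        scores = [dot(query, key) / math.sqrt(dim) for key in residue_vecs]
        attn = softmax(scores)
        context = [
            sum(w * value[j] for w, value in zip(attn, residue_vecs))
            for j in range(dim)
        ]
        contexts.append(context)
        weights.append(attn)
    return contexts, weights


def predict_go_terms(label_vecs, residue_vecs):
    """Score each GO term with a sigmoid over <context, label embedding>."""
    contexts, _ = cross_attention(label_vecs, residue_vecs)
    return [
        1.0 / (1.0 + math.exp(-dot(ctx, lab)))
        for ctx, lab in zip(contexts, label_vecs)
    ]


rng = random.Random(0)
DIM, NUM_RESIDUES, NUM_TERMS = 8, 5, 3
# Random stand-ins (assumption): real inputs would be pre-computed embeddings.
residues = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_RESIDUES)]
go_terms = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TERMS)]

probabilities = predict_go_terms(go_terms, residues)
print(probabilities)  # one score in (0, 1) per GO term
```

Because every GO term is scored from its own embedding rather than from a term-specific output unit, terms with similar embeddings receive related scores. This is one plausible reading of how label representations could help transfer annotations to rare terms.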

AVAILABILITY AND IMPLEMENTATION

https://github.com/BioinfoMachineLearning/TransFew.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d480/11374024/1dfb46abed46/vbae120f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d480/11374024/ff2951b14500/vbae120f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d480/11374024/a886cf0c977a/vbae120f2.jpg

Similar articles

1
Improving protein function prediction by learning and integrating representations of protein sequences and function labels.
Bioinform Adv. 2024 Aug 17;4(1):vbae120. doi: 10.1093/bioadv/vbae120. eCollection 2024.
2
Predicting functions of maize proteins using graph convolutional network.
BMC Bioinformatics. 2020 Dec 16;21(Suppl 16):420. doi: 10.1186/s12859-020-03745-6.
3
GOProFormer: A Multi-Modal Transformer Method for Gene Ontology Protein Function Prediction.
Biomolecules. 2022 Nov 18;12(11):1709. doi: 10.3390/biom12111709.
4
PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships.
Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad094.
5
TransformerGO: predicting protein-protein interactions by modelling the attention between sets of gene ontology terms.
Bioinformatics. 2022 Apr 12;38(8):2269-2277. doi: 10.1093/bioinformatics/btac104.
6
Improving protein function prediction using protein sequence and GO-term similarities.
Bioinformatics. 2019 Apr 1;35(7):1116-1124. doi: 10.1093/bioinformatics/bty751.
7
Hierarchical deep learning for predicting GO annotations by integrating protein knowledge.
Bioinformatics. 2022 Sep 30;38(19):4488-4496. doi: 10.1093/bioinformatics/btac536.
8
Exploiting ontology graph for predicting sparsely annotated gene function.
Bioinformatics. 2015 Jun 15;31(12):i357-64. doi: 10.1093/bioinformatics/btv260.
9
Protein Function Prediction With Functional and Topological Knowledge of Gene Ontology.
IEEE Trans Nanobioscience. 2023 Oct;22(4):755-762. doi: 10.1109/TNB.2023.3278033. Epub 2023 Oct 3.
10
TALE: Transformer-based protein function Annotation with joint sequence-Label Embedding.
Bioinformatics. 2021 Sep 29;37(18):2825-2833. doi: 10.1093/bioinformatics/btab198.

Cited by

1
MKFGO: integrating multi-source knowledge fusion with pretrained language model for high-accuracy protein function prediction.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf420.
2
ProtGO: universal protein function prediction utilizing multi-modal gene ontology knowledge.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf390.
3
Multimodal deep learning integration of cryo-EM and AlphaFold3 for high-accuracy protein structure determination.
bioRxiv. 2025 Jul 3:2025.07.03.663071. doi: 10.1101/2025.07.03.663071.

References

1
De novo atomic protein structure modeling for cryoEM density maps using 3D transformer and HMM.
Nat Commun. 2024 Jun 29;15(1):5511. doi: 10.1038/s41467-024-49647-6.
2
CAFA-evaluator: a Python tool for benchmarking ontological classification methods.
Bioinform Adv. 2024 Mar 14;4(1):vbae043. doi: 10.1093/bioadv/vbae043. eCollection 2024.
3
Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function.
Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i318-i325. doi: 10.1093/bioinformatics/btad208.
4
A large expert-curated cryo-EM image dataset for machine learning protein particle picking.
Sci Data. 2023 Jun 22;10(1):392. doi: 10.1038/s41597-023-02280-2.
5
NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations.
Genomics Proteomics Bioinformatics. 2023 Apr;21(2):349-358. doi: 10.1016/j.gpb.2023.04.001. Epub 2023 Apr 17.
6
Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad117.
7
The Gene Ontology knowledgebase in 2023.
Genetics. 2023 May 4;224(1). doi: 10.1093/genetics/iyad031.
8
InterPro in 2022.
Nucleic Acids Res. 2023 Jan 6;51(D1):D418-D427. doi: 10.1093/nar/gkac993.
9
Multimodal biomedical AI.
Nat Med. 2022 Sep;28(9):1773-1784. doi: 10.1038/s41591-022-01981-2. Epub 2022 Sep 15.
10
A Review of Generalized Zero-Shot Learning Methods.
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4051-4070. doi: 10.1109/TPAMI.2022.3191696. Epub 2023 Mar 7.