GORetriever: reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation.

Affiliations

Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China.

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan.

Publication information

Bioinformatics. 2024 Sep 1;40(Suppl 2):ii53-ii61. doi: 10.1093/bioinformatics/btae401.

DOI: 10.1093/bioinformatics/btae401
PMID: 39230707
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11520413/
Abstract

SUMMARY

The vast majority of proteins still lack experimentally validated functional annotations, which highlights the importance of developing high-performance automated protein function prediction/annotation (AFP) methods. While existing approaches focus on protein sequences, networks, and structural data, textual information related to proteins has been overlooked. However, roughly 82% of SwissProt proteins already possess literature information that experts have annotated. To efficiently and effectively use literature information, we present GORetriever, a two-stage deep information retrieval-based method for AFP. Given a target protein, in the first stage, candidate Gene Ontology (GO) terms are retrieved by using annotated proteins with similar descriptions. In the second stage, the GO terms are reranked based on semantic matching between the GO definitions and textual information (literature and protein description) of the target protein. Extensive experiments over benchmark datasets demonstrate the remarkable effectiveness of GORetriever in enhancing the AFP performance. Note that GORetriever is the key component of GOCurator, which has achieved first place in the latest critical assessment of protein function annotation (CAFA5: over 1600 teams participated), held in 2023-2024.

AVAILABILITY AND IMPLEMENTATION

GORetriever is publicly available at https://github.com/ZhuLab-Fudan/GORetriever.
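
The two-stage retrieve-then-rerank idea described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: a bag-of-words cosine similarity stands in for whatever retrieval and semantic-matching models GORetriever actually uses, and all function names, the toy proteins, the placeholder term labels (`GO_KINASE` and so on, which are not real GO identifiers) and the paraphrased definitions are assumptions made up for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector; a toy stand-in for a learned text encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(target_desc, annotated, top_k=2):
    """Stage 1: pool GO terms from the top_k annotated proteins
    whose descriptions are most similar to the target description."""
    query = bow(target_desc)
    ranked = sorted(
        annotated,
        key=lambda p: cosine(query, bow(p["description"])),
        reverse=True,
    )
    candidates = []
    for protein in ranked[:top_k]:
        for term in protein["go_terms"]:
            if term not in candidates:
                candidates.append(term)
    return candidates


def rerank(candidates, go_definitions, target_text):
    """Stage 2: reorder candidates by similarity between each GO
    definition and the target's text (description plus literature)."""
    query = bow(target_text)
    scored = [(term, cosine(query, bow(go_definitions[term]))) for term in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: placeholder labels, not real GO identifiers.
annotated = [
    {"description": "serine threonine protein kinase",
     "go_terms": ["GO_KINASE", "GO_ATP_BINDING"]},
    {"description": "tyrosine protein kinase receptor",
     "go_terms": ["GO_KINASE"]},
    {"description": "transcription factor zinc finger",
     "go_terms": ["GO_DNA_BINDING"]},
]
go_definitions = {
    "GO_KINASE": "catalysis of the transfer of a phosphate group to a substrate",
    "GO_ATP_BINDING": "binding to atp",
    "GO_DNA_BINDING": "binding to dna",
}

target_desc = "putative protein kinase"
# Stand-in for literature text associated with the target protein.
target_text = target_desc + " the enzyme catalyzes transfer of a phosphate group to its substrate"

candidates = retrieve_candidates(target_desc, annotated, top_k=2)
ranked = rerank(candidates, go_definitions, target_text)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In this toy run, the first stage narrows the search to terms carried by proteins with kinase-like descriptions, and the second stage uses the extra target text to order those terms. A real system would replace `bow` and `cosine` with trained retrieval and matching models; word overlap is used here only to keep the sketch self-contained.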

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/11520413/4f0c593bb8c2/btae401f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/11520413/3e99df4ddc5e/btae401f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/11520413/6ea3d0883700/btae401f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f03/11520413/c47a83ed6b33/btae401f3.jpg
