GOAnnotator: accurate protein function annotation using automatically retrieved literature.

Author information

Yan Huiying, Liu Hancheng, Wang Shaojun, Zhu Shanfeng

Affiliations

Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China.

Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, 200433, China.

Publication information

Bioinformatics. 2025 Jul 1;41(Supplement_1):i410-i419. doi: 10.1093/bioinformatics/btaf199.

Abstract

SUMMARY

Automated protein function prediction/annotation (AFP) is vital for understanding biological processes and advancing biomedical research. Existing text-based AFP methods, including the state-of-the-art method GORetriever, rely on expert-curated literature, which is costly and time-consuming to produce and covers only a small portion of the proteins in UniProt. To overcome this limitation, we propose GOAnnotator, a novel framework for automated protein function annotation. It consists of two key modules: PubRetriever, a hybrid system for retrieving and re-ranking relevant literature, and GORetriever+, an enhanced module for identifying Gene Ontology (GO) terms from the retrieved texts. Extensive experiments on three benchmark datasets demonstrate that GOAnnotator delivers high-quality functional annotations, surpassing GORetriever in realistic situations by uncovering unique literature and predicting additional functions. These results highlight its potential to streamline and enhance the annotation of protein functions without relying on manual curation.
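
The two-module design described above (retrieve and re-rank literature, then identify GO terms in the retrieved text) can be illustrated with a toy sketch. This is not the authors' implementation: the corpus, the token-overlap retriever, the cosine re-ranker, the string-matching annotator, and every function name here are assumptions made purely for illustration. The actual PubRetriever and GORetriever+ code is in the repository listed under Availability and Implementation.

```python
import math
from collections import Counter

# Hypothetical toy abstracts; not real PubMed records.
CORPUS = {
    "doc1": "protein abc1 shows kinase activity in yeast cells",
    "doc2": "abc1 is required for dna binding at promoters",
    "doc3": "unrelated study of soil chemistry",
}

# Small GO term dictionary (ID -> lowercase term name), for illustration only.
GO_TERMS = {
    "GO:0016301": "kinase activity",
    "GO:0003677": "dna binding",
}


def tokenize(text):
    return text.lower().split()


def retrieve(query, corpus, k):
    """Stage 1 (assumed): cheap candidate retrieval by shared-token count."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(text))), doc_id) for doc_id, text in corpus.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:k]]


def rerank(query, doc_ids, corpus):
    """Stage 2 (assumed): re-order candidates by cosine similarity of token counts."""
    q = Counter(tokenize(query))

    def cosine(text):
        d = Counter(tokenize(text))
        dot = sum(q[t] * d[t] for t in q)
        norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
            sum(v * v for v in d.values())
        )
        return dot / norm if norm else 0.0

    return sorted(doc_ids, key=lambda doc_id: (-cosine(corpus[doc_id]), doc_id))


def annotate(doc_ids, corpus, go_terms):
    """Stage 3 (assumed): collect GO terms whose names occur in the retrieved text."""
    found = set()
    for doc_id in doc_ids:
        text = corpus[doc_id].lower()
        for go_id, name in go_terms.items():
            if name in text:
                found.add(go_id)
    return sorted(found)


def pipeline(query, k=2):
    """Chain the three toy stages: retrieve, re-rank, annotate."""
    candidates = retrieve(query, CORPUS, k)
    ranked = rerank(query, candidates, CORPUS)
    return annotate(ranked, CORPUS, GO_TERMS)


if __name__ == "__main__":
    print(pipeline("abc1 kinase"))
```

The sketch only conveys the shape of the pipeline, where a cheap first pass narrows the literature and a second scorer orders it before terms are extracted. A real system would replace each stage with learned models and a full literature index.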

AVAILABILITY AND IMPLEMENTATION

The code and data are available at https://github.com/ZhuLab-Fudan/GOAnnotator.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e94/12261426/6c1ee4e038f9/btaf199f1.jpg
