Yan Huiying, Liu Hancheng, Wang Shaojun, Zhu Shanfeng
Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China.
Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, 200433, China.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i410-i419. doi: 10.1093/bioinformatics/btaf199.
Automated protein function prediction/annotation (AFP) is vital for understanding biological processes and advancing biomedical research. Existing text-based AFP methods, including the state-of-the-art GORetriever, rely on expert-curated literature, which is costly and time-consuming to produce and covers only a small portion of the proteins in UniProt. To overcome this limitation, we propose GOAnnotator, a novel framework for automated protein function annotation. It consists of two key modules: PubRetriever, a hybrid system for retrieving and re-ranking relevant literature, and GORetriever+, an enhanced module for identifying Gene Ontology (GO) terms from the retrieved texts. Extensive experiments on three benchmark datasets demonstrate that GOAnnotator delivers high-quality functional annotations and surpasses GORetriever in realistic settings by uncovering unique literature and predicting additional functions. These results highlight its potential to streamline and enhance protein function annotation without relying on manual curation.
The code and data are available at https://github.com/ZhuLab-Fudan/GOAnnotator.