FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life.

Authors

Martínez-Redondo Gemma I, Perez-Canales Francisco M, Carbonetto Belén, Fernández José M, Barrios-Núñez Israel, Vázquez-Valls Marçal, Cases Ildefonso, Rojas Ana M, Fernández Rosa

Affiliations

Metazoa Phylogenomics and Genome Evolution Lab, Institute of Evolutionary Biology (CSIC-UPF), Barcelona, Spain.

Universitat de Barcelona, Barcelona, Spain.

Publication information

Commun Biol. 2025 Aug 14;8(1):1227. doi: 10.1038/s42003-025-08651-2.

Abstract

Protein functional annotation is crucial in biology, but many protein-coding genes remain uncharacterized, especially in non-model organisms. FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) integrates protein language models for large-scale functional annotation. Applied to ~1000 animal proteomes, FANTASIA predicts functions for virtually all proteins, including up to 50% that remained unannotated by traditional homology-based methods. This enables the discovery of novel gene functions, enhancing our understanding of molecular evolution and organismal biology. FANTASIA holds particular promise for functional discovery in non-model taxa, offering advantages over homology-based tools in sensitivity and generalizability. FANTASIA is available on GitHub at https://github.com/CBBIO/FANTASIA.
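
The general idea of annotation by embedding-space similarity can be illustrated with a minimal sketch. This is not the FANTASIA implementation: it assumes that each protein has already been converted to a fixed-length vector by some protein language model, and it simply transfers the annotations of the closest reference proteins under cosine similarity. The function names, the `k` parameter, and the three-dimensional toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def transfer_annotations(query_embedding, reference, k=1):
    """Return the sorted union of GO terms from the k most similar references.

    `reference` is a list of dicts with keys "id", "embedding", "go_terms".
    This is an illustrative nearest-neighbour transfer, not the authors' method.
    """
    ranked = sorted(
        reference,
        key=lambda entry: cosine_similarity(query_embedding, entry["embedding"]),
        reverse=True,
    )
    terms = set()
    for entry in ranked[:k]:
        terms.update(entry["go_terms"])
    return sorted(terms)


# Toy reference set (assumption): real embeddings have hundreds or thousands
# of dimensions and come from a protein language model, not from hand-typed
# numbers. The GO identifiers are used here only as example labels.
REFERENCE = [
    {"id": "ref_A", "embedding": [1.0, 0.0, 0.0], "go_terms": {"GO:0003677"}},
    {"id": "ref_B", "embedding": [0.0, 1.0, 0.0], "go_terms": {"GO:0005524", "GO:0016301"}},
    {"id": "ref_C", "embedding": [0.0, 0.0, 1.0], "go_terms": {"GO:0005215"}},
]

# Hypothetical embedding of an unannotated query protein.
QUERY = [0.1, 0.9, 0.2]

if __name__ == "__main__":
    print(transfer_annotations(QUERY, REFERENCE, k=1))
```

In this toy setting the query lies closest to `ref_B`, so with `k=1` it inherits that protein's terms. Unlike a sequence-homology search, nothing here depends on alignable sequence identity, which is the property the abstract points to when it contrasts the approach with homology-based tools.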
