FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life.
Author information
Martínez-Redondo Gemma I, Perez-Canales Francisco M, Carbonetto Belén, Fernández José M, Barrios-Núñez Israel, Vázquez-Valls Marçal, Cases Ildefonso, Rojas Ana M, Fernández Rosa
Affiliations
Metazoa Phylogenomics and Genome Evolution Lab, Institute of Evolutionary Biology (CSIC-UPF), Barcelona, Spain.
Universitat de Barcelona, Barcelona, Spain.
Publication information
Commun Biol. 2025 Aug 14;8(1):1227. doi: 10.1038/s42003-025-08651-2.
Protein functional annotation is crucial in biology, but many protein-coding genes remain uncharacterized, especially in non-model organisms. FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) integrates protein language models for large-scale functional annotation. Applied to ~1000 animal proteomes, FANTASIA predicts functions for virtually all proteins, including up to 50% that remained unannotated by traditional homology-based methods. This enables the discovery of novel gene functions, enhancing our understanding of molecular evolution and organismal biology. FANTASIA holds particular promise for functional discovery in non-model taxa, offering advantages over homology-based tools in sensitivity and generalizability. FANTASIA is available on GitHub at https://github.com/CBBIO/FANTASIA.
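The abstract describes annotation based on embedding space similarity. The general idea can be illustrated with a minimal sketch: a query protein's embedding is compared with the embeddings of already annotated reference proteins, and the annotations of the closest references are transferred to the query. This is not FANTASIA's implementation. The function name `transfer_annotations`, its parameters, the choice of cosine distance, and the three-dimensional toy vectors are all assumptions made for illustration; real embeddings would come from a protein language model and have far more dimensions, and the GO terms attached to the toy references are placeholders.

```python
import numpy as np

# Toy reference set (hypothetical): one embedding per annotated protein,
# each paired with a list of GO terms used here only as placeholder labels.
REF_EMBS = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
REF_GO = [["GO:0003677"], ["GO:0005524"], ["GO:0016020"]]


def transfer_annotations(query_emb, ref_embs, ref_go_terms, k=1, max_distance=None):
    """Return (go_terms, distance) for the k nearest references by cosine distance.

    Hits farther than max_distance (if given) are dropped.
    """
    query = np.asarray(query_emb, dtype=float)
    refs = np.asarray(ref_embs, dtype=float)

    # Normalise to unit length so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    distances = 1.0 - r @ q

    hits = []
    for idx in np.argsort(distances)[:k]:
        if max_distance is not None and distances[idx] > max_distance:
            continue
        hits.append((ref_go_terms[idx], float(distances[idx])))
    return hits


if __name__ == "__main__":
    # A query that lies close to the first reference protein.
    for go_terms, dist in transfer_annotations([0.9, 0.1, 0.0], REF_EMBS, REF_GO, k=2):
        print(go_terms, round(dist, 4))
```

Because the lookup depends only on distances between vectors, a sketch like this can return a candidate annotation for any protein that can be embedded, which is consistent with the abstract's point that functions can be predicted for proteins left unannotated by homology-based methods. The optional `max_distance` cutoff is one simple, assumed way to trade coverage for confidence.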