Mora-Márquez Fernando, Hurtado Mikel, López de Heredia Unai
GI en Desarrollo de Especies y Comunidades Leñosas (WooSP), Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, José Antonio Novais 10, Madrid 28040, Spain.
Database (Oxford). 2025 Mar 5;2025. doi: 10.1093/database/baaf019.
Gymnosperms are a clade of non-flowering plants that includes about 1000 living species. Because their genomes are complex and genomic resources are scarce, functional annotation in gymnosperm genomics and transcriptomics remains limited. Here we present gymnotoa-db, a novel, publicly accessible relational database designed to facilitate functional annotation in gymnosperms. The database stores non-redundant records of gymnosperm proteins together with their taxonomic and functional information. The complementary software, gymnotoa-app, enables users to download gymnotoa-db and run a comprehensive functional annotation pipeline on DNA or cDNA sequences derived from high-throughput sequencing. Its user-friendly interface and efficient algorithms streamline the annotation process, making it a valuable tool for researchers studying gymnosperms. We compared the performance of gymnotoa-app against other annotation tools that use different reference databases. Our results show that gymnotoa-app is better able to annotate gymnosperm transcripts accurately, recovering a greater number of transcripts and of unique, non-redundant Gene Ontology (GO) terms. The distinctive features of gymnotoa-db are comprehensive coverage through a non-redundant dataset of gymnosperm protein sequences and robust functional information that integrates multiple ontology systems (GO, KEGG, EC, and MetaCyc) while preserving taxonomic context, including Arabidopsis homologs. Database URL: https://blogs.upm.es/gymnotoa-db/2024/09/19/gymnotoa-app/.