Yaddehige Sachithra Kalhari, Vischioni Chiara, Berselli Michele, Alberghini Leonardo, Mezzavilla Massimo, Bobbo Tania, Taccioli Cristian
Department of Animal Medicine, Production and Health, University of Padova, Padua, Italy.
CNRS, INSERM, IRCAN, Côte D'Azur University, Nice, France.
Mol Biol Evol. 2025 Jun 4;42(6). doi: 10.1093/molbev/msaf114.
Evolutionary studies require extensive examination of genomic information across all domains of life. Although a large number of genomes are available through GenBank, visualizing or comparing the information they contain remains challenging for many reasons, including their size. We introduce the genome-based retrieval and analysis parser (GBRAP), which comprises a comprehensive software tool for analyzing genome files and an online database housing an extensive collection of carefully curated, high-quality genome statistics for all organisms available in the RefSeq database of the National Center for Biotechnology Information. Users can search directly for organisms of interest or select them from precategorized groups and then retrieve the data. The output is generated as tables containing more than 200 columns of genomic information (base counts, GC content, Shannon entropy, codon usage, etc.), calculated separately for different genomic elements (e.g. coding sequences, introns, transfer RNA, ribosomal RNA, and noncoding RNA). Where applicable, the data are displayed independently for each chromosomal, mitochondrial, plastid, or plasmid sequence. All data can be visualized in the database or downloaded as comma-separated value or Excel files. The genome-based retrieval and analysis parser database is free to access without registration and is publicly available at http://tacclab.org/gbrap/.
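To make the statistics named above concrete, the following minimal Python sketch computes base counts, GC content, and Shannon entropy for a single nucleotide sequence. It is an illustration only and is not taken from the tool itself: the function names are hypothetical, ambiguous bases such as N are simply ignored, and the entropy is assumed to be the per-base entropy in bits (log base 2), which may differ from the definition used in the database.

```python
import math
from collections import Counter


def base_counts(seq: str) -> dict[str, int]:
    """Count A, C, G, and T (case-insensitive); other symbols are ignored."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (0.0 if there are none)."""
    counts = base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


def shannon_entropy(seq: str) -> float:
    """Per-base Shannon entropy in bits (assumed definition, log base 2)."""
    counts = base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum -p * log2(p) over the bases that actually occur in the sequence.
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n)


if __name__ == "__main__":
    example = "ATGCGCGTTA"  # hypothetical sequence, for illustration only
    print(base_counts(example))
    print(gc_content(example))
    print(shannon_entropy(example))
```

Applying the same functions separately to each annotated element (for example, each coding sequence or intron) would give per-element values of the kind the abstract describes, under the assumptions stated above.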