Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany.
Biobyte solutions GmbH, Bothestr. 142, 69117 Heidelberg, Germany.
Nucleic Acids Res. 2023 Jan 6;51(D1):D760-D766. doi: 10.1093/nar/gkac1078.
The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single-copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/.