National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Nucleic Acids Res. 2021 Jan 8;49(D1):D274-D281. doi: 10.1093/nar/gkaa1018.
The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and has gone through several rounds of updates, most recently in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) NCBI's recently deprecated gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank/ENA/DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families, with corresponding references and PDB links where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs that contain multiple paralogs, and continued refinement of COG annotations.