Tatusov R L, Galperin M Y, Natale D A, Koonin E V
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Nucleic Acids Res. 2000 Jan 1;28(1):33-6. doi: 10.1093/nar/28.1.33.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
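The criterion of consistency of genome-specific best hits can be illustrated with a small sketch. The abstract does not spell out the procedure, so the code below is one simplified reading and not the authors' implementation: it assumes that an all-against-all comparison has already produced pairwise similarity scores, treats two proteins as linked only when each is the other's best hit in the respective genome, looks for triangles of such links spanning three different genomes, and merges triangles that share a side. The function names, the score dictionary layout and the toy data are all hypothetical.

```python
from itertools import combinations


def genome_specific_best_hits(scores):
    """Map (query, target_genome) -> best-scoring subject in that genome.

    `scores` maps (query, subject) -> similarity score, where each protein
    is a (genome, name) tuple. This layout is an assumption for the sketch.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # ignore hits within the same genome
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit for key, (hit, _) in best.items()}


def build_clusters(scores):
    """Group proteins via triangles of reciprocal genome-specific best hits."""
    best = genome_specific_best_hits(scores)

    # Simplifying assumption: keep only reciprocal best hits as edges.
    edges = set()
    for (query, _genome), hit in best.items():
        if best.get((hit, query[0])) == query:
            edges.add(frozenset((query, hit)))

    # A triangle is three proteins from three genomes, all pairwise linked.
    proteins = sorted({p for edge in edges for p in edge})
    triangles = [
        frozenset(trio)
        for trio in combinations(proteins, 3)
        if len({p[0] for p in trio}) == 3
        and all(frozenset(pair) in edges for pair in combinations(trio, 2))
    ]

    # Merge triangles that share a side (two proteins) using union-find.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, triangle in enumerate(triangles):
        groups.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(group) for group in groups.values())


# Toy data (hypothetical): four genomes A-D; a2 is a weaker relative of a1.
example_scores = {}
for p, q, s in [
    (("A", "a1"), ("B", "b1"), 100),
    (("A", "a1"), ("C", "c1"), 90),
    (("B", "b1"), ("C", "c1"), 95),
    (("D", "d1"), ("A", "a1"), 80),
    (("D", "d1"), ("B", "b1"), 85),
    (("A", "a2"), ("B", "b1"), 30),
    (("A", "a2"), ("C", "c1"), 25),
]:
    example_scores[(p, q)] = s
    example_scores[(q, p)] = s

clusters = build_clusters(example_scores)
```

In this toy input the triangles a1-b1-c1 and a1-b1-d1 share the side a1-b1 and are merged into a single group, while a2 is left out because b1 and c1 both find a1 as their best hit in genome A. Requiring reciprocal hits keeps the sketch short; it is stricter than a criterion based only on consistency among best hits, so a real analysis may group more proteins than this does.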