Expanded microbial genome coverage and improved protein family annotation in the COG database.

Author information

Galperin Michael Y, Makarova Kira S, Wolf Yuri I, Koonin Eugene V

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Nucleic Acids Res. 2015 Jan;43(Database issue):D261-9. doi: 10.1093/nar/gku1223. Epub 2014 Nov 26.

Abstract

Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) an orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there was more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins, many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics.
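As a rough illustration of point (ii), the Python sketch below transfers the function(s) of characterized members of each COG to every ortholog in that COG, and keeps the full range of functions when the characterized members disagree. It is not the COG curators' procedure: the function name `cog_annotations`, the COG/protein identifiers and the function labels are all hypothetical placeholders, and it assumes, for simplicity, that each protein belongs to exactly one COG and that ortholog sets have already been identified.

```python
from typing import Dict, List


def cog_annotations(
    cog_members: Dict[str, List[str]],
    characterized: Dict[str, str],
) -> Dict[str, List[str]]:
    """Map each protein to the function(s) of the characterized members of its COG.

    Hypothetical sketch: assumes every protein belongs to exactly one COG.
    A COG with no characterized member yields an empty list (function unknown);
    a COG whose characterized members differ yields the full range of functions.
    """
    annotations: Dict[str, List[str]] = {}
    for members in cog_members.values():
        # Collect the distinct functions known for characterized members of this COG
        functions = sorted({characterized[p] for p in members if p in characterized})
        # Transfer that function set to every ortholog in the group
        for protein in members:
            annotations[protein] = list(functions)
    return annotations


if __name__ == "__main__":
    # Hypothetical toy data: identifiers and functions are placeholders
    cog_members = {
        "COG_A": ["a1", "a2", "a3"],
        "COG_B": ["b1", "b2", "b3"],
        "COG_C": ["c1", "c2"],
    }
    characterized = {"a1": "function X", "b1": "function Y", "b2": "function Z"}
    for protein, funcs in cog_annotations(cog_members, characterized).items():
        print(protein, funcs or "uncharacterized")
```

In this toy run, `a2` and `a3` inherit "function X" from `a1`, every member of `COG_B` is annotated with both "function Y" and "function Z" (the range of potential functions), and the members of `COG_C` remain uncharacterized.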

Similar articles

1
COG database update: focus on microbial diversity, model organisms, and widespread pathogens.
Nucleic Acids Res. 2021 Jan 8;49(D1):D274-D281. doi: 10.1093/nar/gkaa1018.
2
Microbial genome analysis: the COG approach.
Brief Bioinform. 2019 Jul 19;20(4):1063-1070. doi: 10.1093/bib/bbx117.
3
The COG database: a tool for genome-scale analysis of protein functions and evolution.
Nucleic Acids Res. 2000 Jan 1;28(1):33-6. doi: 10.1093/nar/28.1.33.
4
A genomic perspective on protein families.
Science. 1997 Oct 24;278(5338):631-7. doi: 10.1126/science.278.5338.631.
5
The COG database: an updated version includes eukaryotes.
BMC Bioinformatics. 2003 Sep 11;4:41. doi: 10.1186/1471-2105-4-41.
6
TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes.
Nucleic Acids Res. 2007 Jan;35(Database issue):D260-4. doi: 10.1093/nar/gkl1043. Epub 2006 Dec 6.

Cited by

1
Genomic Insights into Emerging Multidrug-Resistant Strains: First Report from Thailand.
Antibiotics (Basel). 2025 Jul 24;14(8):746. doi: 10.3390/antibiotics14080746.
2
Bringing the uncultivated microbial majority of freshwater ecosystems into culture.
Nat Commun. 2025 Aug 26;16(1):7971. doi: 10.1038/s41467-025-63266-9.
3
Genome assembly at the chromosome level of Clinopodium barosmum.
Sci Data. 2025 Aug 12;12(1):1406. doi: 10.1038/s41597-025-05784-1.
4
Molecular characteristics and antimicrobial susceptibility of carbapenem-resistant in a multicenter study in Ningbo, China.
Front Microbiol. 2025 Jul 21;16:1628592. doi: 10.3389/fmicb.2025.1628592. eCollection 2025.
5
Identification and Pathogenicity Analysis of Qf-1 in Mink ().
Microorganisms. 2025 Jul 8;13(7):1604. doi: 10.3390/microorganisms13071604.
