National Heart and Lung Institute, Imperial College London, London, UK.
Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
Methods Mol Biol. 2023;2649:55-67. doi: 10.1007/978-1-0716-3072-3_3.
Advances in sequencing technologies have made metagenomics a widely used tool for microbe-related studies, especially in clinical medicine and ecology. Accordingly, the toolkit for metagenomic data analysis has expanded, offering multiple approaches for answering diverse biological questions and for understanding the composition and function of microbiomes. As part of this toolkit, metagenomics databases play a central role in creating and maintaining processed data, such as taxonomic classification definitions, gene function annotations, sequence alignments, and inferred phylogenetic trees. The availability of large numbers of high-quality bacterial genome sequences contributes significantly to the construction and updating of metagenomics databases, which are the core resource for metagenomic data analysis at various scales. This chapter presents the key concepts, technical options, and challenges of metagenomics projects, as well as the curation processes and versatile functions of four representative bacterial metagenomics databases: Greengenes, SILVA, the Ribosomal Database Project (RDP), and the Genome Taxonomy Database (GTDB).