European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
Nat Biotechnol. 2021 Jan;39(1):105-114. doi: 10.1038/s41587-020-0603-3. Epub 2020 Jul 20.
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.