Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul, 03722, Korea.
Department of Laboratory Medicine, Research Institute of Bacterial Resistance, Yonsei University College of Medicine, Seoul, 03722, Korea.
Genome Med. 2021 Aug 27;13(1):134. doi: 10.1186/s13073-021-00950-7.
Metagenome sampling bias for geographical location and lifestyle is partially responsible for the incomplete catalog of reference genomes of gut microbial species. Thus, genome assembly from currently under-represented populations may effectively expand the reference gut microbiome and improve taxonomic and functional profiling.
We assembled genomes using public whole-metagenomic shotgun sequencing (WMS) data for 110 and 645 fecal samples from India and Japan, respectively. In addition, we assembled genomes from newly generated WMS data for 90 fecal samples collected from Korea. Because genome assembly for low-abundance species may require much deeper sequencing than is usually employed, we performed ultra-deep WMS (> 30 Gbp or > 100 million read pairs) for the fecal samples from Korea. In total, we assembled 29,082 prokaryotic genomes from 845 fecal metagenomes from the three under-represented Asian countries and combined them with the Unified Human Gastrointestinal Genome (UHGG) to generate an expanded catalog, the Human Reference Gut Microbiome (HRGM).
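As a rough check on the figures above, the following sketch relates read-pair counts to sequencing yield and totals the per-country sample counts. The 150 bp read length is an assumption made only for illustration (the abstract does not state it), and the function and variable names are hypothetical.

```python
# Sample counts per country, as stated in the abstract.
SAMPLES_PER_COUNTRY = {"India": 110, "Japan": 645, "Korea": 90}


def read_pairs_to_gbp(read_pairs: int, read_length_bp: int = 150) -> float:
    """Return total sequenced bases in Gbp for paired-end reads.

    read_length_bp defaults to 150, an assumed value that is not
    given in the abstract.
    """
    # Each read pair contributes two reads of read_length_bp bases.
    return read_pairs * 2 * read_length_bp / 1e9


total_samples = sum(SAMPLES_PER_COUNTRY.values())
ultra_deep_yield_gbp = read_pairs_to_gbp(100_000_000)

print(f"Total fecal metagenomes: {total_samples}")
print(f"Yield at 100 million read pairs (assumed 2 x 150 bp): {ultra_deep_yield_gbp} Gbp")
```

Under the assumed 2 x 150 bp layout, the two ultra-deep thresholds quoted above (> 30 Gbp and > 100 million read pairs) describe the same depth; with a different read length they would diverge.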
HRGM contains 232,098 non-redundant genomes for 5414 representative prokaryotic species, including 780 that are novel, > 103 million unique proteins, and > 274 million single-nucleotide variants. This represents an increase of more than 10% over the UHGG. The 780 novel species were enriched for the Bacteroidaceae family, including species associated with high-fiber and seaweed-rich diets. Single-nucleotide variant density was positively associated with the speciation rate of gut commensals. We found that ultra-deep sequencing facilitated the assembly of genomes for low-abundance taxa, and that deep sequencing (e.g., > 20 million read pairs) may be needed for profiling low-abundance taxa. Importantly, the HRGM significantly improved the taxonomic and functional classification of sequencing reads from fecal samples. Finally, analysis of human self-antigen homologs in the genomes of HRGM species suggested that bacterial taxa with high cross-reactivity potential may contribute more to the pathogenesis of gut microbiome-associated diseases than those with low cross-reactivity potential, by promoting an inflammatory condition.
By including gut metagenomes from three previously under-represented Asian countries (Korea, India, and Japan), we developed a substantially expanded microbiome catalog, HRGM. Information on the microbial genomes and coding genes is publicly available ( www.mbiomenet.org/HRGM/ ). HRGM will facilitate the identification and functional analysis of disease-associated gut microbiota.