Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
Mol Ecol Resour. 2018 Nov;18(6):1339-1355. doi: 10.1111/1755-0998.12923. Epub 2018 Jul 27.
The first step of any metagenome sequencing project is to obtain an inventory of operational taxonomic unit (OTU) abundances and/or metagenomic gene abundances. The former is generated with 16S-rRNA-tagged amplicon sequencing, and the latter can be generated with either gene-targeted or whole-sample shotgun metagenomics. For 16S-rRNA data sets, measuring community diversity with indexes such as species richness and Shannon's index has become a de facto standard analysis; similarly comprehensive approaches for metagenomic gene abundances are still largely missing, even though both OTU and gene abundances are derived from DNA reads. Here, we adapt the Hill numbers, which were recently reintroduced to macrocommunity ecology and are now widely regarded as among the most appropriate measurement systems for ecological diversity, to measure metagenome alpha-, beta- and gamma-diversity and similarity. Our proposal includes the following: (a) metagenomic gene (MG) diversity, which measures metagenome diversity at the single-gene level; (b) Type-I metagenome functional gene cluster (MFGC) diversity, which measures the diversity of functional gene clusters while ignoring within-cluster gene abundance information; (c) Type-II MFGC diversity, which takes within-cluster gene abundance information into account and integrates gene-cluster-level metagenome diversity with functional gene redundancy information; and (d) four classes of Hill-numbers-based similarity metrics (local gene overlap, regional gene overlap, gene homogeneity measure and gene turnover complement), each defined for both MG and MFGC. We demonstrate the proposal with gut metagenomes from healthy and inflammatory bowel disease (IBD) cohorts. The Hill numbers offer a unified approach to measuring the ecological and metagenome diversities of microbiomes cohesively and comprehensively.
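To make the quantities named in the abstract concrete, the following is a minimal Python sketch of Hill numbers applied to a gene abundance table. It is not the authors' code. It assumes the standard Hill number definition, the equal-weight multiplicative alpha/beta/gamma partition used in the general Hill-numbers literature, and the usual four beta-based similarity transformations; whether the paper uses exactly these forms is an assumption. The function names (`hill_number`, `partition`, `similarity`) and the toy input layout (one list of gene abundances per sample, same gene order in every sample) are hypothetical.

```python
import math


def hill_number(abundances, q):
    """Hill number of order q for one vector of non-negative abundances."""
    total = float(sum(abundances))
    if total <= 0:
        raise ValueError("abundances must contain at least one positive value")
    p = [a / total for a in abundances if a > 0]  # relative abundances
    if math.isclose(q, 1.0):
        # q = 1 is defined as a limit: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


def partition(samples, q):
    """Return (alpha, beta, gamma) for N equally weighted samples.

    `samples` is a list of abundance vectors sharing the same gene order.
    Assumption: alpha is computed from abundances relative to the grand
    total and divided by N, and beta = gamma / alpha.
    """
    n = len(samples)
    grand_total = float(sum(sum(s) for s in samples))
    pooled = [sum(col) for col in zip(*samples)]  # per-gene totals
    gamma = hill_number(pooled, q)
    cells = [z / grand_total for s in samples for z in s if z > 0]
    if math.isclose(q, 1.0):
        alpha = math.exp(-sum(c * math.log(c) for c in cells)) / n
    else:
        alpha = sum(c ** q for c in cells) ** (1.0 / (1.0 - q)) / n
    return alpha, gamma / alpha, gamma


def similarity(beta, n, q):
    """Four similarity measures derived from beta (which ranges from 1 to n)."""
    if n < 2:
        raise ValueError("similarity needs at least two samples")
    inv_b, inv_n = 1.0 / beta, 1.0 / n
    if math.isclose(q, 1.0):
        # Both overlap measures share the same limit at q = 1.
        local = regional = 1.0 - math.log(beta) / math.log(n)
    else:
        local = (inv_b ** (q - 1) - inv_n ** (q - 1)) / (1 - inv_n ** (q - 1))
        regional = (inv_b ** (1 - q) - inv_n ** (1 - q)) / (1 - inv_n ** (1 - q))
    return {
        "local_overlap": local,
        "regional_overlap": regional,
        "homogeneity": (inv_b - inv_n) / (1 - inv_n),
        "turnover_complement": (n - beta) / (n - 1),
    }


if __name__ == "__main__":
    # Toy data (invented for illustration): two samples, four genes.
    toy = [[30, 10, 5, 0], [10, 10, 0, 25]]
    for q in (0, 1, 2):
        alpha, beta, gamma = partition(toy, q)
        print(q, round(alpha, 3), round(beta, 3), round(gamma, 3),
              similarity(beta, len(toy), q))
```

In this sketch, order q = 0 counts genes regardless of abundance (richness), q = 1 weights genes in proportion to their abundance, and q = 2 emphasizes dominant genes. The same functions could in principle be applied to cluster-level abundance vectors, but how Type-I and Type-II MFGC abundances are constructed is not specified in the abstract and is not modeled here.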