Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg 199034, Russia.
Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg 199034, Russia; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA.
Cell Syst. 2018 Aug 22;7(2):192-200.e3. doi: 10.1016/j.cels.2018.06.009. Epub 2018 Jul 25.
Reduced microbiome diversity has been linked to several diseases. However, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. Here, we describe an algorithm for estimating the microbial diversity in a metagenomic sample based on a joint analysis of short and long reads. Unlike previous approaches, the algorithm makes no assumptions about the distribution of genome frequencies within a metagenome (as parametric methods do) and does not require a large database that covers the total diversity (as non-parametric methods do). We estimate that the genomes in a human gut metagenome have a total length of 1.3 to 3.5 billion nucleotides, whereas the genomes responsible for 50% of total abundance have a total length of only 25 to 61 million nucleotides. In contrast, the genomes in an aquifer sediment metagenome have a total length more than two orders of magnitude larger (≈840 billion nucleotides).
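The "total length of genomes responsible for 50% of total abundance" statistic can be illustrated with a minimal Python sketch. This is not the algorithm described in the abstract, which estimates these quantities from short and long reads without knowing the genomes. It only shows how such a summary could be computed if genome lengths and abundances were already known. The function name `length_at_abundance_fraction`, the input layout, and the toy numbers are hypothetical, and the sketch assumes the most abundant genomes are counted first.

```python
def length_at_abundance_fraction(genomes, fraction=0.5):
    """Total length of the most abundant genomes that together reach
    `fraction` of the total abundance.

    genomes: list of (length_nt, abundance) pairs. This layout is
    hypothetical and chosen only for illustration.
    """
    total_abundance = sum(abundance for _, abundance in genomes)
    target = fraction * total_abundance
    covered = 0
    total_length = 0
    # Assumption: genomes are accumulated from most to least abundant.
    for length, abundance in sorted(genomes, key=lambda g: g[1], reverse=True):
        if covered >= target:
            break
        covered += abundance
        total_length += length
    return total_length


# Toy community (made-up values): lengths in nucleotides, abundances in percent.
toy_community = [
    (5_000_000, 30),
    (4_000_000, 25),
    (3_000_000, 20),
    (2_000_000, 15),
    (6_000_000, 10),
]

half_length = length_at_abundance_fraction(toy_community, fraction=0.5)
full_length = length_at_abundance_fraction(toy_community, fraction=1.0)
```

In this toy community the two most abundant genomes already account for more than half of the abundance, so the 50% statistic is much smaller than the total length. The abstract reports the same kind of gap for the human gut metagenome.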