Hedmark University College, Hamar, Norway.
J Appl Microbiol. 2013 Jan;114(1):141-51. doi: 10.1111/jam.12035. Epub 2012 Nov 1.
A major challenge in metagenome studies is to estimate the true size of all combined genomes. Here, we present a novel approach for estimating the size of all combined genomes from low-coverage next-generation sequencing (NGS) data, using empirically determined copy numbers of random DNA fragments.
Size estimates were based on analyses of two experimental soil micro-ecosystems, simulating soil with and without earthworms. Our analyses showed combined genome sizes of about 10^11 nucleotides (log 11) for each of the soil micro-ecosystems, as estimated from qPCR-determined copy numbers of random DNA fragments. This corresponds to more than 20,000 unique bacterial genomes in each sample. There seemed, however, to be a bacterial subpopulation in the earthworm soil that was not present in the non-earthworm soil. To describe the structure of the metagenomes, both total DNA and amplified 16S rRNA gene sequence libraries were generated with 454 sequencing. Bioinformatic analysis of the 454 sequence libraries showed a large functional but low taxonomic overlap between the samples with and without earthworms. A neutrality test indicated that rare species have a competitive advantage over abundant species in both micro-ecosystems, providing a potential explanation for the large metagenome sizes.
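As a rough illustration of how a combined genome size of about 10^11 nucleotides relates to a count of roughly 20,000 bacterial genomes, the arithmetic can be sketched as below. The mean genome sizes used here (3 Mb and 5 Mb) are assumptions chosen for illustration and are not stated in the abstract; the function name `genome_equivalents` is likewise hypothetical, and this is not the authors' estimation procedure.

```python
def genome_equivalents(metagenome_size_nt: float, mean_genome_size_nt: float) -> float:
    """Return how many genomes of a given mean size fit into the metagenome.

    Both arguments are in nucleotides. The mean genome size is an
    assumption supplied by the caller, not a value from the study.
    """
    if mean_genome_size_nt <= 0:
        raise ValueError("mean genome size must be positive")
    return metagenome_size_nt / mean_genome_size_nt


if __name__ == "__main__":
    metagenome_size_nt = 10 ** 11  # "log 11" nucleotides, as reported in the abstract

    # Assumed mean bacterial genome sizes (illustrative only).
    for mean_size_nt in (3_000_000, 5_000_000):
        n = genome_equivalents(metagenome_size_nt, mean_size_nt)
        print(f"mean genome {mean_size_nt / 1e6:.0f} Mb: about {n:,.0f} genome equivalents")
```

Under the assumed 5 Mb mean genome size, the quotient is 20,000; any smaller assumed mean size gives a larger count, which is consistent with the abstract's "more than" wording.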
We have shown that the soil metagenome is very large and that the large size is probably a consequence of top-down selection of the dominant bacterial species.
Estimates of metagenome size from low-coverage NGS data will be important for guiding the design of future NGS experiments.