Becraft Eric D, Dodsworth Jeremy A, Murugapiran Senthil K, Ohlsson J Ingemar, Briggs Brandon R, Kanbar Jad, De Vlaminck Iwijn, Quake Stephen R, Dong Hailiang, Hedlund Brian P, Swingley Wesley D
Department of Biological Sciences, Northern Illinois University, DeKalb, Illinois, USA.
Bigelow Laboratory for Ocean Sciences, East Boothbay, Maine, USA.
Appl Environ Microbiol. 2015 Dec 4;82(4):992-1003. doi: 10.1128/AEM.03140-15. Print 2016 Feb 15.
The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This "microbial dark matter" represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum "Calescamantes" (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2 or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs.