Chivian Dylan, Jungbluth Sean P, Dehal Paramvir S, Wood-Charlson Elisha M, Canon Richard S, Allen Benjamin H, Clark Mikayla M, Gu Tianhao, Land Miriam L, Price Gavin A, Riehl William J, Sneddon Michael W, Sutormin Roman, Zhang Qizhi, Cottingham Robert W, Henry Chris S, Arkin Adam P
Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x. Epub 2022 Nov 14.
Uncultivated Bacteria and Archaea account for the vast majority of species on Earth, but obtaining their genomes directly from the environment, using shotgun sequencing, has only become possible recently. To realize the hope of capturing Earth's microbial genetic complement and to facilitate the investigation of the functional roles of specific lineages in a given ecosystem, technologies that accelerate the recovery of high-quality genomes are necessary. We present a series of analysis steps and data products for the extraction of high-quality metagenome-assembled genomes (MAGs) from microbiomes using the U.S. Department of Energy Systems Biology Knowledgebase (KBase) platform (http://www.kbase.us/). Overall, these steps take about a day to obtain extracted genomes when starting from smaller environmental shotgun read libraries, or up to about a week from larger libraries. In KBase, the process is end-to-end, allowing a user to go from the initial sequencing reads all the way through to MAGs, which can then be analyzed with other KBase capabilities such as phylogenetic placement, functional assignment, metabolic modeling, pangenome functional profiling, RNA-Seq and others. While portions of such capabilities are available individually from other resources, the combination of the intuitive usability, data interoperability and integration of tools in a freely available computational resource makes KBase a powerful platform for obtaining MAGs from microbiomes. While this workflow offers tools for each of the key steps in the genome extraction process, it also provides a scaffold that can be easily extended with additional MAG recovery and analysis tools, via the KBase software development kit (SDK).