
SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes.

Author information

Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France.

Université Paris-Saclay, INRAE, AgroParisTech, UMR SayFood, Palaiseau, 91120, France.

Publication information

F1000Res. 2022 Dec 15;11:1522. doi: 10.12688/f1000research.128091.2. eCollection 2022.

Abstract

Over the last decade, microbial ecology has shifted from gene-centric to genome-centric analyses. Indeed, the advent of metagenomics combined with binning methods, single-cell genome sequencing, and high-throughput cultivation methods has contributed to the continuing, exponential increase in available prokaryotic genomes, which in turn has favored the exploration of microbial metabolisms. In metagenomics, data processing from raw reads to genome reconstruction involves many steps and software tools, which can represent a major technical obstacle. To overcome this challenge, we developed SnakeMAGs, a simple workflow that processes Illumina data from raw reads to the classification and relative abundance estimation of metagenome-assembled genomes (MAGs). It integrates state-of-the-art bioinformatic tools to sequentially perform: quality control of the reads (illumina-utils, Trimmomatic), host sequence removal (optional step, using Bowtie2), assembly (MEGAHIT), binning (MetaBAT2), quality filtering of the bins (CheckM, GUNC), classification of the MAGs (GTDB-Tk) and estimation of their relative abundance (CoverM). Developed with the popular Snakemake workflow management system, it can be deployed on various architectures, from single-core to multicore machines and from workstations to computer clusters and grids. It is also flexible, since users can easily change parameters and/or add new rules. Using termite gut metagenomic datasets, we showed that SnakeMAGs is slower than ATLAS, a similar workflow, but recovers more MAGs encompassing more diverse phyla. Importantly, these additional MAGs showed no significant difference from the other MAGs in terms of completeness, contamination, genome size or relative abundance. Overall, SnakeMAGs should make the reconstruction of MAGs more accessible to microbiologists. SnakeMAGs, together with test files and an extended tutorial, is available at https://github.com/Nachida08/SnakeMAGs.
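The order of steps listed in the abstract can be illustrated with a minimal Python sketch. It is not taken from SnakeMAGs: the stage and tool names come from the abstract, while the `STAGES` list, the `OPTIONAL_STAGES` set and the `plan_stages` function are hypothetical names introduced only to show how an optional host-removal step fits into an otherwise fixed sequence. The real workflow is defined as Snakemake rules in the repository.

```python
from typing import List, Tuple

# Stage names and tools are taken from the abstract. The data structure and
# function below are hypothetical and are NOT part of SnakeMAGs itself.
STAGES: List[Tuple[str, Tuple[str, ...]]] = [
    ("read quality control", ("illumina-utils", "Trimmomatic")),
    ("host sequence removal", ("Bowtie2",)),
    ("assembly", ("MEGAHIT",)),
    ("binning", ("MetaBAT2",)),
    ("bin quality filtering", ("CheckM", "GUNC")),
    ("MAG classification", ("GTDB-Tk",)),
    ("relative abundance estimation", ("CoverM",)),
]

# The abstract describes host sequence removal as the only optional step.
OPTIONAL_STAGES = {"host sequence removal"}


def plan_stages(remove_host: bool) -> List[str]:
    """Return the ordered stage names, skipping optional stages if not requested."""
    return [
        name
        for name, _tools in STAGES
        if remove_host or name not in OPTIONAL_STAGES
    ]


if __name__ == "__main__":
    tools_by_stage = dict(STAGES)
    # Print the plan for a run without host sequence removal.
    for number, name in enumerate(plan_stages(remove_host=False), start=1):
        print(f"{number}. {name} ({', '.join(tools_by_stage[name])})")
```

Calling `plan_stages(remove_host=True)` returns all seven stages in order, whereas `plan_stages(remove_host=False)` omits the Bowtie2 step and leaves the rest of the sequence unchanged.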


