European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.
Microbial Genomics Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Nat Protoc. 2021 May;16(5):2520-2541. doi: 10.1038/s41596-021-00508-2. Epub 2021 Apr 16.
Recovering genomes from shotgun metagenomic sequence data allows detailed taxonomic and functional characterization of individual species or strains in a microbial community. Retrieving these metagenome-assembled genomes (MAGs) involves seven stages. First, low-quality bases, along with adapter and host sequences, are removed. Second, overlapping sequences are assembled to create longer contiguous fragments. Third, these fragments are clustered based on sequence composition and abundance. Fourth, these sequence clusters, or bins, undergo rounds of quality assessment and refinement to yield MAGs. The optional fifth stage is dereplication of MAGs to select representatives. Sixth, each MAG is taxonomically classified. The optional seventh stage is assessing the fraction of diversity that has been recovered. The output of this protocol is a set of draft genomes, which can provide invaluable clues about uncultured organisms. This protocol takes ~1 week to run, depending on the computational resources available, and requires prior experience with high-performance computing, shell scripting and Python.
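The seven stages above can be summarized as a simple ordered plan. The following is a minimal, tool-agnostic sketch in Python; the `Stage` class, the `STAGES` list and the `plan` function are hypothetical names introduced here for illustration and are not part of the published protocol, which does not name specific software in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the MAG recovery workflow (illustrative only)."""
    number: int
    description: str
    optional: bool = False


# Stage descriptions paraphrase the abstract; no specific tools are assumed.
STAGES = [
    Stage(1, "remove low-quality bases, adapter and host sequences"),
    Stage(2, "assemble overlapping sequences into longer contiguous fragments"),
    Stage(3, "cluster fragments into bins by sequence composition and abundance"),
    Stage(4, "assess and refine bin quality to yield MAGs"),
    Stage(5, "dereplicate MAGs to select representatives", optional=True),
    Stage(6, "taxonomically classify each MAG"),
    Stage(7, "assess the fraction of diversity recovered", optional=True),
]


def plan(include_optional=True):
    """Return the stages to run, in order, optionally skipping optional ones."""
    return [s for s in STAGES if include_optional or not s.optional]


if __name__ == "__main__":
    for stage in plan(include_optional=False):
        print(f"Stage {stage.number}: {stage.description}")
```

Calling `plan(include_optional=False)` keeps only the five mandatory stages (1, 2, 3, 4 and 6), which may be a reasonable starting point when dereplication and diversity assessment are not needed.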