[1] Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark. [2] Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark. [3].
[1] INRA, Institut National de la Recherche Agronomique, UMR 14121 MICALIS, Jouy en Josas, France. [2] INRA, Institut National de la Recherche Agronomique, US 1367 Metagenopolis, Jouy en Josas, France. [3] Department of Computer Science, Center for Bioinformatics and Computational Biology, University of Maryland, USA. [4].
Nat Biotechnol. 2014 Aug;32(8):822-8. doi: 10.1038/nbt.2939. Epub 2014 Jul 6.
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
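The core idea of binning co-abundant genes can be illustrated with a small sketch: genes whose abundance profiles rise and fall together across samples are grouped, on the reasoning that they likely belong to the same biological entity. The code below is a simplified illustration, not the published algorithm. The function names, the greedy seed-based grouping, the Pearson correlation threshold of 0.9, and the toy abundance values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)


def bin_coabundant_genes(abundance, min_corr=0.9):
    """Greedily group genes whose profiles correlate with a seed gene.

    abundance: dict mapping gene name -> list of per-sample abundances.
    min_corr:  hypothetical correlation cutoff (an assumption of this sketch).
    """
    unassigned = list(abundance)
    groups = []
    while unassigned:
        seed = unassigned[0]
        # The seed always joins its own group, so the loop always terminates.
        group = [seed] + [
            gene
            for gene in unassigned[1:]
            if pearson(abundance[seed], abundance[gene]) >= min_corr
        ]
        groups.append(group)
        unassigned = [gene for gene in unassigned if gene not in group]
    return groups


# Toy data: four genes measured across four samples (made-up values).
abundance = {
    "geneA": [10, 0, 5, 20],
    "geneB": [21, 1, 9, 41],
    "geneC": [0, 8, 7, 1],
    "geneD": [1, 15, 14, 2],
}

for group in bin_coabundant_genes(abundance):
    print(group)
```

In this toy input, geneA and geneB track each other across samples, as do geneC and geneD, so the sketch places them in two separate groups. A real analysis would work on millions of genes and would need a scalable clustering strategy rather than this quadratic greedy pass.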