Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden.
Mol Ecol Resour. 2023 Oct;23(7):1724-1736. doi: 10.1111/1755-0998.13826. Epub 2023 Jun 29.
At the genome level, microorganisms are highly adaptable in terms of both allele and gene composition. Such heritable traits emerge in response to different environmental niches and can profoundly influence microbial community dynamics. As a consequence, any individual genome or population contains merely a fraction of the total genetic diversity of any operationally defined "species", whose ecological potential can thus only be fully understood by studying all of its genomes and the genes therein. This concept, known as the pangenome, is valuable for studying microbial ecology and evolution, as it partitions genomes into core regions (present in all the genomes of a species, and responsible for, among other things, housekeeping and species-level niche adaptation) and accessory regions (present only in some genomes, and responsible for intra-species differentiation). Here we present SuperPang, an algorithm that produces pangenome assemblies from a set of input genomes of varying quality, including metagenome-assembled genomes (MAGs). SuperPang runs in linear time, and its results are complete and non-redundant, preserve gene order, and contain both coding and non-coding regions. Our approach provides a modular view of the pangenome, identifying operons and genomic islands and making it possible to track their prevalence in different populations. We illustrate this by analysing intra-species diversity in Polynucleobacter, a bacterial genus that is ubiquitous in freshwater ecosystems and characterized by streamlined genomes and ecological versatility. We show how SuperPang facilitates the simultaneous analysis of allelic and gene content variation under different environmental pressures, allowing us to study the drivers of microbial diversification at unprecedented resolution.
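In its simplest form, the core/accessory partition described above is a set operation over gene presence and absence. The sketch below is a minimal illustration under stated assumptions. It assumes that each genome has already been reduced to a set of gene-family identifiers. This input format is hypothetical: SuperPang itself takes genome sequences as input. The sketch also uses a hypothetical `core_fraction` parameter, where 1.0 reproduces the strict "present in all genomes" definition. It illustrates the concept only and is not SuperPang's algorithm. The gene-family names are toy placeholders, not data from the study.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
    core_fraction: float = 1.0,
) -> Tuple[Set[str], Set[str]]:
    """Split a pangenome into core and accessory gene families.

    genomes: maps a genome name to the set of gene families it contains
             (hypothetical, precomputed input).
    core_fraction: hypothetical threshold; a family is "core" if it occurs
                   in at least this fraction of genomes (1.0 = all genomes).
    """
    if not genomes:
        return set(), set()
    # The pangenome is the union of all gene families across genomes.
    pangenome = set().union(*genomes.values())
    n_genomes = len(genomes)
    # Count in how many genomes each family occurs and apply the threshold.
    core = {
        family
        for family in pangenome
        if sum(family in genes for genes in genomes.values())
        >= core_fraction * n_genomes
    }
    # Everything not in the core is accessory (present only in some genomes).
    accessory = pangenome - core
    return core, accessory


# Toy usage example with placeholder gene families.
genomes = {
    "genome_A": {"fam1", "fam2", "fam3", "fam4"},
    "genome_B": {"fam1", "fam2", "fam3"},
    "genome_C": {"fam1", "fam2", "fam3", "fam5"},
}
core, accessory = partition_pangenome(genomes)
print(sorted(core))       # ['fam1', 'fam2', 'fam3']
print(sorted(accessory))  # ['fam4', 'fam5']
```

With incomplete genomes such as MAGs, a gene can be absent simply because it was not recovered. Lowering `core_fraction` below 1.0 is one illustrative way to tolerate this. The threshold value here is an assumption and is not taken from the paper.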