Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.
Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
Microb Genom. 2021 Sep;7(9). doi: 10.1099/mgen.0.000670.
The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have helped to clarify the evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analysing a species' pan-genome do not take the species' population structure into account, nor do they account for the uneven sampling of different lineages, which is commonplace because clinically relevant representatives are over-sampled. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on their within-species distribution. We demonstrate our approach on a collection of 7500 genomes from one of the most-studied bacterial species, which is also used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species' pan-genome.
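As a rough illustration of what classifying genes by within-species distribution can look like, the sketch below computes each gene's frequency separately within every lineage and then labels the gene by how it is distributed across lineages. It is not the authors' implementation: the function names, the 95% and 15% thresholds, the class labels, and the toy data are all assumptions chosen for the example.

```python
from collections import defaultdict


def lineage_frequencies(presence, lineage_of):
    """Return gene -> {lineage: fraction of that lineage's genomes carrying the gene}.

    presence:   gene name -> set of genome ids carrying the gene
    lineage_of: genome id -> lineage label
    """
    sizes = defaultdict(int)
    for lineage in lineage_of.values():
        sizes[lineage] += 1

    freqs = {}
    for gene, genomes in presence.items():
        counts = defaultdict(int)
        for genome in genomes:
            counts[lineage_of[genome]] += 1
        # Normalise within each lineage, so an over-sampled lineage
        # does not dominate the gene's apparent frequency.
        freqs[gene] = {lin: counts[lin] / n for lin, n in sizes.items()}
    return freqs


def classify_gene(freq_by_lineage, core=0.95, rare=0.15):
    """Label a gene by its per-lineage frequencies (thresholds are assumed)."""
    labels = set()
    n_present = 0
    for freq in freq_by_lineage.values():
        if freq == 0:
            continue  # gene not observed in this lineage
        n_present += 1
        if freq >= core:
            labels.add("core")
        elif freq < rare:
            labels.add("rare")
        else:
            labels.add("intermediate")
    if n_present == 0:
        return "absent"
    spread = "lineage-specific" if n_present == 1 else "multi-lineage"
    kind = labels.pop() if len(labels) == 1 else "varied"
    return f"{spread} {kind}"


# Toy data: lineage A is heavily sampled (20 genomes), lineage B is not (2).
lineage_of = {f"A{i}": "A" for i in range(20)}
lineage_of.update({"B0": "B", "B1": "B"})
presence = {
    "gene_x": {"B0", "B1"},        # in every B genome, absent from A
    "gene_y": set(lineage_of),     # in every genome
    "gene_z": {"A0"},              # in a single A genome
    "gene_w": {"A0", "B0", "B1"},  # core in B, rare in A
}
freqs = lineage_frequencies(presence, lineage_of)
classes = {gene: classify_gene(f) for gene, f in freqs.items()}
```

In this toy set, `gene_x` is carried by only 2 of 22 genomes, so a pooled frequency would make it look rare, yet it is present in every genome of lineage B. Normalising within lineages is what lets the sketch separate that case from `gene_z`, which really is rare within its lineage.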