Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA.
Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
BMC Genomics. 2022 Jan 4;23(1):7. doi: 10.1186/s12864-021-08223-8.
With the exponential growth of publicly available genome sequences, pangenome analyses have provided increasingly complete pictures of genetic diversity for many microbial species. However, relatively few studies have scaled beyond single pangenomes to compare global genetic diversity both within and across different species. We present here several methods for "comparative pangenomics" that can be used to relate multi-pangenome-scale genetic diversity to gene function across multiple species at four resolutions: pangenome shape, genes, sequence variants, and positions within variants.
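Pangenome shape is commonly summarized by how quickly the pangenome grows as genomes are added. The abstract does not state the exact model. The sketch below assumes a Heaps'-law-style power law, P(N) = κN^γ, fitted by least squares in log-log space to the mean pangenome size over random genome orderings. A γ near 0 is read as a closed pangenome and a larger γ as a more open one. The function names `pangenome_curve` and `fit_heaps`, the set-of-gene-IDs input format, and the permutation count are hypothetical illustrations. This is not necessarily the procedure used in the paper.

```python
import math
import random
from typing import List, Set, Tuple


def pangenome_curve(genomes: List[Set[str]], n_perm: int = 20, seed: int = 0) -> List[float]:
    """Mean pangenome size after adding 1..N genomes, averaged over random orders.

    Each genome is a set of gene-cluster IDs (hypothetical input format).
    """
    rng = random.Random(seed)
    n = len(genomes)
    totals = [0.0] * n
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, idx in enumerate(order):
            seen |= genomes[idx]          # union of all genes seen so far
            totals[i] += len(seen)
    return [t / n_perm for t in totals]


def fit_heaps(sizes: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log P = log(kappa) + gamma * log N; returns (kappa, gamma)."""
    xs = [math.log(i + 1) for i in range(len(sizes))]   # log N, N = 1..len(sizes)
    ys = [math.log(s) for s in sizes]                    # log pangenome size
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(my - gamma * mx)
    return kappa, gamma


if __name__ == "__main__":
    # Toy data: a shared core gene plus one unique gene per genome (open-like),
    # versus identical gene content in every genome (closed-like).
    open_like = [{"core", f"g{i}"} for i in range(10)]
    closed_like = [{"a", "b", "c"} for _ in range(10)]
    print("open gamma:", fit_heaps(pangenome_curve(open_like))[1])
    print("closed gamma:", fit_heaps(pangenome_curve(closed_like))[1])
```

Averaging over several random orderings removes the dependence of the curve on the order in which genomes happen to be listed. Fitting in log-log space keeps the sketch dependency-free. A nonlinear fit (for example, with SciPy) is a common alternative.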
Applying these methods to 12,676 genomes across 12 microbial pathogenic species, we observed several resolution-specific patterns of genetic diversity shared across species. First, pangenome openness is associated with species' phylogenetic placement. Second, relationships between gene function and frequency are conserved across species, with core genomes enriched for metabolic and ribosomal genes and accessory genomes enriched for trafficking, secretion, and defense-associated genes. Third, the core-genome genes with the highest sequence diversity are functionally diverse. Finally, certain protein domains are consistently mutation-enriched across multiple species, especially among aminoacyl-tRNA synthetases, where the extent of a domain's mutation enrichment is strongly function-dependent.
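Claims such as "core genomes enriched for metabolic genes" can be framed as an over-representation test. The question is whether a functional category is more frequent among, say, core genes than expected by chance across the whole pangenome. The sketch below assumes a one-sided hypergeometric test. It also assumes that genes have already been assigned both to a frequency class (core vs. accessory) and to a functional category (e.g., a COG-style label). The function name and toy counts are hypothetical. The paper's exact statistical test and multiple-testing correction may differ.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total genes in the pangenome
    K: genes annotated with the functional category of interest
    n: genes in the frequency class being tested (e.g. the core genome)
    k: genes that are both in that class and in the functional category
    """
    if not (0 <= k <= min(n, K) and 0 <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the upper tail exactly with integer binomial coefficients.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical toy counts: 1,000 genes, 200 metabolic, 300 core genes,
    # 90 of which are metabolic (about 60 would be expected by chance).
    p = hypergeom_enrichment_p(k=90, n=300, K=200, N=1000)
    print(f"p = {p:.2e}")
```

Exact integer arithmetic avoids underflow in the tail for moderate counts. In practice, one such test per functional category and species would be followed by a multiple-testing correction.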
These results illustrate the value of each resolution for uncovering distinct aspects of the relationship between genetic and functional diversity across multiple species. With the continued growth in the number of sequenced genomes, these methods will reveal additional universal patterns of genetic diversity at the pangenome scale.