Average nucleotide identity-based Staphylococcus aureus strain grouping allows identification of strain-specific genes in the pangenome.

Affiliations

Microbiology and Molecular Genetics Program, Graduate Division of Biological and Biomedical Sciences, Laney Graduate School, Emory University, Atlanta, Georgia, USA.

Division of Infectious Diseases, Department of Medicine, Emory University, Atlanta, Georgia, USA.

Publication information

mSystems. 2024 Jul 23;9(7):e0014324. doi: 10.1128/msystems.00143-24. Epub 2024 Jun 27.

Abstract

Staphylococcus aureus causes both hospital- and community-acquired infections in humans worldwide. Because of the high incidence of infection, S. aureus is also one of the most sampled and sequenced pathogens today, providing an outstanding resource for understanding variation at the bacterial subspecies level. We processed and downsampled 83,383 public Illumina whole-genome shotgun sequences and 1,263 complete genomes to produce 7,954 representative substrains. Pairwise comparison of average nucleotide identity (ANI) revealed a natural boundary of 99.5% that could be used to define 145 distinct strains within the species. We found that intermediate-frequency genes in the pangenome (present in 10%-95% of genomes) could be divided into those closely linked to strain background ("strain-concentrated") and those highly variable within strains ("strain-diffuse"). The non-core gene classes differed in where they were located on the chromosome. Notably, strain-diffuse genes were associated with prophages, strain-concentrated genes were associated with the νSaβ genomic island, and rare genes (<10% frequency) were concentrated near the origin of replication. Antibiotic resistance genes were enriched in the strain-diffuse class, while virulence genes were distributed among the strain-diffuse, strain-concentrated, core, and rare classes. This study shows how different patterns of gene movement help create strains as distinct subspecies entities and provides insight into the diverse histories of important functions.
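The abstract does not spell out how the 99.5% ANI boundary is turned into strain groups. One simple option is single-linkage grouping, where two substrains fall into the same strain whenever a chain of pairs at or above the threshold connects them. The Python sketch below assumes that approach (it is not necessarily the authors' procedure), and the genome names and ANI values are hypothetical toy data, not results from the study.

```python
from itertools import combinations

# Boundary reported in the abstract; the grouping rule below is an assumption.
ANI_THRESHOLD = 99.5


def group_by_ani(genomes, ani, threshold=ANI_THRESHOLD):
    """Single-linkage grouping: any pair with ANI >= threshold is joined, and
    groups are the connected components (tracked with union-find).
    `ani` maps (genome_a, genome_b) -> ANI in percent; missing pairs are unlinked."""
    parent = {g: g for g in genomes}

    def find(g):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Accept the pair in either order; absent pairs default to 0.0 (no link).
        value = ani.get((a, b), ani.get((b, a), 0.0))
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data; real values would come from an ANI tool.
genomes = ["g1", "g2", "g3", "g4", "g5"]
ani = {
    ("g1", "g2"): 99.8,
    ("g2", "g3"): 99.6,
    ("g1", "g3"): 99.4,  # below threshold, yet g1 and g3 are linked through g2
    ("g4", "g5"): 99.9,
    ("g3", "g4"): 98.7,
}
print(group_by_ani(genomes, ani))
```

Single linkage can chain together genomes whose direct comparison falls below the threshold (g1 and g3 here, at 99.4%). A stricter rule such as complete linkage would avoid that, at the cost of more and smaller groups.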

IMPORTANCE

We analyzed the genomic diversity of Staphylococcus aureus, a globally prevalent bacterial species that causes serious infections in humans. Our goal was to build a genetic picture of the different strains of S. aureus and of which genes may be associated with them. We reprocessed >84,000 genomes and subsampled them to remove redundancy. We found that individual samples with >99.5% average nucleotide identity across their genomes could be grouped into strains. We also showed that some genes present at intermediate frequency in the species are strongly associated with certain strains but completely absent from others, suggesting a role in strain specificity. This work lays the foundation for understanding the histories of individual genes in S. aureus and also outlines strategies for processing large bacterial genomic data sets.
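The frequency bands named in the abstract (rare below 10%, intermediate from 10% to 95%, core above that) can be written as a small classifier. The abstract does not give the criterion used to separate strain-concentrated from strain-diffuse genes, so `is_all_or_none` and its `tolerance` parameter below are hypothetical: they simply flag a gene whose presence is nearly all-or-nothing within every strain, the pattern described above. The handling of the exact 10% and 95% boundaries and the toy presence/absence data are also assumptions.

```python
# Bands from the abstract: rare < 10%, intermediate 10%-95%.
# Treating exactly 10% and 95% as intermediate is an assumption.
RARE_MAX = 0.10
CORE_MIN = 0.95


def frequency_class(freq):
    """Assign a gene to a pangenome frequency class from its prevalence (0-1)."""
    if freq < RARE_MAX:
        return "rare"
    if freq <= CORE_MIN:
        return "intermediate"
    return "core"


def is_all_or_none(presence_by_strain, tolerance=0.05):
    """Hypothetical strain-linkage test: within every strain the gene must be
    (almost) always present or (almost) always absent.
    presence_by_strain maps strain -> list of 0/1 presence calls, one per genome."""
    for calls in presence_by_strain.values():
        fraction = sum(calls) / len(calls)
        if tolerance < fraction < 1 - tolerance:
            return False  # gene is mixed within this strain
    return True


# Toy presence/absence calls for two genes across three hypothetical strains.
gene_x = {"strain_A": [1, 1, 1, 1], "strain_B": [0, 0, 0], "strain_C": [1, 1]}
gene_y = {"strain_A": [1, 0, 1, 0], "strain_B": [0, 1, 0], "strain_C": [1, 0]}

for name, calls in [("gene_x", gene_x), ("gene_y", gene_y)]:
    # Overall frequency is computed across all genomes pooled together.
    flat = [c for strain_calls in calls.values() for c in strain_calls]
    freq = sum(flat) / len(flat)
    print(name, frequency_class(freq), round(freq, 2), is_all_or_none(calls))
```

With these toy data both genes fall in the intermediate band, but `gene_x` tracks strain membership exactly while `gene_y` is mixed within strains, mirroring the contrast the study draws between strain-concentrated and strain-diffuse genes.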

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ceb/11265343/9a4529de3977/msystems.00143-24.f001.jpg
