Decomposition of the pangenome matrix reveals a structure in gene distribution in the Escherichia coli species.

Author information

Chauhan Siddharth M, Ardalani Omid, Hyun Jason C, Monk Jonathan M, Phaneuf Patrick V, Palsson Bernhard O

Affiliations

Department of Bioengineering, University of California, San Diego, La Jolla, California, USA.

Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Kongens, Lyngby, Denmark.

Publication information

mSphere. 2025 Jan 28;10(1):e0053224. doi: 10.1128/msphere.00532-24. Epub 2024 Dec 31.

DOI:10.1128/msphere.00532-24

PMID:39745367

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11774025/

Abstract


The thousands of complete genome sequences now available for strains of a single species enable pangenome analytics to advance to a new level of sophistication. We collected 2,377 publicly available complete genomes of Escherichia coli for detailed pangenome analysis. The core and accessory genomes consisted of 2,398 and 5,182 genes, respectively. We developed a machine learning approach to define the accessory genes characterizing the major phylogroups of E. coli plus Shigella: A, B1, B2, C, D, E, F, G, and Shigella. The analysis resulted in a detailed structure of the genetic basis of the phylogroups' differential traits. This pangenome structure was largely consistent with a housekeeping-gene-based MLST distribution, sequence-based Mash distance, and the Clermont quadruplex classification. The rare genome (genes found in <6.8% of all strains) consisted of 163,619 genes, about 79% of which represented variations of 315 underlying transposon elements. This analysis generated a mathematical definition of the genetic basis for a species.
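The split into core, accessory, and rare genomes described above can be pictured as a frequency cut on a gene-by-strain presence/absence matrix. The following is a minimal sketch under stated assumptions, not the authors' method: only the 6.8% rare cutoff comes from the abstract, while the core cutoff `CORE_MIN_FRACTION`, the function name `partition_pangenome`, and the toy data are hypothetical and chosen purely for illustration. The abstract does not say how the cutoffs were chosen.

```python
from typing import Dict, List

RARE_MAX_FRACTION = 0.068  # from the abstract: rare genes occur in <6.8% of strains
CORE_MIN_FRACTION = 0.99   # hypothetical cutoff; the abstract does not state one


def partition_pangenome(
    matrix: Dict[str, List[int]],
    core_min: float = CORE_MIN_FRACTION,
    rare_max: float = RARE_MAX_FRACTION,
) -> Dict[str, List[str]]:
    """Split genes into core, accessory, and rare sets by strain frequency.

    `matrix` maps a gene name to a 0/1 presence vector with one entry per strain.
    """
    groups: Dict[str, List[str]] = {"core": [], "accessory": [], "rare": []}
    for gene, presence in matrix.items():
        # Fraction of strains that carry this gene
        frequency = sum(presence) / len(presence)
        if frequency >= core_min:
            groups["core"].append(gene)
        elif frequency < rare_max:
            groups["rare"].append(gene)
        else:
            groups["accessory"].append(gene)
    return groups


# Toy data with 20 strains (illustrative only, not from the study)
toy = {
    "geneA": [1] * 20,             # present in all strains
    "geneB": [1] * 10 + [0] * 10,  # present in half of the strains
    "geneC": [1] + [0] * 19,       # present in 1 of 20 strains (5%)
}
result = partition_pangenome(toy)
print(result)
```

With the toy data, `geneA` falls in the core set, `geneB` in the accessory set, and `geneC` in the rare set, since 5% is below the 6.8% cutoff.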

IMPORTANCE

The comprehensive analysis of the pangenome of Escherichia coli presented in this study marks a significant advancement in understanding bacterial genetic diversity. By employing machine learning techniques to analyze 2,377 complete E. coli genomes, the study provides a detailed mapping of core, accessory, and rare genes. This approach reveals the genetic basis for differential traits across phylogroups, offering insights into pathogenicity, antibiotic resistance, and evolutionary adaptations. The findings enhance the potential for genome-based diagnostics and pave the way for future studies aimed at achieving a global genetic definition of bacterial phylogeny.

