Mohite Omkar S, Jørgensen Tue S, Booth Thomas J, Charusanti Pep, Phaneuf Patrick V, Weber Tilmann, Palsson Bernhard O
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, 2800, Denmark.
Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA.
Genome Biol. 2025 Jan 14;26(1):9. doi: 10.1186/s13059-024-03471-9.
Streptomyces is a highly diverse genus known for the production of secondary or specialized metabolites with a wide range of applications in the medical and agricultural industries. Several thousand complete or nearly complete Streptomyces genome sequences are now available, affording the opportunity to deeply investigate the biosynthetic potential within these organisms and to advance natural product discovery initiatives.
We perform pangenome analysis on 2371 Streptomyces genomes, including approximately 1200 complete assemblies. Using a data-driven approach based on genome similarities, we classify the Streptomyces genus into 7 primary and 42 secondary Mash-clusters, which form the basis for comprehensive pangenome mining. A refined workflow for grouping biosynthetic gene clusters (BGCs) redefines their diversity across different Mash-clusters. This workflow also reassigns 2729 known BGC families to only 440 families, a reduction attributed to inaccuracies in BGC boundary detection that had split related BGCs into separate families. When the genomic location of BGCs is included in the analysis, a conserved genomic structure, or synteny, among BGCs becomes apparent within species and Mash-clusters. This synteny suggests that vertical inheritance is a major factor in the diversification of BGCs.
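As an illustration of how genomes can be grouped from pairwise similarity, the sketch below joins genomes whose distance falls under a cutoff (single-linkage grouping via union-find). The abstract does not state which clustering algorithm or cutoffs were used, so the function name `cluster_by_distance`, the single-linkage rule, the toy distances, and the threshold values are all assumptions made for illustration, not the authors' method. Running it with a loose and then a strict cutoff mimics a two-level (primary and secondary) grouping.

```python
def cluster_by_distance(distances, threshold):
    """Group genomes by single linkage: two genomes share a cluster when a
    chain of pairwise distances <= threshold connects them.

    distances: dict mapping (genome_a, genome_b) -> distance (hypothetical
               values standing in for Mash distances).
    threshold: hypothetical cutoff, not a value taken from the paper.
    """
    genomes = sorted({g for pair in distances for g in pair})
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())


# Toy distances between four hypothetical genomes.
toy = {
    ("A", "B"): 0.02, ("C", "D"): 0.03,
    ("A", "C"): 0.30, ("A", "D"): 0.31,
    ("B", "C"): 0.28, ("B", "D"): 0.29,
}
primary = cluster_by_distance(toy, threshold=0.5)     # loose cutoff
secondary = cluster_by_distance(toy, threshold=0.05)  # strict cutoff
```

Single linkage is the simplest choice here; a real analysis could equally use a different linkage or a community-detection method on the same distance matrix.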
Our analysis of a dataset of thousands of genomes refines predictions of BGC diversity by using Mash-clusters as the basis for pangenome analysis. The observed conservation in the order of BGCs' genomic locations indicates that BGCs are largely vertically inherited. The presented workflow and in-depth analysis pave the way for large-scale pangenome investigations and enhance our understanding of the biosynthetic potential of the Streptomyces genus.
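One simple way to quantify conservation of BGC order between two genomes is to list the BGC families along each chromosome and measure how much of the shared content appears in the same relative order. The sketch below does this with a longest-common-subsequence score. The abstract does not describe the synteny measure actually used, so `order_conservation`, the scoring rule, and the family labels are assumptions for illustration only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_conservation(order_a, order_b):
    """Fraction of shared BGC families that keep the same relative order.

    order_a, order_b: BGC family labels (hypothetical) listed by genomic
    position in each genome. Returns a value between 0.0 and 1.0.
    """
    shared = set(order_a) & set(order_b)
    a = [f for f in order_a if f in shared]
    b = [f for f in order_b if f in shared]
    if not a or not b:
        return 0.0  # nothing in common, so no order to compare
    return lcs_length(a, b) / min(len(a), len(b))


# Two hypothetical genomes: f2 and f3 are swapped in the second one.
genome_1 = ["f1", "f2", "f3", "f4"]
genome_2 = ["f1", "f3", "f2", "f4", "f9"]
score = order_conservation(genome_1, genome_2)
```

A score near 1.0 across many genome pairs within a Mash-cluster would be consistent with conserved BGC order, while low scores would point to rearrangement or independent acquisition.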