The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Søltofts Plads, building 220, 2800 Kgs. Lyngby, Denmark.
Department of Bioengineering, University of California, 417 Powell-Focht Bioengineering Hall, San Diego, La Jolla, CA 92093-0412, USA.
Nucleic Acids Res. 2024 Jul 22;52(13):7487-7503. doi: 10.1093/nar/gkae523.
Filamentous Actinobacteria, recently renamed Actinomycetia, are the most prolific source of microbial bioactive natural products. Studies on biosynthetic gene clusters benefit from or require chromosome-level assemblies. Here, we provide DNA sequences from >1000 isolates: 881 complete genomes and 153 near-complete genomes, representing 28 genera and 389 species, including 244 likely novel species. All genomes are from filamentous isolates of the class Actinomycetia from the NBC culture collection. The largest genus is Streptomyces, with 886 genomes, including 742 complete assemblies. We use these data to show that analysis of complete genomes can bring biological understanding not previously derived from more fragmented sequences or less systematic datasets. We document the central and structured location of core genes and the distal location of specialized metabolite biosynthetic gene clusters and duplicate core genes on the linear Streptomyces chromosome, and we analyze the content and length of the terminal inverted repeats, which are characteristic of Streptomyces. We then analyze the diversity of trans-AT polyketide synthase biosynthetic gene clusters, which encode the machinery for a compound class of high biotechnological interest. These insights have both ecological and biotechnological implications for understanding the importance of high-quality genomic resources and the complex role synteny plays in Actinomycetia biology.