Rajwani Rahim, Ohlemacher Shannon I, Zhao Gengxiang, Liu Hong-Bing, Bewley Carole A
Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA.
mSystems. 2021 Dec 21;6(6):e0102021. doi: 10.1128/mSystems.01020-21. Epub 2021 Nov 23.
Genome mining is an important tool for discovery of new natural products; however, the number of publicly available genomes for natural product-rich microbes such as actinomycetes, relative to human pathogens with smaller genomes, is small. To obtain contiguous DNA assemblies and identify large (ca. 10 to greater than 100 kb) biosynthetic gene clusters (BGCs) with high GC (>70%) and high repeat content, it is necessary to use long-read sequencing methods when sequencing actinomycete genomes. One of the hurdles to long-read sequencing is the higher cost. In the current study, we assessed Flongle, a recently launched platform by Oxford Nanopore Technologies, as a low-cost DNA sequencing option to obtain contiguous DNA assemblies and analyze BGCs. To make the workflow more cost-effective, we multiplexed up to four samples in a single Flongle sequencing experiment while expecting low sequencing coverage per sample. We hypothesized that contiguous DNA assemblies might enable analysis of BGCs even at low sequencing depth. To assess the value of these assemblies, we collected high-resolution mass spectrometry data and conducted a multi-omics analysis to connect BGCs to secondary metabolites. In total, we assembled genomes for 20 distinct strains across seven sequencing experiments. In each experiment, 50% of the bases were in reads longer than 10 kb, which facilitated the assembly of reads into contigs with an average value of 3.5 Mb. The programs antiSMASH and PRISM predicted 629 and 295 BGCs, respectively. We connected BGCs to metabolites for N,N-dimethyl cyclic-di-tryptophan, two novel lasso peptides, and three known actinomycete-associated siderophores, namely, mirubactin, heterobactin, and salinichelin.
Short-read sequencing of GC-rich genomes such as those from actinomycetes results in a fragmented genome assembly and truncated biosynthetic gene clusters (often 10 to >100 kb long), which hinders our ability to understand the biosynthetic potential of a given strain and predict the molecules that can be produced. The current study demonstrates that contiguous DNA assemblies, suitable for analysis of BGCs, can be obtained through low-coverage, multiplexed sequencing on Flongle, which provides a new low-cost workflow ($30 to $40 per strain) for sequencing actinomycete strain libraries.
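The read-length statistic quoted in the abstract (50% of the bases in reads longer than 10 kb) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `fraction_of_bases_in_long_reads` and `n50` are hypothetical, the input is assumed to be a plain list of read lengths in bases, and the example lengths are made up for illustration.

```python
def fraction_of_bases_in_long_reads(read_lengths, min_length=10_000):
    """Return the fraction of all sequenced bases that sit in reads
    strictly longer than min_length (hypothetical helper)."""
    total = sum(read_lengths)
    if total == 0:
        return 0.0
    long_bases = sum(n for n in read_lengths if n > min_length)
    return long_bases / total


def n50(lengths):
    """Return the length L such that reads (or contigs) of length >= L
    contain at least half of all bases (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0


# Made-up read lengths (in bases) for illustration only.
example_reads = [2_000, 4_000, 12_000, 20_000, 2_000]
long_fraction = fraction_of_bases_in_long_reads(example_reads)
read_n50 = n50(example_reads)
```

Under these assumptions, a run in which at least half of the bases are in reads longer than 10 kb corresponds to `fraction_of_bases_in_long_reads(...) >= 0.5`, or equivalently to a read N50 above 10 kb. The same `n50` helper applies unchanged to a list of contig lengths.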