Stritt Christoph, Reitsma Michelle, Marin Ana Maria Garcia, Goig Galo, Dötsch Anna, Borrell Sonia, Beisel Christian, Comas Iñaki, Brites Daniela, Gagneux Sebastien
Swiss Tropical and Public Health Institute, Allschwil, Switzerland.
University of Basel, Basel, Switzerland.
Microb Genom. 2025 May;11(5). doi: 10.1099/mgen.0.001396.
Repeats are the most diverse and dynamic but also the least well-understood component of microbial genomes. For all we know, repeat-associated mutations such as duplications, deletions, inversions and gene conversion might be as common as point mutations, but because of short-read myopia and methodological bias, they have received much less attention. Long-read DNA sequencing opens up the prospect of resolving repeats and systematically investigating the mutations they induce. For this study, we used Pacific Biosciences HiFi reads to assemble the genomes of 16 closely related strains of a bacterial pathogen, with the aim of characterizing the full spectrum of DNA polymorphisms. We found that complete and accurate genomes can be assembled from HiFi reads, with read length being the main limitation in the presence of duplications. By combining a reference-free pangenome graph with extensive repeat annotation, we identified 110 variants, 58 of which could be assigned to repeat-associated mutational mechanisms such as strand slippage and homologous recombination. Whilst recombination events were less frequent than point mutations, they affected large regions and introduced multiple variants at once, as shown by three gene conversion events and a 7.3 kb duplication that encompassed two genes possibly involved in immune subversion. The vast majority of variants were present in single isolates, such that phylogenetic resolution was only marginally increased when estimating a tree from complete genomes. Our study shows that the contribution of repeat-associated mechanisms of mutation can be similar to that of point mutations at the microevolutionary scale of an outbreak. A large reservoir of unstudied genetic variation in this 'monomorphic' bacterial pathogen awaits investigation.