FEMTO-ST Institute, UMR 6174 CNRS, DISC Computer Department, Univ. Bourgogne Franche-Comté (UBFC), Besançon, France.
IAME, UMR1137 INSERM, Université Paris, Université Paris Nord.
PLoS Comput Biol. 2021 Mar 5;17(3):e1008500. doi: 10.1371/journal.pcbi.1008500. eCollection 2021 Mar.
Mycobacterium tuberculosis complex (MTC) CRISPR locus diversity has long been studied solely by investigating the presence or absence of a known set of spacers. Unveiling the genetic mechanisms of its evolution requires a more exhaustive reconstruction in a large number of representative strains. In this article, we identify and resolve, with a new pipeline, the problem of reconstructing CRISPR loci directly from short read sequences in M. tuberculosis. We first show that the process we set up, which we name "CRISPRbuilder-TB" (https://github.com/cguyeux/CRISPRbuilder-TB), allows efficient reconstruction of simulated or real CRISPRs, even when they include complex evolutionary steps such as insertions of mobile elements. Compared to more generalist tools, the whole process is much more precise and robust, and requires only minimal manual investigation. Second, we show that more than 1/3 of the complete genomes currently available for this complex in public databases contain largely erroneous CRISPR loci. Third, we highlight how both the classical experimental in vitro approach and the basic in silico spoligotyping provided by existing analytic tools miss much of the diversity of this locus in MTC, because they do not capture duplications, spacer and direct repeat variants, or IS6110 insertion locations. This description is extended in a second article that describes MTC-CRISPR diversity and suggests general rules for its evolution. This work opens perspectives for an in-depth exploration of M. tuberculosis CRISPR locus diversity and of the mechanisms involved in its evolution and functionality, as well as for adapting the approach to other bacterial species harboring CRISPR loci.
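To illustrate why a presence/absence readout can hide duplications, the following is a minimal sketch of basic in silico spoligotyping on toy data. It is not the CRISPRbuilder-TB method, and it uses none of that tool's code. The spacer names and sequences, the reads, and the function names (`count_spacer_hits`, `binary_profile`) are all hypothetical. The toy also assumes exactly one read per spacer copy, whereas in real sequencing data hit counts mostly reflect coverage.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_spacer_hits(reads, spacers):
    """Count exact matches of each spacer (either strand) across all reads."""
    counts = {name: 0 for name in spacers}
    for read in reads:
        for name, seq in spacers.items():
            counts[name] += read.count(seq) + read.count(reverse_complement(seq))
    return counts


def binary_profile(counts, min_hits=1):
    """Collapse hit counts to a presence/absence string, one character per spacer."""
    return "".join("1" if counts[name] >= min_hits else "0" for name in counts)


# Hypothetical 10-nt spacers (real MTC spacers are longer and come from a curated set).
spacers = {
    "s1": "ACGTTGCAAG",
    "s2": "GGATCCTTAA",
    "s3": "TTGACCGGTA",
}

# Toy strain A carries s1 and s2 once each.
reads_a = ["CCACGTTGCAAGCC", "AAGGATCCTTAAGG"]
# Toy strain B additionally carries a second copy of s1, read from the opposite strand.
reads_b = reads_a + ["GGCTTGCAACGTGG"]

counts_a = count_spacer_hits(reads_a, spacers)
counts_b = count_spacer_hits(reads_b, spacers)

print(binary_profile(counts_a), counts_a)
print(binary_profile(counts_b), counts_b)
```

Both toy strains collapse to the same binary profile even though their hit counts differ for `s1`. Separating such cases reliably requires reconstructing the order and context of spacers and direct repeats, which is the problem the pipeline described above is designed to address.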