Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russia.
St. Petersburg Pasteur Institute, St. Petersburg, Russia.
mSphere. 2021 Aug 25;6(4):e0053521. doi: 10.1128/mSphere.00535-21. Epub 2021 Jul 21.
Mycobacterium tuberculosis complex (MTBC) species are classic examples of genetically monomorphic microorganisms because of their low genetic variability. Whole-genome sequencing (WGS) has made it possible to describe both the main species within the complex and the M. tuberculosis lineages and sublineages. This differentiation is based on single nucleotide polymorphisms (SNPs) and on large sequence polymorphisms in the so-called regions of difference (RDs). A number of studies have sought to elucidate RD localizations, their distribution among MTBC species, and their role in the bacterial life cycle. Even so, inconsistencies and ambiguities remain in the localization of RDs in different members of the complex. To address this issue, we conducted a thorough search for all possible deletions in a WGS data collection of 721 samples representing the full diversity of the MTBC. The discovered deletions were compared with a list of all previously described RDs. As with SNP-based analysis, we confirmed the specificity of 79 regions at the species, lineage, or sublineage level, 17 of which are described here for the first time. We also present RDscan (https://github.com/dbespiatykh/RDscan), an open-source workflow that detects deletions from short-read sequencing data and correlates the results with the high-specificity RDs curated in this study. Testing the workflow on a collection of ∼7,000 samples showed that the identified RDs are highly specific. This study provides novel details that can contribute to a better understanding of species differentiation within the MTBC and can help to determine how individual clusters evolve within various MTBC species.
Reductive genome evolution is one of the most important and intriguing strategies by which living organisms adapt to their environment. Mycobacterium offers several well-known examples of either naturally reduced (Mycobacterium leprae) or laboratory-reduced (Mycobacterium bovis BCG) genomes. The phylogeny of the Mycobacterium tuberculosis complex is unambiguously framed by large sequence polymorphisms, which represent unique, unidirectional evolutionary events. In the present study, we curated all known regions of difference and analyzed both Mycobacterium tuberculosis and the animal-adapted MTBC species. For 79 loci, we demonstrated an association with phylogenetic units; these loci can serve as markers for diagnosis or for studying biological effects. Moreover, intersections were found among some loci, which may indicate that these processes are nonrandom and that these regions are involved in the adaptation of bacteria to external conditions.
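The abstract says that RDscan correlates deletions detected from short reads with curated, high-specificity RDs, and that the specificity of these RDs was tested on a large sample collection. It does not describe how either step is implemented. The Python sketch below illustrates one plausible approach and should not be read as RDscan's actual code. It assumes that deletions have already been called as intervals on a common reference. It then matches them to RD intervals using a reciprocal-overlap threshold, where `min_overlap=0.8` is an arbitrary, hypothetical choice. Marker specificity is scored with the standard definition TN / (TN + FP), computed over samples outside the target group. All names, coordinates, and group labels are hypothetical toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A region on a shared reference; 1-based, inclusive coordinates (assumed)."""
    name: str
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap as a fraction of the longer interval (i.e. the minimum of both fractions)."""
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return overlap / max(a.length(), b.length())


def match_deletions(deletions, rds, min_overlap=0.8):
    """Return names of curated RDs supported by at least one detected deletion.

    min_overlap is a hypothetical threshold, not a value taken from RDscan.
    """
    hits = set()
    for rd in rds:
        if any(reciprocal_overlap(d, rd) >= min_overlap for d in deletions):
            hits.add(rd.name)
    return hits


def marker_specificity(rd_name, target_group, samples):
    """Specificity = TN / (TN + FP) among samples NOT in target_group.

    samples: list of (group_label, set_of_matched_rd_names) pairs.
    """
    negatives = [found for group, found in samples if group != target_group]
    if not negatives:
        return float("nan")
    false_positives = sum(1 for found in negatives if rd_name in found)
    return 1 - false_positives / len(negatives)


if __name__ == "__main__":
    # Hypothetical curated RDs and deletions called for one sample.
    rds = [Interval("RD_A", 1000, 2000), Interval("RD_B", 5000, 5600)]
    deletions = [Interval("del1", 1010, 1995), Interval("del2", 5400, 5600)]
    print(match_deletions(deletions, rds))  # → {'RD_A'} (del2 covers too little of RD_B)

    # Hypothetical per-sample RD calls labeled by phylogenetic group.
    samples = [
        ("lineage_X", {"RD_A"}), ("lineage_X", {"RD_A"}),
        ("lineage_Y", set()), ("lineage_Y", set()),
        ("lineage_Y", {"RD_A"}), ("lineage_Z", set()),
    ]
    print(marker_specificity("RD_A", "lineage_X", samples))  # → 0.75
```

Reciprocal overlap is used here so that a short deletion lying inside a long RD, or a long deletion spanning a short RD, does not count as a match. A real workflow might instead tolerate fuzzy breakpoints or apply read-depth evidence.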