FEMTO-ST Institute, UMR 6174, CNRS-Université Bourgogne Franche-Comté (UBFC), France.
Université Paris-Saclay, 91190, Gif-sur-Yvette, France; Université Paris-Cité, IAME, UMR 1137, INSERM, Paris, France.
Tuberculosis (Edinb). 2023 Dec;143S:102376. doi: 10.1016/j.tube.2023.102376. Epub 2023 Nov 25.
The Mycobacterium tuberculosis complex (MTBC) has a population structure consisting of 9 human and animal lineages. The genomic diversity within these lineages is a pathogenesis factor that affects virulence, transmissibility, host response, and antibiotic resistance. Hence it is important to develop improved information systems for tracking and understanding the spread and evolution of genomes. We present results obtained with TB-Annotator, a new informatics platform for the computational biology of MTBC that uses a convenience sample from public and private sequence read archives (SRAs). Version 1 was a first interactive, graphics-based web tool built on 15,901 representative genomes. Version 2, still interactive, is a more sophisticated database developed using the Snakemake workflow management system (WMS), which allows an unsupervised, global, and scalable analysis of the content of the US National Center for Biotechnology Information (NCBI) Sequence Read Archive database. The platform analyzes nucleotide variants, the presence or absence of genes, known regions of difference, and the insertion sites of mobile genetic elements; it also detects new deletions and allows phylogenetic trees to be built, imported into a graphical interface, and explored interactively alongside the data. The objective of TB-Annotator is threefold: to detect recent epidemiological links, to reconstruct distant phylogeographical histories, and to perform more complex phenotypic/genotypic genome-wide association studies (GWAS). In this paper, we compare the various SNP-based taxonomic labels and hierarchies described in recent reference papers for L1, and present a comparative analysis that identifies aliases and thus provides the basis for a future unifying naming scheme for L1 sublineages.
We present a global phylogenetic tree built with RAxML-NG, as well as one for L2; at the time of writing, we had characterized about 200 sublineages, many of them new. A detailed tree for Modern L2 and a hierarchical scheme that facilitates L2 lineage assignment are also presented.
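As an illustration only, a hierarchical SNP-based assignment of the kind mentioned above could be sketched as a walk down a tree of sublineage-defining markers. This is a minimal sketch and not the TB-Annotator implementation: the `Node` class, the `assign` function, the sublineage names, and the genome positions are all hypothetical placeholders, not markers taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One level of a hypothetical sublineage hierarchy."""
    name: str
    # (genome position, derived allele) defining this node; None for the root.
    marker: Optional[Tuple[int, str]]
    children: List["Node"] = field(default_factory=list)


def assign(root: Node, calls: Dict[int, str]) -> List[str]:
    """Descend the hierarchy while exactly one child's marker SNP is observed.

    `calls` maps genome positions to the allele called in the sample.
    The walk stops at the deepest level that can be assigned unambiguously.
    """
    node = root
    path = [node.name]
    while True:
        matches = [
            child for child in node.children
            if calls.get(child.marker[0]) == child.marker[1]
        ]
        if len(matches) != 1:  # no marker found, or conflicting markers
            return path
        node = matches[0]
        path.append(node.name)


# Hypothetical hierarchy: names and positions are made up for illustration.
tree = Node("L2", None, [
    Node("L2.A", (1000, "T"), [Node("L2.A.1", (2000, "G"))]),
    Node("L2.B", (3000, "C")),
])

print(assign(tree, {1000: "T", 2000: "G"}))
```

Stopping at the deepest unambiguous level, rather than forcing a leaf label, is one possible design choice: a sample carrying no marker, or markers from two sibling branches, is reported at the parent level instead of being misassigned.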