WHO Supranational TB Reference Laboratory, Tuberculosis and Mycobacteria Unit, Institut Pasteur de la Guadeloupe, Les Abymes, Guadeloupe, France; Transmission, Reservoir and Diversity of Pathogens Unit, Institut Pasteur de la Guadeloupe, Les Abymes, Guadeloupe, France.
Laboratoire de Mathématiques Informatique et Applications (LAMIA), Université des Antilles, Pointe-à-Pitre, Guadeloupe, France.
Infect Genet Evol. 2023 Sep;113:105466. doi: 10.1016/j.meegid.2023.105466. Epub 2023 Jun 16.
Data generated by new sequencing technologies are evolving rapidly, driving the development of dedicated bioinformatic tools, pipelines and software. Several algorithms and tools are now available that allow better identification and description of Mycobacterium tuberculosis complex (MTBC) isolates worldwide. Our approach applies existing methods to DNA sequencing data (FASTA or FASTQ files) and attempts to extract meaningful information that facilitates the identification, understanding and management of MTBC isolates, taking into account both whole genome sequencing (WGS) and classical genotyping data. The aim of this study is to propose an analysis pipeline that could simplify MTBC data analysis by providing different ways to interpret genomic or genotyping information based on existing tools. Furthermore, we propose a "reconciledTB" list that links results obtained directly from WGS data with results obtained from classical genotyping analysis (data inferred with SpoTyping and MIRUReader). The data visualization graphics and trees generated provide additional elements for understanding and inferring associations among the overlapping analyses. Additionally, comparing data entered in an international genotyping database (SITVITEXTEND) with the corresponding data obtained from the pipeline not only provides meaningful information, but also suggests that simpiTB could be suitable for integrating new data into specific TB genotyping databases.
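The idea of a "reconciledTB" list, one entry per isolate linking WGS-derived results with classical genotyping results, can be illustrated with a minimal Python sketch. This is not the pipeline's actual implementation or output format: the `ReconciledRecord` type, the `reconcile` function, the field names and the placeholder values below are all hypothetical, and the inputs are assumed to be already parsed into dictionaries keyed by isolate identifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReconciledRecord:
    """One row of a hypothetical reconciled list (field names are assumptions)."""
    isolate_id: str
    wgs_lineage: Optional[str]   # result derived directly from WGS data
    spoligotype: Optional[str]   # classical genotyping result (spoligotyping)
    miru_profile: Optional[str]  # classical genotyping result (MIRU-VNTR)


def reconcile(
    wgs: Dict[str, str],
    spoligo: Dict[str, str],
    miru: Dict[str, str],
) -> List[ReconciledRecord]:
    """Link per-isolate results from three sources into one list.

    An isolate missing from a source keeps None in that field, so gaps
    between the WGS and genotyping results stay visible.
    """
    isolate_ids = sorted(set(wgs) | set(spoligo) | set(miru))
    return [
        ReconciledRecord(
            isolate_id=i,
            wgs_lineage=wgs.get(i),
            spoligotype=spoligo.get(i),
            miru_profile=miru.get(i),
        )
        for i in isolate_ids
    ]


if __name__ == "__main__":
    # Placeholder values only; not real lineages, spoligotypes or MIRU profiles.
    demo = reconcile(
        wgs={"iso1": "lineage-A", "iso2": "lineage-B"},
        spoligo={"iso1": "pattern-1"},
        miru={"iso2": "profile-9", "iso3": "profile-3"},
    )
    for record in demo:
        print(record)
```

Keeping missing values as `None` rather than dropping incomplete isolates is a design choice of this sketch; it makes it easy to see which isolates have results from only one kind of analysis.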