WHO Supranational TB Reference Laboratory, Tuberculosis and Mycobacteria Unit, Institut Pasteur de la Guadeloupe, F-97183, Abymes, Guadeloupe, France.
Laboratoire de Mathématiques Informatique et Applications (LAMIA), Université des Antilles, F-97154, Pointe-à-Pitre, Guadeloupe, France.
Database (Oxford). 2020 Dec 15;2020. doi: 10.1093/database/baaa108.
Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for identifying MTBC lineages from classical genotyping data, such as mycobacterial interspersed repetitive units-variable number of tandem DNA repeats (MIRU-VNTR) and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned either by manual expert curation or by an in-house algorithm. In this study, we present two complementary data-driven approaches that allow fast and precise family prediction from spoligotyping patterns. The first is based on data transformation and decision tree classifiers. The second, in contrast, uses a specifically designed evolutionary algorithm to search for a set of simple rules expressed as binary masks. A comparison with the three main approaches in the field highlighted the good performance of our contributions and a significant runtime gain. Finally, we propose the 'SpolLineages' software tool (https://github.com/dcouvin/SpolLineages), which implements these approaches to identify MTBC spoligotype families.
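To make the idea of a binary-mask rule concrete, the following is a minimal sketch of how such a rule might be applied to a spoligotype once it has been found. It assumes a spoligotype is encoded as a 43-character string in which '1' marks a present spacer and '0' an absent one. The helper names (`to_bits`, `make_rule`, `predict`), the family labels, and the rules themselves are hypothetical and chosen only for illustration; they are not taken from SpolLineages or SITVIT2, and the evolutionary search that would discover the rules is not shown.

```python
# Illustrative sketch only: the rules and family labels below are invented
# and do not come from SpolLineages or SITVIT2.
N_SPACERS = 43  # assumed length of a binary spoligotype pattern


def to_bits(pattern: str) -> int:
    """Convert a 43-character string of '0'/'1' into an integer bit field."""
    if len(pattern) != N_SPACERS or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    return int(pattern, 2)


def make_rule(family, absent=(), present=()):
    """Build a (family, mask, expected) rule from 1-based spacer positions.

    The mask selects the spacers the rule cares about; `expected` holds the
    value those spacers must take (1 = present, 0 = absent).
    """
    mask = 0
    expected = 0
    for pos in absent:
        mask |= 1 << (N_SPACERS - pos)
    for pos in present:
        bit = 1 << (N_SPACERS - pos)
        mask |= bit
        expected |= bit
    return family, mask, expected


def predict(pattern, rules, default="Unknown"):
    """Return the family of the first rule whose masked bits match."""
    bits = to_bits(pattern)
    for family, mask, expected in rules:
        if (bits & mask) == expected:
            return family
    return default


# Hypothetical rule set, checked in order.
RULES = [
    make_rule("FAMILY_A", absent=range(1, 35), present=[35]),
    make_rule("FAMILY_B", absent=[33, 34, 35, 36], present=[32, 37]),
]

if __name__ == "__main__":
    print(predict("0" * 34 + "1" * 9, RULES))
    print(predict("1" * 32 + "0000" + "1" * 7, RULES))
    print(predict("1" * 43, RULES))
```

Each rule costs one bitwise AND and one comparison per pattern, which suggests why a small set of mask rules can classify large collections quickly; whether the tool itself stores its rules this way is an assumption of this sketch.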