Lynch Charlotte I, Adlard Dylan, Fowler Philip W
Nuffield Department of Medicine, University of Oxford, Oxford, UK.
These authors contributed equally.
ERJ Open Res. 2025 Jun 30;11(3). doi: 10.1183/23120541.00952-2024. eCollection 2025 May.
Rifampicin remains a key antibiotic in the treatment of tuberculosis. Despite advances in cataloguing resistance-associated variants (RAVs), novel and rare mutations in the relevant gene, rpoB, will be encountered in clinical samples, complicating the task of using genetics to predict whether a sample is resistant to rifampicin. We have trained a series of machine learning models with the aim of complementing genetics-based drug susceptibility testing.
We built a Test+Train dataset comprising 219 susceptible mutations and 46 RAVs. Features derived from the structure of the RNA polymerase or the change in chemistry introduced by the mutation were considered; however, only a few, notably the distance from the rifampicin binding site, were found to be predictive on their own. Owing to the paucity of RAVs we used Monte Carlo cross-validation with 50 repeats to train four different machine learning models.
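As a rough illustration of Monte Carlo cross-validation (repeated random train/test splits rather than fixed folds), the sketch below generates 50 splits for a dataset with the class sizes given above. The stratification by class, the 20% test fraction, the seed and the function name `monte_carlo_splits` are assumptions made for illustration; the abstract does not state how the splits were drawn.

```python
import random


def monte_carlo_splits(labels, n_repeats=50, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated stratified random splits.

    Hypothetical helper: stratification and test_fraction are assumptions,
    not details taken from the abstract.
    """
    rng = random.Random(seed)

    # Group sample indices by class so each split keeps the class ratio.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    for _ in range(n_repeats):
        train, test = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            # Hold out at least one sample of every class.
            n_test = max(1, round(len(shuffled) * test_fraction))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


# Class sizes from the abstract: 219 susceptible (0) and 46 RAVs (1).
labels = [0] * 219 + [1] * 46
splits = list(monte_carlo_splits(labels))
```

Because each repeat draws a fresh random split, a given RAV can appear in the test set of several repeats and in none of others, which is the usual trade-off of this scheme against k-fold cross-validation when the minority class is small.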
All four models behaved similarly, with sensitivities in the range 0.84-0.88 and specificities in the range 0.94-0.97, although we preferred the ensemble of decision tree models as they are easy to inspect and understand. We showed that measuring distances from molecular dynamics simulations did not improve performance.
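For reference, sensitivity and specificity can be computed from true and predicted labels as in the minimal sketch below. It assumes that resistant (RAV) is treated as the positive class, which is the usual convention but is not stated in the abstract; the function name and the toy labels are illustrative only and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Assumption: `positive` marks the resistant class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # fraction of resistant mutations detected
    specificity = tn / (tn + fp)  # fraction of susceptible mutations cleared
    return sensitivity, specificity


# Toy labels for illustration only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```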
It is possible to predict whether a mutation in rpoB confers resistance to rifampicin using a machine learning model trained on a combination of structural, chemical and evolutionary features; however, performance is moderate and training is complicated by the lack of data.