Chen Yiwang, Zhang Xuecong, Liang Jialei, Jiang Qi, Peierdun Mijiti, Xu Peng, Takiff Howard E, Gao Qian
National Clinical Research Center for Infectious Diseases, Shenzhen Clinical Research Center for Tuberculosis, Shenzhen Third People's Hospital, Shenzhen, Guangdong, China.
Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), School of Basic Medical Sciences, Shanghai Medical College, Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China.
Genome Med. 2025 Mar 26;17(1):31. doi: 10.1186/s13073-025-01458-0.
The WHO recently released a second edition of the mutation catalog for predicting drug resistance in Mycobacterium tuberculosis (MTB). This study evaluated its performance against existing whole-genome sequencing (WGS)-based prediction methods and proposed a novel approach for optimizing it.
We tested the accuracy of five tools (the WHO catalog, TB Profiler, SAM-TB, GenTB, and MD-CNN) for predicting drug susceptibility on a global dataset of 36,385 MTB isolates with high-quality phenotypic drug susceptibility testing (DST) and WGS data. By integrating the genotypic DST predictions of these five tools in an ensemble machine learning framework, we developed an improved computational model for MTB drug susceptibility prediction. We then validated the ensemble model on 860 MTB isolates with phenotypic and WGS data collected in Shenzhen, China (2013-2019) and Valencia, Spain (2014-2016).
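The abstract does not say which learner the ensemble framework uses. As a minimal sketch of the general idea, assuming a plain logistic-regression stacker over the five tools' binary resistant/susceptible calls, the integration step might look like the following. The tool column names, the toy isolates, and the functions `fit_stacker` and `predict_proba` are all hypothetical and not taken from the study.

```python
import math

# Hypothetical column order: one binary call (1 = resistant) per tool.
TOOLS = ["who_catalog", "tb_profiler", "sam_tb", "gentb", "md_cnn"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacker(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by full-batch gradient descent.

    X: list of rows, each holding the five tools' calls for one isolate.
    y: phenotypic DST result per isolate (1 = resistant, 0 = susceptible).
    Assumption: the real ensemble may use a different learner entirely.
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that an isolate is resistant, given the five tool calls."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy training data, invented for illustration only.
X_train = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_stacker(X_train, y_train)
p_all_resistant = predict_proba(weights, bias, [1, 1, 1, 1, 1])
p_all_susceptible = predict_proba(weights, bias, [0, 0, 0, 0, 0])
```

In a stacker of this kind, the learned weights indicate how much each tool's call contributes to the combined prediction for a given drug, which is one way a high-specificity tool and a high-sensitivity tool can complement each other.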
Among the five genotypic DST tools for predicting susceptibility to ten drugs, MD-CNN exhibited the highest overall performance (AUC 92.1%; 95% CI 89.8-94.4%). The WHO catalog demonstrated the highest specificity of 97.3% (95% CI 95.8-98.4%), while TB Profiler had the best sensitivity at 79.5% (95% CI 71.8-86.2%). The ensemble machine learning model (AUC 93.4%; 95% CI 91.4-95.4%) outperformed all five individual tools, with a specificity of 95.4% (95% CI 93.0-97.6%) and a sensitivity of 84.1% (95% CI 78.8-88.8%), principally due to considerable improvements in second-line drug resistance predictions (AUC 91.8%; 95% CI 89.6-94.0%).
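For readers less familiar with the metrics reported above, the sketch below shows how sensitivity, specificity, and AUC can be computed when phenotypic DST is taken as the reference standard. The function names and the toy labels and scores are hypothetical, and the abstract does not state how the confidence intervals were derived, so none are computed here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity, with 1 = resistant and 0 = susceptible."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a resistant isolate outscores a
    susceptible one, counting ties as one half (rank-based definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 4 resistant and 6 susceptible isolates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.1, 0.4, 0.2, 0.6]

sens, spec = sensitivity_specificity(y_true, y_pred)  # 3/4 and 5/6
area = auc(y_true, scores)                            # 22/24
```

Sensitivity and specificity depend on the threshold used to turn a score into a resistant/susceptible call, whereas AUC summarizes ranking quality across all thresholds, which is why a tool can lead on one metric but not the others.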
The second edition of the WHO MTB mutation catalog does not, by itself, perform better than existing tools for predicting MTB drug resistance. An integrative approach combining the WHO catalog with other genotypic DST methods significantly enhances prediction accuracy.