Machine learning-based prediction of antimicrobial resistance and identification of AMR-related SNPs in Mycobacterium tuberculosis.

Authors

Xu Yi, Mao Ying, Hua Xiaoting, Jiang Yan, Zou Yi, Wang Zhichao, Liu Zubi, Zhang Hongrui, Lu Lingling, Yu Yunsong

Affiliations

Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310016, China.

Key Laboratory of Digital Technology in Medical Diagnostics of Zhejiang Province, Dian Diagnostics Group Co, Ltd, Hangzhou, 310030, China.

Publication information

BMC Genom Data. 2025 Jul 12;26(1):48. doi: 10.1186/s12863-025-01338-x.

Abstract

BACKGROUND

Mycobacterium tuberculosis (MTB) is a human-specific pathogen that causes tuberculosis (TB). Antimicrobial resistance (AMR) in MTB presents a formidable challenge to global health. Applying machine learning to whole-genome sequencing (WGS) data holds significant potential for uncovering the genomic mechanisms underlying drug resistance in MTB.

METHODS

We used 18 binary matrices, each consisting of genotypes and antimicrobial susceptibility testing phenotypes from a specific MTB-antimicrobial dataset. By constructing training and test datasets on all SNPs, intersected SNPs, and randomly generated SNPs, we developed a machine learning (ML) framework using twelve different algorithms. We then compared the performance of the ML models and used the SHapley Additive exPlanations (SHAP) framework to explain how and why decisions are made within the optimal algorithm. Lastly, we applied the models to predict the resistance phenotype to rifampicin (RIF) and isoniazid (INH) in additional independent MTB isolate datasets from India and Israel.
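To make the input format concrete, the following is a minimal sketch of how a binary genotype matrix and a randomly drawn SNP feature set could be built. It is not the authors' pipeline: the isolate names, the helper `build_binary_matrix`, the toy variant calls, and position 1234567 are all hypothetical, and the abstract does not specify how the intersected or random SNP sets were actually derived.

```python
import random

def build_binary_matrix(isolate_snps, positions):
    """Return one 0/1 row per isolate: 1 if the isolate carries the
    alternative allele at that position, 0 otherwise."""
    return {
        isolate: [1 if pos in snps else 0 for pos in positions]
        for isolate, snps in isolate_snps.items()
    }

# Toy variant calls (hypothetical isolates). Positions 761155 and 2155168
# appear in the abstract; 1234567 is a made-up placeholder.
isolate_snps = {
    "iso1": {761155, 2155168},
    "iso2": {761155},
    "iso3": {2155168, 1234567},
    "iso4": set(),
}

# "All SNPs" feature set: every position observed in any isolate.
all_positions = sorted(set().union(*isolate_snps.values()))
matrix = build_binary_matrix(isolate_snps, all_positions)

# A randomly drawn SNP subset, used here as an assumed control feature set.
rng = random.Random(0)  # fixed seed so the draw is reproducible
random_positions = sorted(rng.sample(all_positions, k=2))
random_matrix = build_binary_matrix(isolate_snps, random_positions)
```

Each row of `matrix` would then be paired with the isolate's susceptibility phenotype (resistant or susceptible) before splitting into training and test sets.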

RESULTS

In our study, the Gradient Boosting Classifier (GBC) model performed best in terms of the percentage correctly identified (97.28%, 96.06%, 94.19%, and 92.81% for the four first-line drugs RIF, INH, pyrazinamide, and ethambutol, respectively). Estimating the contributions of AMR-related SNPs with SHAP values, we found that positions 761,155 (rpoB_p.Ser450) and 2,155,168 (katG_p.Ser315) ranked highest for RIF and INH, respectively; higher feature values (1 for the alternative allele) tend to predict resistance to these two drugs. In addition, the GBC model generalized well when predicting RIF and INH resistance phenotypes in the external independent MTB isolate datasets from India and Israel.
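The idea behind attributing a prediction to individual SNPs can be illustrated with a brute-force Shapley value calculation. This is only a sketch of the underlying definition, not the SHAP library and not the authors' GBC model: `toy_model`, its coefficients, and the two-feature input are invented for illustration, and features absent from a coalition are assumed to take a baseline value of 0 (reference allele).

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Features outside a coalition are set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

def toy_model(features):
    """Hypothetical resistance score from two binary SNP features."""
    major_snp, minor_snp = features
    return 0.05 + 0.85 * major_snp + 0.05 * minor_snp

x = [1, 0]          # isolate carries the first SNP only
baseline = [0, 0]   # reference alleles at both positions
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the whole shift in the score, from `toy_model(baseline)` to `toy_model(x)`, is attributed to the first feature, which mirrors how a high feature value (alternative allele present) pushes a prediction toward resistance. Exhaustive enumeration grows exponentially with the number of features, so it is practical only for tiny examples like this one.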

CONCLUSIONS

This study integrates ML methods into antimicrobial resistance research, develops a framework for predicting resistance phenotypes, and explores AMR-related SNPs in MTB. Quantifying the contribution of important SNPs to model decisions makes the ML algorithmic process more transparent and interpretable, which in turn supports its use in clinical practice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b365/12255030/6273f396f52b/12863_2025_1338_Fig1_HTML.jpg