Predicting Drug Resistance in Mycobacterium tuberculosis: A Machine Learning Approach to Genomic Mutation Analysis.

Author information

Paredes-Gutierrez Guillermo, Perea-Jacobo Ricardo, Acosta-Mesa Héctor-Gabriel, Mezura-Montes Efren, Morales Reyes José Luis, Zenteno-Cuevas Roberto, Guerrero-Chevannier Miguel-Ángel, Muñiz-Salazar Raquel, Flores Dora-Luz

Affiliations

Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico.

Escuela de Ciencias de la Salud, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22890, Mexico.

Publication information

Diagnostics (Basel). 2025 Jan 24;15(3):279. doi: 10.3390/diagnostics15030279.

Abstract

Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a leading cause of death from infectious disease worldwide. Active TB is treated with first- and second-line drugs; however, the emergence of drug resistance poses a significant challenge to global TB control. Recent advances in whole-genome sequencing combined with machine learning have shown promise for predicting drug resistance. This study evaluated the performance of four machine learning models in classifying resistance to ethambutol, isoniazid, and rifampicin in M. tuberculosis isolates.

Four models were trained on a Variant Call Format (VCF) dataset preprocessed by the CRyPTIC consortium: an Extreme Gradient Boosting Classifier (XGBC), a Logistic Gradient Boosting Classifier (LGBC), a Gradient Boosting Classifier (GBC), and an Artificial Neural Network (ANN). Three datasets were used: the original dataset, a dataset reduced by principal component analysis (PCA), and a dataset prioritizing the significant mutations identified by the XGBC model. The models were trained and tested across these datasets, and their performance was compared using sensitivity, specificity, precision, F1-score, and accuracy. All models were applied to the PCA-reduced dataset, and the XGBC model was additionally evaluated on the mutation-prioritized dataset.

The XGBC model trained on the original dataset outperformed the others, achieving sensitivities of 0.97, 0.90, and 0.94, specificities of 0.97, 0.99, and 0.96, and F1-scores of 0.93, 0.94, and 0.92 for ethambutol, isoniazid, and rifampicin, respectively. These results indicate that XGBC classified drug resistance more accurately than the other models evaluated. The study highlights the effectiveness of training the XGBC model on a binary representation of mutations to predict resistance and susceptibility to key TB drugs.
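The abstract credits the XGBC results to a binary representation of mutations. The exact encoding used by the authors is not described here, so the following is only a minimal sketch of one common approach, assuming each isolate's VCF has already been reduced to a set of mutation identifiers. It builds a presence/absence (0/1) matrix with one column per mutation observed in any isolate. The `prioritize` helper is hypothetical and only illustrates the idea behind a mutation-prioritized dataset: keep the top-k columns by an importance score, which in practice would come from a trained model (the scores below are made up).

```python
from typing import Dict, List, Sequence, Set, Tuple


def binary_encode(
    isolates: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a 0/1 presence/absence matrix from per-isolate mutation sets.

    Returns (isolate_ids, columns, matrix) where matrix[i][j] == 1 if
    isolate_ids[i] carries columns[j], else 0.
    """
    isolate_ids = sorted(isolates)
    # One column per distinct mutation seen in any isolate, in a stable (sorted) order.
    columns = sorted(set().union(*isolates.values()))
    matrix = [[1 if m in isolates[iso] else 0 for m in columns] for iso in isolate_ids]
    return isolate_ids, columns, matrix


def prioritize(columns: Sequence[str], importance: Dict[str, float], k: int) -> List[str]:
    """Hypothetical helper: keep the k columns with the highest importance score.

    Columns missing from `importance` are treated as score 0.0; ties keep input order.
    """
    ranked = sorted(columns, key=lambda m: importance.get(m, 0.0), reverse=True)
    return ranked[:k]


# Illustrative mutation labels only; not taken from the CRyPTIC dataset.
isolates = {
    "isolate_A": {"katG_S315T", "rpoB_S450L"},
    "isolate_B": {"embB_M306V"},
    "isolate_C": set(),
}
ids, cols, X = binary_encode(isolates)
# cols -> ['embB_M306V', 'katG_S315T', 'rpoB_S450L']
# X    -> [[0, 1, 1], [1, 0, 0], [0, 0, 0]]

# Made-up scores standing in for a trained model's feature importances.
scores = {"rpoB_S450L": 0.6, "katG_S315T": 0.3, "embB_M306V": 0.1}
top = prioritize(cols, scores, k=2)
# top -> ['rpoB_S450L', 'katG_S315T']
```

Sorting the columns gives every isolate the same feature order, which any downstream classifier needs; an isolate with no detected mutations simply becomes an all-zero row.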
The XGBC model trained on the original dataset demonstrated the highest performance among the evaluated models, suggesting its potential for clinical application in combating drug-resistant tuberculosis. Further research is needed to validate and expand these findings for broader implementation in TB diagnostics.
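The models are compared using sensitivity, specificity, precision, F1-score, and accuracy. As a minimal sketch, assuming resistant isolates are the positive class (label 1) and susceptible isolates the negative class (label 0), all five metrics follow from the four confusion-matrix counts. The function names and toy labels are illustrative, not data or code from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, treating 'resistant' (label 1) as the positive class."""

    tp: int  # resistant isolates predicted resistant
    fp: int  # susceptible isolates predicted resistant
    tn: int  # susceptible isolates predicted susceptible
    fn: int  # resistant isolates predicted susceptible


def count(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally TP/FP/TN/FN from paired 0/1 labels (1 = resistant, 0 = susceptible)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _safe_div(num: float, den: float) -> float:
    # Report 0.0 rather than raising when a denominator is empty (e.g. no positives).
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the five metrics named in the abstract from confusion counts."""
    return {
        "sensitivity": _safe_div(c.tp, c.tp + c.fn),  # recall on resistant isolates
        "specificity": _safe_div(c.tn, c.tn + c.fp),  # recall on susceptible isolates
        "precision": _safe_div(c.tp, c.tp + c.fp),
        # F1 is the harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN)
        "f1": _safe_div(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
    }


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
counts = count(y_true, y_pred)  # tp=3, fp=1, tn=5, fn=1
result = evaluate(counts)
# sensitivity 0.75, specificity ~0.833, precision 0.75, f1 0.75, accuracy 0.8
```

Reporting sensitivity and specificity separately matters here because a model can reach high accuracy on an imbalanced set of isolates while still missing many resistant ones.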

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2768/11817661/89c60b07968d/diagnostics-15-00279-g001.jpg
