Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S. A. S. Nagar, Punjab, 160 062, India.
Mol Divers. 2024 Aug;28(4):2317-2329. doi: 10.1007/s11030-024-10952-3. Epub 2024 Aug 12.
Tuberculosis (TB), caused by the bacterium Mycobacterium tuberculosis (M. tb), continues to pose a significant worldwide health threat. The emergence of drug-resistant strains highlights the critical need for novel treatments. The unique cell wall of M. tb provides an extra layer of protection for the bacterium, so only compounds that can penetrate this barrier can reach their targets within the bacterial cell wall. This work presents a reliable machine learning (ML) model for predicting the mycobacterial cell wall permeability of small molecules. Four ML algorithms, Random Forest, Support Vector Machines (SVM), k-Nearest Neighbour (k-NN) and Logistic Regression, were trained on a dataset of 5368 compounds. Features were calculated with the RDKit and Mordred toolkits. To determine the most effective model, several performance metrics were used: accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve. The best-performing model was further refined with hyperparameter tuning and tenfold cross-validation. The SVM model with filtering outperformed the other ML models, achieving 80.26% and 81.13% accuracy on the test and validation datasets, respectively. The study also provides insights into the molecular descriptors that play the most important role in predicting the ability of a molecule to pass the M. tb cell wall, which could guide future compound design. The model is available at https://github.com/PGlab-NIPER/MTB_Permeability.
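The performance metrics named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code: the function names `classification_metrics` and `roc_auc` are hypothetical, the labels and scores are invented toy values rather than data from the study, and coding "permeable" as 1 is an assumption made for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive receives a higher score than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data (NOT from the study): 1 = permeable, 0 = impermeable.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
# Threshold the scores at 0.5 to obtain hard class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

if __name__ == "__main__":
    print(classification_metrics(toy_labels, toy_preds))
    print("AUC:", roc_auc(toy_labels, toy_scores))
```

Accuracy alone can mislead when the two classes are unbalanced. Precision and recall describe the two kinds of error separately, and the area under the curve summarises ranking quality across all decision thresholds, which is presumably why several metrics are reported side by side.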