College of Pharmaceutical Sciences at Zhejiang University, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab068.
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) and has been one of the top 10 causes of death globally. Extensively drug-resistant tuberculosis (XDR-TB), which is resistant to the commonly used first-line drugs, has emerged as a major challenge to TB treatment. Hence, novel drug candidates for TB treatment are needed. In this study, four machine learning (ML) algorithms, namely support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN), were combined with different types of molecular representations to develop classification models that distinguish Mtb inhibitors from noninhibitors. Among the individual models, XGBoost exhibited the best prediction performance. Two consensus strategies were then employed to integrate the predictions from multiple models. The consensus model that stacks the RF, XGBoost and DNN predictions gave the best results, with an area under the receiver operating characteristic curve (AUC) of 0.842 for the 10-fold cross-validated training set and 0.942 for the external test set. In addition, the association between the important descriptors and the bioactivities of molecules was interpreted using the Shapley additive explanations (SHAP) method. Finally, an online webserver called ChemTB (http://cadd.zju.edu.cn/chemtb/) was developed as a freely available computational tool for detecting potential Mtb inhibitors.
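The stacking consensus and the AUC metric mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the three columns of base scores are synthetic stand-ins for RF, XGBoost and DNN probabilities, the logistic-regression meta-learner is an assumed choice, and all function names (`roc_auc`, `fit_meta_learner`, `predict_meta`, `noisy_score`) are hypothetical. In a real workflow the meta-learner would typically be fitted on out-of-fold predictions from cross-validation.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=500):
    """Fit a logistic regression on base-model probabilities (the stacking step)."""
    n_models = len(base_probs[0])
    n = len(labels)
    w = [0.0] * n_models
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_models):
                grad_w[j] += err * x[j]
            grad_b += err
        # Batch gradient-descent update on the mean log-loss gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w, b, base_probs):
    """Consensus probability for each molecule from its base-model scores."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in base_probs]


# Synthetic data (assumption): 1 = inhibitor, 0 = noninhibitor.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(400)]


def noisy_score(y, signal):
    """Hypothetical base-model probability: class signal plus Gaussian noise, clipped to [0, 1]."""
    value = 0.5 + (y - 0.5) * signal + rng.gauss(0.0, 0.25)
    return min(1.0, max(0.0, value))


# Three columns stand in for RF, XGBoost and DNN scores of differing quality.
base = [[noisy_score(y, s) for s in (0.3, 0.4, 0.2)] for y in labels]
train_x, test_x = base[:200], base[200:]
train_y, test_y = labels[:200], labels[200:]

w, b = fit_meta_learner(train_x, train_y)
stacked_auc = roc_auc(test_y, predict_meta(w, b, test_x))
single_aucs = [roc_auc(test_y, [row[j] for row in test_x]) for j in range(3)]
print("single-model AUCs:", single_aucs)
print("stacked AUC:", stacked_auc)
```

Comparing `stacked_auc` with the entries of `single_aucs` on the held-out half mirrors, in miniature, the comparison of a consensus model against individual models reported above; the numbers from this toy data say nothing about the published results.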