Department of Microbiology, University of Manitoba, Winnipeg, Manitoba, Canada.
Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Manitoba, Canada.
PLoS Comput Biol. 2022 Oct 13;18(10):e1010613. doi: 10.1371/journal.pcbi.1010613. eCollection 2022 Oct.
Screening for novel antibacterial compounds in small molecule libraries has a low success rate. We applied machine learning (ML)-based virtual screening for antibacterial activity and evaluated its predictive power by experimental validation. We first binarized 29,537 compounds according to their growth inhibitory activity (hit rate 0.87%) against the antibiotic-resistant bacterium Burkholderia cenocepacia and described their molecular features with a directed-message passing neural network (D-MPNN). Then, we used the data to train an ML model that achieved a receiver operating characteristic (ROC) score of 0.823 on the test set. Finally, we predicted antibacterial activity in virtual libraries corresponding to 1,614 compounds from the Food and Drug Administration (FDA)-approved list and 224,205 natural products. Hit rates of 26% and 12%, respectively, were obtained when we tested the top-ranked predicted compounds for growth inhibitory activity against B. cenocepacia, which represents at least a 14-fold increase from the previous hit rate. In addition, more than 51% of the predicted antibacterial natural compounds inhibited ESKAPE pathogens, showing that the predictions extend beyond the organism-specific dataset to a broad range of bacteria. Overall, the developed ML approach can be used for compound prioritization before screening, increasing the typical hit rate of drug discovery.
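The fold increase quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' code: it assumes the rounded percentages given in the abstract (0.87%, 26%, 12%), and the names `fold_enrichment`, `BASELINE`, and `RATES` are hypothetical.

```python
def fold_enrichment(hit_rate: float, baseline_rate: float) -> float:
    """Return the ratio of a hit rate to a baseline hit rate (same units)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return hit_rate / baseline_rate


# Rounded percentages taken from the abstract (assumption: the exact
# compound counts behind them are not given here).
BASELINE = 0.87  # hit rate of the original 29,537-compound screen, in percent
RATES = {"FDA-approved": 26.0, "natural products": 12.0}  # in percent

enrichment = {name: fold_enrichment(rate, BASELINE) for name, rate in RATES.items()}

for name, fold in enrichment.items():
    print(f"{name}: {fold:.1f}-fold over baseline")
```

With these rounded inputs, the natural-product library comes out at roughly 13.8-fold and the FDA-approved library at roughly 29.9-fold, which is in line with the reported increase of about 14-fold or more. The exact figure depends on the unrounded hit counts, which the abstract does not state.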