School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China.
School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, 510006, China.
Eur J Med Chem. 2024 Nov 5;277:116776. doi: 10.1016/j.ejmech.2024.116776. Epub 2024 Aug 16.
Malaria remains a significant global health challenge due to the growing drug resistance of Plasmodium parasites and the failure to block transmission within the human host. While machine learning (ML) and deep learning (DL) methods have shown promise in accelerating antimalarial drug discovery, the performance of DL models based on molecular graphs and other co-representation approaches warrants further exploration. Current research has overlooked mutant strains of the malaria parasite with varying degrees of sensitivity or resistance, and has not covered the prediction of inhibitory activities across the three major life cycle stages (liver, asexual blood, and gametocyte) within the human host, which is crucial for both treatment and transmission blocking. In this study, we manually curated a benchmark antimalarial activity dataset comprising 407,404 unique compounds and 410,654 bioactivity data points across ten Plasmodium phenotypes and three stages. Performance was systematically compared among two fingerprint-based ML models (RF::Morgan and XGBoost::Morgan), four graph-based DL models (GCN, GAT, MPNN, and Attentive FP), and three co-representation DL models (FP-GNN, HiGNN, and FG-BERT). The comparison revealed that: 1) the FP-GNN model achieved the best predictive performance, outperforming the other methods in distinguishing active from inactive compounds across balanced, more positive, and more negative datasets, with an overall AUROC of 0.900; 2) fingerprint-based ML models outperformed graph-based DL models on large datasets (>1000 compounds), but the three co-representation DL models were able to incorporate domain-specific chemical knowledge to bridge this gap, achieving better predictive performance. These findings provide valuable guidance for selecting appropriate ML and DL methods for antimalarial activity prediction tasks.
The interpretability analysis of the FP-GNN model revealed its ability to accurately capture the key structural features responsible for the liver- and blood-stage activities of the known antimalarial drug atovaquone. Finally, we developed a web server, MalariaFlow, that incorporates these high-quality models for antimalarial activity prediction, virtual screening, and similarity search. MalariaFlow successfully predicted novel triple-stage antimalarial hits that were validated through experimental testing, demonstrating its effectiveness and value in discovering potential multistage antimalarial drug candidates.
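For readers unfamiliar with the AUROC metric reported above, the following is a minimal sketch of how it can be computed for a binary active/inactive classifier. It uses the standard rank-sum (Mann-Whitney U) formulation and is not taken from the study's code; the function name `auroc` and the toy labels and scores are assumptions made purely for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for active, 0 for inactive (hypothetical toy encoding).
    scores: predicted probability or score that a compound is active.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one active and one inactive compound")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy example: three actives and three inactives with made-up scores.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 3))  # 8 of 9 active/inactive pairs ranked correctly
```

The value can be read as the probability that a randomly chosen active compound is scored higher than a randomly chosen inactive one, which is why it remains informative on the imbalanced (more positive or more negative) datasets mentioned above.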