Harigua-Souiai Emna, Oualha Rafeh, Souiai Oussama, Abdeljaoued-Tej Ines, Guizani Ikram
Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia.
Laboratory of Bioinformatics, BioMathematics and BioStatistics LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia.
Bioinform Biol Insights. 2022 Apr 22;16:11779322221090349. doi: 10.1177/11779322221090349. eCollection 2022.
Drug discovery (DD) research is a complex field with a high attrition rate. Machine learning (ML) approaches combined with chemoinformatics are a valuable input to this field. Here, we implemented multiple ML algorithms that learn from different molecular fingerprints (FPs) of 65 057 molecules identified as active or inactive against Leishmania promastigotes. We sought to build a classifier able to predict whether a given molecule has anti-leishmanial potential. Using the RDKit library, we calculated 5 molecular FPs for each molecule. We then implemented 4 ML algorithms, which we trained and tested for their ability to classify the molecules as active or inactive based on their chemical structure, as encoded by the molecular FPs. The best performers were random forest (RF) and support vector machine (SVM), while atom-pair and topological torsion FPs were the best embedding functions. Both models were further assessed on different stratification levels of the dataset and showed stable performance. Finally, we used them to predict the potential of molecules within the Food and Drug Administration (FDA)-approved drugs collection to exert anti-Leishmania effects. We ranked these drugs according to their anti-leishmanial probability and obtained in total seven anti-Leishmania agents, previously described in the literature, within the top 10 of each model. This validates the robustness of the approach and of the choice of algorithms and FPs, as well as the importance of the dataset size and content. We further subjected these molecules to reverse docking experiments on 3D crystal structures of seven well-studied Leishmania drug targets and could predict the molecular targets of 4 drugs. The results bring novel insights into anti-Leishmania compounds.
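The workflow summarized above (fingerprint calculation with RDKit, training a classifier, ranking candidates by predicted probability of activity) can be illustrated with a minimal sketch. It is not the authors' code: the use of scikit-learn, the 2048-bit fingerprint size, the helper name `atom_pair_fp`, and the tiny SMILES lists with their labels are all assumptions made purely for illustration, and the labels are arbitrary placeholders rather than real activity data.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors
from sklearn.ensemble import RandomForestClassifier


def atom_pair_fp(smiles, n_bits=2048):
    """Hypothetical helper: hashed atom-pair fingerprint as a 0/1 NumPy vector."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    bitvect = rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(mol, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.uint8)
    DataStructs.ConvertToNumpyArray(bitvect, arr)
    return arr


# Placeholder training set: the labels are ARBITRARY (1 = "active", 0 = "inactive")
# and do not reflect any measured anti-Leishmania activity.
train_smiles = ["CCO", "CCCO", "CCCCO", "CC(C)O",
                "c1ccccc1", "Cc1ccccc1", "Oc1ccccc1", "Nc1ccccc1"]
train_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

X = np.vstack([atom_pair_fp(s) for s in train_smiles])

# Random forest was one of the two best performers named in the abstract;
# the hyperparameters here are assumed, not taken from the paper.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, train_labels)

# Hypothetical candidate molecules to be ranked by predicted probability of activity.
candidates = {"candidate_A": "CCc1ccccc1", "candidate_B": "CCCCCO"}
X_cand = np.vstack([atom_pair_fp(s) for s in candidates.values()])

# Column 1 of predict_proba corresponds to class label 1 ("active").
probs = model.predict_proba(X_cand)[:, 1]

# Sort candidates from most to least likely active.
ranking = sorted(zip(candidates.keys(), probs), key=lambda item: item[1], reverse=True)
for name, p in ranking:
    print(f"{name}: {p:.2f}")
```

Swapping `GetHashedAtomPairFingerprintAsBitVect` for `GetHashedTopologicalTorsionFingerprintAsBitVect` would give the topological torsion variant, and replacing the classifier with `sklearn.svm.SVC(probability=True)` would give an SVM counterpart under the same assumptions.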