Harigua-Souiai Emna, Masmoudi Ons, Makni Samer, Oualha Rafeh, Abdelkrim Yosser Z, Hamdi Sara, Souiai Oussama, Guizani Ikram
Laboratory of Molecular Epidemiology and Experimental Pathology - LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, 13, Place Pasteur, 1002, Tunis, Tunisia.
Laboratory of BioInformatics, BioMathematics and BioStatistics - LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, 13, Place Pasteur, 1002, Tunis, Tunisia.
J Cheminform. 2024 Nov 28;16(1):134. doi: 10.1186/s13321-024-00929-7.
Computer-aided drug discovery (CADD) benefits from recent advances in big data analytics and artificial intelligence (AI), which aim to improve drug discovery (DD) outcomes. In this context, reliable datasets are of utmost importance. We herein present CidalsDB, a novel web server for AI-assisted DD against infectious pathogens, namely Leishmania parasites and Coronaviruses. We performed a literature search on molecules with validated anti-pathogen effects. Then, we consolidated these data with bioassays from PubChem. Finally, we constructed a database to store these datasets and make them accessible and ready to use for the scientific community through CidalsDB, a web-based interface. In a second step, we implemented and optimized four machine learning (ML) and three deep learning (DL) algorithms to predict the biological activity of molecules. Random Forests (RF), Multi-Layer Perceptron (MLP) and ChemBERTa were the best classifiers of anti-Leishmania molecules, while Gradient Boosting (GB), Graph-Convolutional Network (GCN) and ChemBERTa achieved the best performances on the Coronaviruses dataset. All six models were optimized and deployed through CidalsDB as anti-pathogen activity prediction models.
Scientific contribution
CidalsDB is an open-access, web-based tool for browsing and accessing ready-to-use datasets of anti-pathogen molecules, alongside the best-performing AI models for biological activity prediction. It offers a democratized, no-code platform for AI-based CADD, which should foster innovation and collaboration within the DD community. CidalsDB is accessible at https://cidalsdb.streamlit.app/ .