Laboratory of Systems Biology and Bioinformatics (LBB), Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran.
Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
Mol Divers. 2021 May;25(2):827-838. doi: 10.1007/s11030-020-10065-7. Epub 2020 Mar 19.
The advent of computational methods for efficiently predicting the druglikeness of small molecules, and their growing application in medicinal chemistry and the drug industry, has been a significant scientific development, since only a small fraction of the molecules in small-molecule libraries are ever identified as approvable drugs. In this study, a deep belief network was used to construct a druglikeness classification model. Small molecules and approved drugs from the ZINC database were selected for the unsupervised pre-training and supervised training steps. Several binary fingerprints were investigated as input features: MACCS (166 bit), PubChem (881 bit), and Morgan (2048 bit). The results showed that an unsupervised pre-training phase can yield a model with good performance and generalizability. With MACCS features, the accuracy, precision, and recall of the model were 97%, 96%, and 99%, respectively. To further assess generalizability, an external data set was created, with experimental and investigational drugs from DrugBank as drug data and randomly selected molecules from the ZINC database as non-drug data. The results confirmed the good performance and generalizability of the model. In addition, a large proportion of the misclassified non-drug small molecules satisfy bioavailability criteria and could be investigated as drugs in the future. The model therefore shows potential as a drug filter in drug discovery.
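As a reading aid for the reported accuracy, precision, and recall, the following minimal Python sketch shows how these three metrics are conventionally computed for a binary drug/non-drug classifier. The function name `classification_metrics` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the study, and the sketch does not reproduce the authors' model or data.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels.

    Labels are assumed to be 1 for "drug" and 0 for "non-drug".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # drugs predicted as drugs
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-drugs predicted as non-drugs
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-drugs predicted as drugs
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # drugs predicted as non-drugs

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy labels, for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Under these definitions, a recall higher than the precision, as reported for the MACCS features, means the model misses few true drugs and makes somewhat more false-positive calls on non-drugs. This is consistent with the abstract's remark about misclassified non-drug molecules.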