College of Chemical Engineering, Beijing University of Chemical Technology, Beijing 100029, China.
Molecules. 2022 Jul 27;27(15):4807. doi: 10.3390/molecules27154807.
Virtual screening can save substantial experimental time and cost in early drug discovery. Multi-class drug classification can accelerate virtual screening by quickly predicting the most likely class for a drug. In this study, 1019 drug molecules with known therapeutic effects were collected from multiple databases and the literature, and the molecules were grouped into sets according to therapeutic effect and mechanism of action. Molecular descriptors and molecular fingerprints were derived from SMILES strings to quantify molecular structure. After the data set was divided with the Kennard-Stone method, the results of five classification algorithms and a fusion method were compared across combinations to identify the better-performing combination. Furthermore, for a specific data set, the best-performing model was used to predict the validation data set. On the test set, prediction accuracy reached up to 0.862 and the kappa coefficient up to 0.808. The highest classification accuracy on the validation set was 0.873. A more reliable molecular set was identified, which could be used to predict potential attributes of unknown drug compounds and even to discover new uses for old drugs. We hope this research can serve as a reference for future virtual screening of multiple drug classes at the same time.
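For readers unfamiliar with the Kennard-Stone method named in the abstract, the following is a minimal sketch of the standard procedure, not the authors' implementation. It seeds the training set with the two samples that are farthest apart and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away. The function name `kennard_stone_split`, the use of Euclidean distance, and the toy feature vectors are assumptions made for illustration; the abstract does not state the distance metric or the split ratio.

```python
import math


def kennard_stone_split(points, n_train):
    """Return (train_idx, test_idx) selected by the Kennard-Stone procedure.

    points  : list of equal-length numeric feature vectors (hypothetical input)
    n_train : number of samples to place in the training set
    """
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")

    # Pairwise Euclidean distances (assumed metric) between all samples.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the training set with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # For each unselected sample, track the distance to its nearest selected sample.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_train:
        # Pick the sample that is farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])

    return selected, sorted(remaining)


# Toy one-dimensional "descriptors"; real inputs would be descriptor or fingerprint vectors.
toy = [(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)]
train_idx, test_idx = kennard_stone_split(toy, n_train=3)
```

Because the selection depends only on the feature space and not on the class labels, the training set covers the descriptor space evenly, while the held-out samples lie in its interior.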
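The two metrics reported in the abstract, accuracy and the kappa coefficient, can be computed from true and predicted class labels as sketched below. This assumes the kappa in question is the usual unweighted Cohen's kappa, which the abstract does not specify; the function name `accuracy_and_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(y_true)
    # Observed agreement: fraction of samples classified correctly (the accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Expected chance agreement from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Every label falls in a single class; kappa is undefined in this case.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical class labels for four molecules.
acc, kappa = accuracy_and_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
```

Kappa corrects accuracy for the agreement expected by chance, which makes it more informative than accuracy alone when the classes are of unequal size.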