School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
Comput Biol Med. 2021 May;132:104317. doi: 10.1016/j.compbiomed.2021.104317. Epub 2021 Mar 6.
In the context of the recently emerged COVID-19 pandemic, we developed a deep learning model that predicts the inhibitory activity of unknown compounds against the 3CLpro of severe acute respiratory syndrome coronavirus (SARS-CoV) during virtual screening. This paper proposes a novel deep learning-based virtual screening method built on a convolutional neural network (CNN) architecture. Chemical molecules are represented by descriptors, which are input into the CNN framework to train a model and predict active compounds. The proposed CNN model was compared with other machine learning methods, including random forest, naive Bayes, decision tree, and support vector machine; on the test set it showed an accuracy of 0.86, a sensitivity (recall) of 0.45, a specificity of 0.96, a precision of 0.73, an F-measure of 0.55, and a ROC of 0.71. The CNN model screened 17 out of 918 phytochemical compounds; 60 out of 423 compounds from the natural product NCI divset IV; 17,831 out of 112,267 from the ZINC natural product database; and 315 out of 1556 FDA-approved drugs as anti-SARS-CoV agents. Further, to prioritize drug-like compounds, Lipinski's rule of five was applied to the predicted anti-SARS-CoV compounds (excluding FDA-approved drugs), resulting in 10, 59, and 14,025 hit molecules, respectively. Of the 10 phytochemical hits, 9 belonged to the flavonoid group. In conclusion, the proposed CNN model can prove useful for developing novel target-specific anti-SARS-CoV compounds.
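The drug-likeness step mentioned in the abstract can be illustrated with a minimal sketch of a Lipinski rule-of-five filter. This is not the authors' code: the `Descriptors` class, the function names, the compound labels, and all descriptor values are hypothetical, and the strictness of the filter (zero violations allowed by default) is an assumption, since the abstract does not say how many violations were tolerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """The four physicochemical descriptors used by Lipinski's rule of five."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def is_drug_like(d: Descriptors, max_violations: int = 0) -> bool:
    """Assumption: a compound passes if it has at most `max_violations`."""
    return lipinski_violations(d) <= max_violations


# Hypothetical compounds already predicted active by a classifier.
# The descriptor values are made up for illustration only.
predicted_hits = {
    "hit_a": Descriptors(302.2, 1.5, 5, 7),
    "hit_b": Descriptors(610.5, -1.3, 10, 16),
    "hit_c": Descriptors(520.0, 3.2, 2, 8),
}

# Keep only the predicted hits that satisfy all four criteria.
drug_like = [name for name, d in predicted_hits.items() if is_drug_like(d)]
print(drug_like)
```

Calling `is_drug_like(d, max_violations=1)` instead gives the more lenient variant in which a single violation is tolerated. In practice the descriptor values would come from a cheminformatics toolkit rather than being typed in by hand.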