Santana Marcos V S, Silva-Jr Floriano P
LaBECFar-Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, RJ, 21040-900, Brazil.
BMC Chem. 2021 Feb 2;15(1):8. doi: 10.1186/s13065-021-00737-2.
The global pandemic of coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), created a rush to discover drug candidates. Despite these efforts, so far no vaccine or drug has been approved for treatment. Artificial intelligence offers solutions that could accelerate the discovery and optimization of new antivirals, especially in the current scenario, in which few compounds active against SARS-CoV-2 are known. The main protease (Mpro) of SARS-CoV-2 is an attractive target for drug discovery because it is absent in humans and plays an essential role in viral replication. In this work, we developed a deep learning platform for de novo design of putative inhibitors of SARS-CoV-2 Mpro. Our methodology consists of 3 main steps: (1) training and validation of a general chemistry-based generative model; (2) fine-tuning of the generative model for the chemical space of SARS-CoV-2 Mpro inhibitors; and (3) training of a classifier for bioactivity prediction using transfer learning. More than 90% of the structures generated by the fine-tuned chemical model were valid, diverse and novel (not present in the training set). The generated molecules showed good overlap with the Mpro chemical space, displaying similar physicochemical properties and chemical structures. Novel scaffolds were also generated, showing the potential to explore new chemical series. The classification model achieved an area under the precision-recall curve above the baseline, showing that it can be used for prediction. The model also outperformed the freely available model Chemprop on an external test set of fragments screened against SARS-CoV-2 Mpro, showing its potential to identify putative antivirals to tackle the COVID-19 pandemic. Finally, among the top-20 predicted hits, molecular docking identified nine with binding poses and interactions similar to those of experimentally validated inhibitors.
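The comparison against a baseline area under the precision-recall curve can be illustrated with a minimal sketch. It assumes the usual convention that the baseline equals the fraction of active compounds in the evaluation set (the score expected from a random ranking) and uses average precision as one common estimator of the area; the abstract does not state which estimator was used. The labels and scores below are made-up toy values, not data from the study, and the function names are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for active, 0 for inactive; scores: predicted probabilities.
    Tied scores are not treated specially in this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one active compound")
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each active
    return ap / total_pos


def baseline_auprc(labels):
    """Expected AUPRC of a random ranking: the prevalence of actives."""
    return sum(labels) / len(labels)


# Toy data (assumed for illustration only): 2 actives among 10 compounds.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05, 0.6, 0.15]

print(f"AUPRC    = {average_precision(labels, scores):.3f}")
print(f"baseline = {baseline_auprc(labels):.3f}")
```

Because the baseline depends on class balance, an AUPRC that looks modest in absolute terms can still be well above chance when actives are rare, which is typical of bioactivity datasets.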