Ranjan Amit, Shukla Shivansh, Datta Deepanjan, Misra Rajiv
Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103 India.
Netw Model Anal Health Inform Bioinform. 2022;11(1):6. doi: 10.1007/s13721-021-00351-1. Epub 2021 Dec 18.
The transmissible spread of the SARS-CoV-2 coronavirus has caused a significant rise in global mortality. Given the lack of effective treatments, our aim is to generate highly potent active molecules that can bind to the protein structures of SARS-CoV-2. Various machine learning and deep learning approaches have been proposed for molecule generation. However, most of them represent drug molecules and protein structures as 1D sequences. This ignores the fact that molecules are inherently three-dimensional, so many critical properties are lost. In this work, we propose a framework that accounts for both the tertiary and the sequential representations of molecules and proteins, using a Gated Graph Neural Network (GGNN), a Knowledge Graph, and an Early Fusion approach. The molecules generated by the GGNN are screened with the Knowledge Graph, which reduces the search space by discarding non-binding molecules before the remainder are fed into the Early Fusion model. The Early Fusion model then predicts the binding affinity score of each generated molecule. Experimental results show that our framework generates valid and unique molecules with high accuracy while preserving their chemical properties. Screening with the knowledge graph reduced the entire generated dataset of molecules by roughly 96%, while retaining more than 85% of the desirable, good-binding molecules and rejecting more than 99% of the fruitless ones. Additionally, the framework was tested on two SARS-CoV-2 viral proteins: RNA-dependent RNA polymerase (RdRp) and 3C-like protease (3CLpro).
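The three knowledge-graph screening figures (dataset reduction, retention of good binders, rejection of fruitless molecules) can be made concrete with a small calculation over per-molecule labels and a keep/discard mask. This is a minimal sketch under stated assumptions: the abstract does not define the metrics, so it assumes "reduction" is the fraction of all generated molecules discarded, "retention" is the fraction of good binders kept, and "rejection" is the fraction of non-binders discarded. The names `screening_report` and `ScreeningReport` are hypothetical, and the toy data in the usage example is illustrative only, not the paper's results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningReport:
    # Assumed definitions (not taken from the paper):
    reduction: float            # fraction of all generated molecules discarded
    good_retention: float       # fraction of good binders that were kept
    fruitless_rejection: float  # fraction of non-binders that were discarded


def screening_report(is_good: Sequence[bool], kept: Sequence[bool]) -> ScreeningReport:
    """Hypothetical helper: summarize a keep/discard filter against ground-truth labels."""
    if len(is_good) != len(kept):
        raise ValueError("is_good and kept must have the same length")
    total = len(kept)
    n_good = sum(is_good)                 # number of good binders
    n_bad = total - n_good                # number of fruitless molecules
    n_discarded = total - sum(kept)       # molecules removed by the filter
    kept_good = sum(1 for g, k in zip(is_good, kept) if g and k)
    discarded_bad = sum(1 for g, k in zip(is_good, kept) if not g and not k)
    return ScreeningReport(
        reduction=n_discarded / total if total else 0.0,
        good_retention=kept_good / n_good if n_good else 0.0,
        fruitless_rejection=discarded_bad / n_bad if n_bad else 0.0,
    )


if __name__ == "__main__":
    # Toy data: 100 molecules, 5 good binders; the filter keeps 4 good and 1 fruitless.
    labels = [True] * 5 + [False] * 95
    mask = [True] * 4 + [False] + [True] + [False] * 94
    print(screening_report(labels, mask))
```

On this toy input the filter discards 95 of 100 molecules (reduction 0.95), keeps 4 of 5 good binders (retention 0.8), and discards 94 of 95 fruitless molecules. The three ratios pull against each other: a more aggressive filter raises reduction and rejection but risks lowering retention, which is why the abstract reports all three together.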