Grow Christopher, Gao Kaifu, Nguyen Duc Duy, Wei Guo-Wei
Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.
Commun Inf Syst. 2019;19(3):241-277. doi: 10.4310/cis.2019.v19.n3.a2.
Generating a wide variety of novel compounds with desirable pharmacological properties remains a challenging task. In this work, a generative network complex (GNC) is proposed as a new platform for designing novel compounds, predicting their physical and chemical properties, and selecting potential drug candidates that fulfill various druggability criteria, such as binding affinity, solubility, and partition coefficient. The GNC combines four components: a SMILES string generator, consisting of an encoder, a latent space controlled or regulated by drug properties, and a decoder; verification deep neural networks; a target-specific three-dimensional (3D) pose generator; and mathematical deep learning networks. These components respectively generate new compounds, predict their drug properties, construct 3D poses associated with target proteins, and reevaluate druggability. New compounds are generated in the latent space by randomized, controlled, or optimized output. In our demonstration, 2.08 million and 2.8 million novel compounds are generated for the Cathepsin S and BACE targets, respectively. These new compounds differ substantially from the seeds and cover a larger chemical space. For potentially active compounds, 3D poses are generated using a state-of-the-art method. The resulting 3D complexes are further evaluated for druggability by a leading deep learning algorithm based on algebraic topology, differential geometry, and algebraic graph theory. Performed on supercomputers, the whole process took less than one week. Our GNC is therefore an efficient new paradigm for discovering new drug candidates.
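The three latent-space generation modes named in the abstract (randomized, controlled, and optimized output) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the function names are hypothetical, the property predictor is a fixed linear model standing in for a trained network, and decoding latent vectors back to SMILES strings is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8

# Hypothetical stand-in for a trained property predictor on the latent space:
# a fixed linear model f(z) = w.z + b with a unit-length weight vector.
w = rng.normal(size=LATENT_DIM)
w /= np.linalg.norm(w)
b = 0.5


def predict_property(z):
    """Toy property value (e.g. a binding-affinity proxy) for latent vector z."""
    return float(z @ w + b)


def randomized_output(z_seed, sigma=0.1, n=5):
    """Randomized mode: perturb the seed latent vector with Gaussian noise."""
    return z_seed + sigma * rng.normal(size=(n, z_seed.size))


def controlled_output(z_seed, target):
    """Controlled mode: shift the seed along w until the linear predictor hits target."""
    step = (target - predict_property(z_seed)) / (w @ w)
    return z_seed + step * w


def optimized_output(z_seed, target, penalty=0.1, lr=0.1, steps=200):
    """Optimized mode: gradient descent on
    (f(z) - target)^2 + penalty * ||z - z_seed||^2,
    trading the property target against staying close to the seed."""
    z = z_seed.copy()
    for _ in range(steps):
        grad = 2.0 * (predict_property(z) - target) * w
        grad += 2.0 * penalty * (z - z_seed)
        z -= lr * grad
    return z


z_seed = rng.normal(size=LATENT_DIM)   # latent vector of a hypothetical seed compound
target = 2.0                           # hypothetical desired property value

candidates = randomized_output(z_seed)
z_ctrl = controlled_output(z_seed, target)
z_opt = optimized_output(z_seed, target)

print("seed property:      ", predict_property(z_seed))
print("controlled property:", predict_property(z_ctrl))
print("optimized property: ", predict_property(z_opt))
```

In a real pipeline each latent vector would be passed through the decoder to obtain a SMILES string, which would then need validity and novelty checks before any property prediction or pose generation.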