SoftServe, Inc, Lviv, Ukraine.
Institute for Condensed Matter Physics, Lviv, Ukraine.
J Comput Chem. 2021 Apr 30;42(11):746-760. doi: 10.1002/jcc.26494. Epub 2021 Feb 14.
Efficient design and screening of novel molecules is a major challenge in drug and material design. This paper focuses on a multi-stage pipeline in which several deep neural network models are combined to map discrete molecular representations into a continuous vector space, from which new molecular structures with desired properties are later generated. Here, an attention-based sequence-to-sequence model is added to "spellcheck" and correct generated structures, while oversampling in the continuous space allows generating candidate structures with a desired distribution of properties and molecular descriptors, even for small reference datasets. We further use computer simulation to validate the desired properties in a numerical experiment. With a focus on drug design, such a pipeline allows generating novel structures with control over the Synthetic Accessibility Score and a series of metrics that assess drug-likeness. Our code is available at https://github.com/SoftServeInc/novel-molecule-generation.
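The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling scheme the authors use, so the code below assumes a SMOTE-style interpolation between nearest neighbours in the latent space. The function name `oversample_latent`, its parameters, and the random stand-in latent vectors are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def oversample_latent(latents, n_new, k=5, seed=0):
    """Create n_new synthetic latent vectors by interpolating between a
    randomly chosen reference vector and one of its k nearest neighbours.
    This is an assumed SMOTE-style scheme, not the authors' confirmed method.
    Requires at least two reference vectors."""
    rng = np.random.default_rng(seed)
    latents = np.asarray(latents, dtype=float)
    n = len(latents)
    k = min(k, n - 1)  # cannot have more neighbours than other points

    # Pairwise squared Euclidean distances; exclude self-matches.
    diff = latents[:, None, :] - latents[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base point and one of its neighbours for every new sample.
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position along the connecting segment.
    t = rng.random((n_new, 1))
    return latents[base] + t * (latents[partner] - latents[base])

# Hypothetical usage: 20 reference molecules encoded as 8-dimensional vectors.
reference = np.random.default_rng(1).normal(size=(20, 8))
candidates = oversample_latent(reference, n_new=50, k=3)
```

In a pipeline like the one described, each synthetic vector would then be passed to a decoder to obtain a candidate structure. Restricting interpolation to near neighbours keeps new points close to the region populated by the reference set, which is one plausible way to preserve its property distribution.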