College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China.
School of Computer Science, Wuhan University, Wuhan, Hubei 430072, P. R. China.
J Chem Inf Model. 2023 Jun 12;63(11):3319-3327. doi: 10.1021/acs.jcim.3c00579. Epub 2023 May 15.
In the past few years, a number of machine learning (ML)-based molecular generative models have been proposed for generating molecules with desirable properties, but they all require large amounts of labeled data on pharmacological and physicochemical properties. However, experimental determination of these labels, especially bioactivity labels, is very expensive. In this study, we analyze the dependence of various multi-property molecule generation models on biological activity label data and propose Frag-G/M, a fragment-based multi-constraint molecular generation framework based on a conditional transformer, recurrent neural networks (RNNs), and reinforcement learning (RL). The experimental results show that, using the same number of labels, Frag-G/M generates several times more desired molecules than the baselines. Moreover, compared with the known active compounds, the molecules generated by Frag-G/M exhibit higher scaffold diversity than those generated by the baselines, making Frag-G/M more promising for real-world drug discovery scenarios.