Graduate School of System Informatics, Kobe University, Kobe 657-8501, Japan.
J Chem Inf Model. 2024 Oct 28;64(20):7873-7884. doi: 10.1021/acs.jcim.4c01214. Epub 2024 Oct 16.
Ultralarge virtual chemical spaces have emerged as a valuable resource for drug discovery, providing access to billions of make-on-demand compounds with high synthetic success rates. Chemical language models can potentially accelerate the exploration of these vast spaces through direct compound generation. However, existing models are not designed to navigate specific virtual chemical spaces and often overlook synthetic accessibility. To address this gap, we introduce product-of-experts (PoE) chemical language models, a modular and scalable approach to navigating ultralarge virtual chemical spaces. This method allows for controlled compound generation within a desired chemical space by combining a model pretrained on the target space with expert and anti-expert models fine-tuned using external property-specific data sets. We demonstrate that the PoE chemical language model can generate compounds with desirable properties, such as those that favorably dock to dopamine receptor D2 (DRD2) and are predicted to cross the blood-brain barrier (BBB), while ensuring that the majority of generated compounds are present within the target chemical space. Our results highlight the potential of chemical language models for navigating ultralarge virtual chemical spaces, and we anticipate that this study will motivate further research in this direction. The source code and data are freely available at https://github.com/shuyana/poeclm.
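The combination step described above can be illustrated with a minimal sketch. The abstract does not give the combination rule, so this sketch assumes a common product-of-experts form in which the next-token distribution is proportional to p_prior * (p_expert / p_anti_expert) ** alpha. The function names, the "expert" and "anti-expert" labels for the fine-tuned models, the weight `alpha`, and the toy three-token vocabulary are all hypothetical and are not taken from the poeclm repository.

```python
import math


def log_softmax(scores):
    """Turn a list of unnormalized log-scores into log-probabilities."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(v - m) for v in scores))
    return [v - log_z for v in scores]


def poe_next_token_logprobs(prior, expert, anti_expert, alpha=1.0):
    """Combine per-token log-probabilities as a product of experts.

    Assumed form (not confirmed by the abstract):
        p(x) is proportional to p_prior(x) * (p_expert(x) / p_anti(x)) ** alpha
    """
    combined = [
        lp + alpha * (le - la)
        for lp, le, la in zip(prior, expert, anti_expert)
    ]
    # Renormalize so the combined scores form a valid distribution.
    return log_softmax(combined)


# Toy vocabulary and scores (hypothetical numbers, for illustration only).
vocab = ["C", "N", "O"]
prior = log_softmax([2.0, 1.0, 0.0])        # stands in for the target-space model
expert = log_softmax([0.0, 2.0, 0.0])       # favors "N"
anti_expert = log_softmax([0.0, 0.0, 2.0])  # favors "O"

poe = poe_next_token_logprobs(prior, expert, anti_expert, alpha=1.0)
probs = [math.exp(v) for v in poe]
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.4f}")
```

In this toy setting the token favored by the expert gains probability relative to the prior, the token favored by the anti-expert loses probability, and setting `alpha=0.0` recovers the prior unchanged. Real use would apply the same combination at every decoding step to the token distributions of trained language models.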