Tysinger Emma P, Rai Brajesh K, Sinitskiy Anton V
Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States.
J Chem Inf Model. 2023 Mar 27;63(6):1734-1744. doi: 10.1021/acs.jcim.2c01618. Epub 2023 Mar 13.
Meaningful exploration of the chemical space of druglike molecules in drug design is a highly challenging task due to the combinatorial explosion of possible modifications of molecules. In this work, we address this problem with transformer models, a type of machine learning (ML) model originally developed for machine translation. By training transformer models on pairs of similar bioactive molecules from the public ChEMBL data set, we enable them to learn medicinal-chemistry-meaningful, context-dependent transformations of molecules, including transformations absent from the training set. Through retrospective analysis of the performance of transformer models on ChEMBL subsets of ligands binding to the COX2, DRD2, or HERG protein targets, we demonstrate that the models can generate structures identical or highly similar to most active ligands, even though the models had not seen any ligands active against the corresponding protein target during training. Our work demonstrates that human experts working on hit expansion in drug design can easily and quickly employ transformer models, originally developed to translate texts between natural languages, to "translate" known molecules active against a given protein target into novel molecules active against the same target.
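The training data construction described above (pairing similar bioactive molecules so the model can learn molecule-to-molecule "translations") can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the SMILES strings are hypothetical examples, and a character n-gram Tanimoto similarity is used as a crude stand-in for the chemical fingerprint similarity one would use in practice.

```python
from itertools import combinations

def ngrams(smiles: str, n: int = 3) -> set:
    """Character n-gram 'fingerprint' of a SMILES string (toy stand-in for a real chemical fingerprint)."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def make_pairs(smiles_list, threshold: float = 0.5):
    """Build ordered (source, target) training pairs from sufficiently similar molecules.

    Each qualifying pair is emitted in both directions, since either
    molecule can serve as the 'source sentence' for the transformer.
    """
    pairs = []
    for s1, s2 in combinations(smiles_list, 2):
        if tanimoto(ngrams(s1), ngrams(s2)) >= threshold:
            pairs.append((s1, s2))
            pairs.append((s2, s1))
    return pairs

# Hypothetical inputs: aspirin, a close ester analog, and an unrelated small molecule.
molecules = ["CC(=O)Oc1ccccc1C(=O)O", "CC(=O)Oc1ccccc1C(=O)OC", "CCO"]
training_pairs = make_pairs(molecules)
```

Only the two similar molecules form a pair here; a standard sequence-to-sequence transformer could then be trained on such (source, target) SMILES pairs exactly as on sentence pairs in machine translation.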