Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, U.K.
Exscientia Ltd, 36 St. Giles', Oxford OX1 3LD, U.K.
J Chem Inf Model. 2020 Apr 27;60(4):1983-1995. doi: 10.1021/acs.jcim.9b01120. Epub 2020 Apr 2.
Rational compound design remains a challenging problem for both computational methods and medicinal chemists. Computational generative methods have begun to show promising results for the design problem. However, they have not yet used the power of three-dimensional (3D) structural information. We have developed a novel graph-based deep generative model that combines state-of-the-art machine learning techniques with structural knowledge. Our method ("DeLinker") takes two fragments or partial structures and designs a molecule incorporating both. The generation process is protein-context-dependent, utilizing the relative distance and orientation between the partial structures. This 3D information is vital to successful compound design, and we demonstrate its impact on the generation process and the limitations of omitting such information. In a large-scale evaluation, DeLinker designed 60% more molecules with high 3D similarity to the original molecule than a database baseline. When considering the more relevant problem of longer linkers with at least five atoms, the outperformance increased to 200%. We demonstrate the effectiveness and applicability of this approach on a diverse range of design problems: fragment linking, scaffold hopping, and proteolysis targeting chimera (PROTAC) design. As far as we are aware, this is the first molecular generative model to incorporate 3D structural information directly in the design process. The code is available at https://github.com/oxpig/DeLinker.
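The abstract says the generation is conditioned on the "relative distance and orientation between the partial structures." The sketch below shows one simple way to reduce that 3D information to two numbers. It assumes the pose is summarized by the distance between the two fragment atoms that will bond to the linker ("anchor" atoms) and the angle between the two exit vectors. Each exit vector runs from an anchor atom toward a placeholder "exit" atom. The function and argument names are hypothetical and for illustration only. This is not the DeLinker implementation from the linked repository.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a: Vec3) -> float:
    return math.sqrt(_dot(a, a))


def distance_and_angle(anchor_a: Vec3, exit_a: Vec3,
                       anchor_b: Vec3, exit_b: Vec3) -> Tuple[float, float]:
    """Summarise the relative 3D pose of two fragments (hypothetical helper).

    anchor_*: coordinates of the fragment atom assumed to bond to the linker.
    exit_*:   coordinates of a placeholder atom defining the exit-vector direction.
    Returns (distance between anchor atoms in input units,
             angle between the two exit vectors in degrees).
    """
    distance = _norm(_sub(anchor_b, anchor_a))
    # Exit vectors point from each anchor atom toward its placeholder atom.
    va = _sub(exit_a, anchor_a)
    vb = _sub(exit_b, anchor_b)
    cos_theta = _dot(va, vb) / (_norm(va) * _norm(vb))
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return distance, math.degrees(math.acos(cos_theta))
```

For two fragments whose exit vectors point directly at each other, the angle is 180 degrees. Perpendicular exit vectors give 90 degrees. A conditional generator can take both numbers as extra inputs alongside the two fragment graphs.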