Combining Cloud-Based Free-Energy Calculations, Synthetically Aware Enumerations, and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization.

Affiliation

Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States.

Publication information

J Chem Inf Model. 2020 Sep 28;60(9):4311-4325. doi: 10.1021/acs.jcim.0c00120. Epub 2020 Jun 19.

DOI:10.1021/acs.jcim.0c00120

Abstract

The hit identification process usually involves profiling millions and, more recently, billions of compounds, via either traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that, by coupling reaction-based enumeration, active learning, and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large-scale enumeration and cloud-based free energy perturbation (FEP) profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large-scale enumeration alone, while simultaneously staying within the bounds of a predefined drug-like property space. We achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted-sum, QSAR-based multiparameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can (1) provide a 6.4-fold enrichment improvement in identifying <10 nM compounds over random selection and a 1.5-fold enrichment in identifying <10 nM compounds over our previous method, (2) rapidly explore relevant chemical space outside the bounds of commercial reagents, (3) use generative ML approaches to "learn" the SAR from large-scale in silico enumerations and generate novel idea molecules for a flexible receptor site that are both potent and within relevant physicochemical space, and (4) produce over 3 000 000 idea molecules and run 1935 FEP simulations, identifying 69 ideas with a predicted IC50 < 10 nM and 358 ideas with a predicted IC50 < 100 nM. The reported data suggest that combining reaction-based and generative machine learning approaches for ideation results in a higher enrichment of potent compounds over previously described approaches and has the potential to rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.
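The abstract names two quantities without defining them: a weighted-sum multiparameter optimization (MPO) score and a fold enrichment. The Python sketch below shows one plausible reading of each, purely for illustration. The function names, the linear desirability ramp, the property names, and all numbers are assumptions made here; the abstract does not give the authors' actual scoring function, weights, or counts.

```python
from typing import Dict


def desirability(value: float, low: float, high: float) -> float:
    """Linearly map a raw value onto [0, 1]: 0 at or below `low`, 1 at or above `high`.

    Assumed form; the abstract does not specify how properties are scaled.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def weighted_sum_mpo(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-property scores, each expected to lie in [0, 1]."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * scores[name] for name in weights) / total


def fold_enrichment(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Hit rate of strategy A divided by the hit rate of baseline B."""
    if n_a <= 0 or n_b <= 0 or hits_b <= 0:
        raise ValueError("sample sizes and baseline hits must be positive")
    return (hits_a * n_b) / (hits_b * n_a)


# Hypothetical molecule: a QSAR-predicted pIC50 plus a generic property score.
scores = {
    "potency": desirability(8.0, low=5.0, high=9.0),  # 0.75
    "property": 0.5,                                   # assumed, already in [0, 1]
}
weights = {"potency": 2.0, "property": 1.0}            # assumed weights
mpo = weighted_sum_mpo(scores, weights)

# Hypothetical counts: 12 hits in 200 picks versus 3 hits in 200 random picks.
enrichment = fold_enrichment(12, 200, 3, 200)
```

Under this reading, a goal-directed generator would be rewarded for molecules with a higher `mpo`, and a figure such as the reported 6.4-fold improvement would correspond to a ratio of hit rates between two selection strategies.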
