Department of Pharmacy & Pharmaceutical Sciences, University of California, Irvine, California 92697, United States.
Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse, B-2340, Belgium.
J Chem Inf Model. 2023 Mar 27;63(6):1776-1793. doi: 10.1021/acs.jcim.2c01579. Epub 2023 Mar 6.
Computational methods such as alchemical simulations, which estimate ligand affinities, accelerate drug discovery. In particular, relative binding free energy (RBFE) simulations are beneficial for lead optimization. To use RBFE simulations to compare prospective ligands, researchers first plan the simulation experiment using graphs in which nodes represent ligands and edges represent alchemical transformations between ligands. Recent work demonstrated that optimizing the statistical architecture of these perturbation graphs improves the accuracy of the predicted changes in the free energy of ligand binding. Therefore, to improve the success rate of computational drug discovery, we present the open-source software package High Information Mapper (HiMap), a new take on its predecessor, Lead Optimization Mapper (LOMAP). HiMap removes heuristic decisions from design selection and instead finds statistically optimal graphs over ligands clustered with machine learning. Beyond optimal design generation, we present theoretical insights for designing alchemical perturbation maps. Among these results, for $n$ nodes, the precision of perturbation maps is stable at $n \cdot \ln(n)$ edges. This result indicates that even an "optimal" graph can produce unexpectedly high errors if a plan includes too few alchemical transformations for the given number of ligands. Moreover, as a study compares more ligands, the performance of even optimal graphs deteriorates if the edge count scales only linearly with the number of ligands. In this sense, ensuring an A- or D-optimal topology is not enough to guarantee robustly low errors. We additionally find that optimal designs converge more rapidly than radial and LOMAP designs. Finally, we derive bounds on how clustering reduces cost for designs with a constant expected relative error per cluster, independent of the size of the design. These results inform how best to design perturbation maps for computational drug discovery and have broader implications for experimental design.
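The abstract does not spell out the statistics behind "A- or D-optimal" perturbation graphs. The following is a minimal sketch, not HiMap's implementation or API. It assumes that each edge (i, j) gives an independent estimate of the difference dG[j] - dG[i] with a known variance, and that ligand 0 serves as the fixed reference.

Under these assumptions, the Fisher information of the design is the weighted graph Laplacian with the reference row and column removed. An A-optimal design minimizes the trace of its inverse, which is the summed variance of the estimates. A D-optimal design maximizes its determinant. The helper names `reduced_laplacian`, `estimate_dg`, `a_criterion`, `d_criterion`, and `edge_budget` are hypothetical. `edge_budget` only evaluates the $n \cdot \ln(n)$ edge count quoted above.

```python
import math

import numpy as np


def reduced_laplacian(n, edges, variances=None):
    """Fisher information of a perturbation graph, with ligand 0 as reference.

    Assumes each edge (i, j) yields an independent estimate of
    dG[j] - dG[i] with the given variance (default 1.0).
    """
    if variances is None:
        variances = [1.0] * len(edges)
    lap = np.zeros((n, n))
    for (i, j), var in zip(edges, variances):
        w = 1.0 / var  # inverse-variance weight of this transformation
        lap[i, i] += w
        lap[j, j] += w
        lap[i, j] -= w
        lap[j, i] -= w
    # Dropping the reference row/column makes the matrix invertible
    # whenever the graph is connected.
    return lap[1:, 1:]


def estimate_dg(n, edges, ddg, variances=None):
    """Weighted least-squares free energies of all ligands relative to ligand 0."""
    if variances is None:
        variances = [1.0] * len(edges)
    b = np.zeros(n)
    for (i, j), y, var in zip(edges, ddg, variances):
        w = 1.0 / var
        b[j] += w * y  # normal equations of sum w * (x_j - x_i - y)^2
        b[i] -= w * y
    x = np.linalg.solve(reduced_laplacian(n, edges, variances), b[1:])
    return np.concatenate(([0.0], x))


def a_criterion(n, edges):
    """Trace of the covariance matrix; smaller is better (A-optimality).

    Raises numpy.linalg.LinAlgError if the graph is disconnected.
    """
    return float(np.trace(np.linalg.inv(reduced_laplacian(n, edges))))


def d_criterion(n, edges):
    """Log-determinant of the information matrix; larger is better (D-optimality)."""
    sign, logdet = np.linalg.slogdet(reduced_laplacian(n, edges))
    return float(logdet) if sign > 0 else -math.inf  # -inf if disconnected


def edge_budget(n):
    """Hypothetical helper: about n*ln(n) edges, capped at the complete graph."""
    return min(round(n * math.log(n)), n * (n - 1) // 2)


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]          # radial design around ligand 0
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # closed cycle over four ligands
    for name, design in (("star", star), ("cycle", cycle)):
        print(name, a_criterion(4, design), d_criterion(4, design))
    print("edge budget for 10 ligands:", edge_budget(10))
```

For four ligands, the radial (star) design gives an A-criterion of 3 and a log-determinant of 0. The four-edge cycle gives 2.5 and ln 4, so the cycle is better on both criteria but uses one extra edge. For unit weights, the determinant equals the number of spanning trees (matrix-tree theorem). This is why every tree design scores 0 on the log-determinant here. `edge_budget(10)` returns 23 edges, compared with the 9 edges of a spanning tree.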