Beck Armen G, Iyer Sanjay, Fine Jonathan, Chopra Gaurav
Department of Chemistry, Purdue University, 720 Clinic Drive, West Lafayette, IN 47907, USA
Purdue Institute for Drug Discovery, West Lafayette, IN 47907, USA
Digit Discov. 2025 Mar 26;4(5):1352-1371. doi: 10.1039/d4dd00226a. eCollection 2025 May 14.
Optimization of chemical systems and processes has been enhanced and enabled by the development of new algorithms and analytical approaches. While several methods systematically investigate how underlying variables correlate with a given outcome, a substantial number of experiments is often needed to model such relationships accurately. As chemical systems increase in complexity, algorithms are needed to propose experiments that efficiently optimize the underlying objective while sampling parameter space effectively enough to avoid convergence on local minima. We have developed the Paddy software package based on the Paddy field algorithm, a biologically inspired evolutionary optimization algorithm that propagates parameters without direct inference of the underlying objective function. We benchmarked Paddy against several optimization approaches: the Tree of Parzen Estimators through the Hyperopt software library, Bayesian optimization with a Gaussian process through Meta's Ax framework, and two population-based methods from EvoTorch (an evolutionary algorithm with Gaussian mutation, and a genetic algorithm using both Gaussian mutation and single-point crossover), all representing diverse approaches to optimization. Paddy's performance is benchmarked on mathematical and chemical optimization tasks, including global optimization of a two-dimensional bimodal distribution, interpolation of an irregular sinusoidal function, hyperparameter optimization of an artificial neural network tasked with classification of solvent for reaction components, targeted molecule generation by optimizing input vectors for a decoder network, and sampling discrete experimental space for optimal experimental planning. Paddy demonstrates robust versatility by maintaining strong performance across all optimization benchmarks, whereas the performance of the other algorithms varies. Additionally, Paddy avoids early convergence through its ability to bypass local optima in search of global solutions.
We anticipate that the facile, versatile, robust and open-source nature of Paddy will let it serve as a toolkit for chemical problem-solving tasks and automated experimentation, where exploratory sampling is a high priority and innate resistance to early convergence helps identify optimal solutions.
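To make the idea of propagating parameters without modelling the objective more concrete, the following is a minimal Python sketch of a Paddy-field-style search on a toy two-dimensional bimodal surface. It is not the Paddy package's API and not the authors' implementation. The stage names (sowing, selection, seeding, pollination, dispersal) loosely follow how the Paddy field algorithm is usually described, and everything else is an assumption made for illustration: the function names `bimodal` and `paddy_like_search`, the test surface, all parameter defaults, and the choice to carry parent plants into the next generation.

```python
import math
import random


def bimodal(x, y):
    """Hypothetical toy surface: a lower peak near (-2, -2), a higher one near (2, 2)."""
    lower = 0.6 * math.exp(-((x + 2) ** 2 + (y + 2) ** 2))
    higher = 1.0 * math.exp(-((x - 2) ** 2 + (y - 2) ** 2))
    return lower + higher


def paddy_like_search(objective, bounds, n_initial=50, n_keep=10, max_seeds=8,
                      radius=1.0, sigma=0.4, iterations=15, seed=0):
    """Maximize objective(x, y) inside a square box; returns (best_point, best_value).

    Simplified illustration only; all defaults are assumed values.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    # Sowing: scatter the initial plants uniformly over the box.
    population = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n_initial)]
    best_point, best_value = None, -math.inf

    for _ in range(iterations):
        # Selection: score every plant and keep the fittest n_keep.
        plants = sorted(((objective(*p), p) for p in population), reverse=True)[:n_keep]
        if plants[0][0] > best_value:
            best_value, best_point = plants[0]

        # Seeding: seed counts scale with min-max normalized fitness.
        f_min, f_max = plants[-1][0], plants[0][0]
        span = (f_max - f_min) or 1.0

        # Pollination: count the other selected plants within `radius` of each plant.
        points = [p for _, p in plants]
        neighbours = [
            sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) < radius)
            for i, p in enumerate(points)
        ]
        max_neighbours = max(neighbours) or 1

        # Dispersal: every seed is a Gaussian perturbation of its parent.
        next_population = list(points)  # parents survive (a simplification)
        for (fitness, p), n_near in zip(plants, neighbours):
            pollination = math.exp(n_near / max_neighbours - 1)  # between 1/e and 1
            n_seeds = max(1, round(max_seeds * pollination * (fitness - f_min) / span))
            for _ in range(n_seeds):
                child = tuple(min(hi, max(lo, c + rng.gauss(0.0, sigma))) for c in p)
                next_population.append(child)
        population = next_population

    # Score the final generation so its seeds are not ignored.
    final_value, final_point = max((objective(*p), p) for p in population)
    if final_value > best_value:
        best_value, best_point = final_value, final_point
    return best_point, best_value


if __name__ == "__main__":
    point, value = paddy_like_search(bimodal, bounds=(-5.0, 5.0))
    print("best point:", point, "objective:", value)
```

The search never fits a surrogate of the objective. Each generation is built only from fitness ranks and the local density of selected plants. Because every selected plant produces at least one seed in this sketch, lower-ranked regions stay populated, which is one plausible way such a scheme could resist early convergence.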