Amar Yehia, Schweidtmann Artur M, Deutsch Paul, Cao Liwei, Lapkin Alexei
Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK. Email:
Aachener Verfahrenstechnik - Process Systems Engineering, RWTH Aachen University, Aachen, Germany.
Chem Sci. 2019 May 30;10(27):6697-6706. doi: 10.1039/c9sc01844a. eCollection 2019 Jul 21.
Rational solvent selection remains a significant challenge in process development. Here we describe a hybrid mechanistic-machine learning approach, geared towards an automated process development workflow. A library of 459 solvents was used, for which 12 conventional molecular descriptors, two reaction-specific descriptors, and additional descriptors based on screening charge density were calculated. Gaussian process surrogate models were trained on experimental data from a Rh(CO)2(acac)/Josiphos catalysed asymmetric hydrogenation of a chiral α,β-unsaturated γ-lactam. With two simultaneous objectives (high conversion and high diastereomeric excess), the multi-objective algorithm, trained on an initial dataset of 25 solvents, identified solvents leading to better reaction outcomes. In addition to being a powerful design of experiments (DoE) methodology, the resulting Gaussian process surrogate model for conversion is, in statistical terms, predictive, with a cross-validation correlation coefficient of 0.84. After promising solvents had been identified, the composition of solvent mixtures and the optimal reaction temperature were found using black-box Bayesian optimisation. We then demonstrated the application of a new genetic programming approach to select an appropriate machine learning model for a specific physical system, which should allow the overall process development workflow to be transferred into future robotic laboratories.
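To illustrate what a cross-validation correlation coefficient for a Gaussian process surrogate means in practice, the following is a minimal sketch, not the authors' implementation. It assumes a zero-mean Gaussian process with a squared-exponential kernel, fixed (untuned) hyperparameters, leave-one-out cross-validation, and purely synthetic data standing in for 25 solvents with three standardised descriptors. The abstract does not state the kernel, the type of cross-validation, or the descriptor values, so all of these, along with the function names `rbf_kernel`, `gp_predict` and `loo_cv_correlation`, are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at the rows of X_test."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = rbf_kernel(X_test, X_test, length_scale).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


def loo_cv_correlation(X, y, **gp_kwargs):
    """Pearson correlation between observed values and leave-one-out predictions."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        offset = y[mask].mean()  # centre the targets for the zero-mean GP
        mean, _ = gp_predict(X[mask], y[mask] - offset, X[i:i + 1], **gp_kwargs)
        preds[i] = mean[0] + offset
    return np.corrcoef(y, preds)[0, 1], preds


# Synthetic stand-in data (assumption): 25 "solvents", 3 standardised descriptors,
# and a made-up smooth response playing the role of conversion.
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=25)

r, preds = loo_cv_correlation(X, y, noise=1e-2, length_scale=1.5)
print(f"Leave-one-out cross-validation correlation: {r:.2f}")
```

In a real workflow the kernel hyperparameters would normally be fitted to the data (for example by maximising the marginal likelihood) rather than fixed by hand, and the posterior variance returned by `gp_predict` is the quantity a Bayesian optimisation step would use to balance exploration against exploitation.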