Shim Eunjae, Kammeraad Joshua A, Xu Ziping, Tewari Ambuj, Cernak Tim, Zimmerman Paul M
Department of Chemistry, University of Michigan, Ann Arbor, MI, USA.
Department of Statistics, University of Michigan, Ann Arbor, MI, USA.
Chem Sci. 2022 May 11;13(22):6655-6668. doi: 10.1039/d1sc06932b. eCollection 2022 Jun 7.
Transfer and active learning have the potential to accelerate the development of new chemical reactions, using prior data and new experiments to inform models that adapt to the target area of interest. This article shows how specifically tuned machine learning models, based on random forest classifiers, can expand the applicability of Pd-catalyzed cross-coupling reactions to types of nucleophiles unknown to the model. First, model transfer is shown to be effective when reaction mechanisms and substrates are closely related, even when models are trained on relatively small numbers of data points. Then, a model simplification scheme is tested and found to provide comparable predictivity on reactions of new nucleophiles that include unseen reagent combinations. Lastly, for a challenging target where model transfer provides only a modest benefit over random selection, an active transfer learning strategy is introduced to improve model predictions. Simple models, composed of a small number of decision trees with limited depths, are crucial for securing the generalizability, interpretability, and performance of active transfer learning.
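The general shape of an active transfer learning loop with a small, shallow tree ensemble can be illustrated with a minimal sketch. This is not the authors' implementation: the depth-1 trees (stumps), the greedy "pick the highest predicted success" selection rule, the choice to append new trees fitted on the collected target data, the function names (`fit_stump`, `predict_proba`, `active_transfer`), and the synthetic binary features are all assumptions made for illustration. A practical version would more likely use a library random forest and real reaction descriptors.

```python
import random

def fit_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample.
    Returns (feature, threshold, p_left, p_right)."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
    best = None
    for f in range(len(X[0])):
        for t in sorted({X[i][f] for i in idx}):
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            if not left or not right:
                continue
            # Misclassified points if each side predicts its majority class.
            err = (min(sum(left), len(left) - sum(left))
                   + min(sum(right), len(right) - sum(right)))
            if best is None or err < best[0]:
                best = (err, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no valid split: predict the sample mean everywhere
        p = sum(y[i] for i in idx) / n
        return (0, float("inf"), p, p)
    return best[1:]

def predict_proba(forest, x):
    """Average the per-stump success probabilities."""
    votes = [pl if x[f] <= t else pr for f, t, pl, pr in forest]
    return sum(votes) / len(votes)

def active_transfer(X_src, y_src, X_tgt, run_experiment,
                    n_trees=5, rounds=3, batch=2, seed=0):
    """Start from a source-trained ensemble, then alternate between
    selecting target candidates and adding trees fitted on target data."""
    rng = random.Random(seed)
    forest = [fit_stump(X_src, y_src, rng) for _ in range(n_trees)]
    labelled = {}  # target index -> observed outcome (0 or 1)
    for _ in range(rounds):
        pool = [i for i in range(len(X_tgt)) if i not in labelled]
        # Assumed selection rule: greedily take the most promising candidates.
        pool.sort(key=lambda i: predict_proba(forest, X_tgt[i]), reverse=True)
        for i in pool[:batch]:
            labelled[i] = run_experiment(i)
        Xl = [X_tgt[i] for i in labelled]
        yl = [labelled[i] for i in labelled]
        forest += [fit_stump(Xl, yl, rng) for _ in range(n_trees)]
    return forest, labelled

# Synthetic example: three binary "reagent" features (hypothetical).
X_src = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y_src = [x[0] for x in X_src]          # source outcome depends on feature 0
X_tgt = X_src
y_tgt = [x[0] & x[1] for x in X_tgt]   # target outcome needs features 0 and 1
forest, labelled = active_transfer(X_src, y_src, X_tgt, lambda i: y_tgt[i])
print(len(forest), len(labelled))
```

Here `run_experiment` stands in for carrying out a reaction and recording whether it succeeded. Keeping each tree shallow and the ensemble small makes every added tree easy to inspect, which is the property the abstract credits for generalizability and interpretability.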