A Machine Learning Approach to Model Interaction Effects: Development and Application to Alcohol Deoxyfluorination.

Author information

Żurański Andrzej M, Gandhi Shivaani S, Doyle Abigail G

Affiliations

Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States.

Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States.

Publication information

J Am Chem Soc. 2023 Apr 12;145(14):7898-7909. doi: 10.1021/jacs.2c13093. Epub 2023 Mar 29.

DOI:10.1021/jacs.2c13093

PMID:36988153

Abstract

The application of machine learning (ML) techniques to model high-throughput experimentation (HTE) datasets has seen a recent rise in popularity. Nevertheless, the ability to model the interplay between reaction components, known as interaction effects, with ML remains an outstanding challenge. Using a simulated HTE dataset, we find that the presence of irrelevant features poses a strong obstacle to learning interaction effects with common ML algorithms. To address this problem, we propose a two-part statistical modeling approach for HTE datasets: classical analysis of variance of the experiment to identify systematic effects that impact reaction yield across the experiment followed by regression of individual effects using chemistry-informed features. To illustrate this methodology, we use our previously published alcohol deoxyfluorination dataset comprising 740 reactions to build a compact, interpretable generalized additive model that accounts for each significant effect observed in the dataset. We achieve a sizeable performance boost compared to our previously published random forest model, reducing mean absolute error from 18 to 13% and root-mean-squared error from 22 to 17% on a newly generated validation set. Finally, we demonstrate that this approach can facilitate the generation of new mechanistic hypotheses, which, when probed experimentally, can lead to a deeper understanding of chemical reactivity.
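To make the first stage of the two-part approach concrete, the sketch below shows the effects decomposition that underlies a classical two-way analysis of variance: a table of yields is split into a grand mean, main effects for each factor, and an interaction term. This is a minimal illustration only, not the authors' implementation. The function names `decompose_effects` and `sums_of_squares`, the choice of factors, and the toy yield values are all hypothetical. The second stage (regressing each significant effect on chemistry-informed features) is not shown.

```python
import numpy as np


def decompose_effects(yields):
    """Split a (factor A x factor B) yield table into additive parts.

    Returns the grand mean, the main effects of A (rows), the main effects
    of B (columns), and the interaction term that remains after the
    additive part is removed.
    """
    yields = np.asarray(yields, dtype=float)
    grand = yields.mean()
    effect_a = yields.mean(axis=1) - grand  # one value per row level
    effect_b = yields.mean(axis=0) - grand  # one value per column level
    additive = grand + effect_a[:, None] + effect_b[None, :]
    interaction = yields - additive
    return grand, effect_a, effect_b, interaction


def sums_of_squares(yields):
    """Sums of squares for A, B, and A x B in an unreplicated two-way table."""
    _, effect_a, effect_b, interaction = decompose_effects(yields)
    n_a, n_b = interaction.shape
    return {
        "A": n_b * float(np.sum(effect_a ** 2)),
        "B": n_a * float(np.sum(effect_b ** 2)),
        "AxB": float(np.sum(interaction ** 2)),
    }


# Hypothetical toy table (assumption, not data from the paper):
# rows = 3 substrates, columns = 2 reagents, entries = yield in %.
toy_yields = np.array([
    [60.0, 40.0],
    [50.0, 30.0],
    [20.0, 40.0],
])

if __name__ == "__main__":
    grand, effect_a, effect_b, interaction = decompose_effects(toy_yields)
    print("grand mean:", grand)
    print("substrate effects:", effect_a)
    print("reagent effects:", effect_b)
    print("interaction:\n", interaction)
    print("sums of squares:", sums_of_squares(toy_yields))
```

In this toy table the third substrate reverses the reagent preference shown by the other two, so a large share of the total variation falls into the interaction term. With a single measurement per cell, the interaction cannot be separated from experimental noise; a significance test would need replicates or an independent error estimate.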
