A Machine Learning Approach to Model Interaction Effects: Development and Application to Alcohol Deoxyfluorination.

Author information

Żurański Andrzej M, Gandhi Shivaani S, Doyle Abigail G

Affiliations

Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States.

Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States.

Publication information

J Am Chem Soc. 2023 Apr 12;145(14):7898-7909. doi: 10.1021/jacs.2c13093. Epub 2023 Mar 29.

Abstract

The application of machine learning (ML) techniques to model high-throughput experimentation (HTE) datasets has seen a recent rise in popularity. Nevertheless, the ability to model the interplay between reaction components, known as interaction effects, with ML remains an outstanding challenge. Using a simulated HTE dataset, we find that the presence of irrelevant features poses a strong obstacle to learning interaction effects with common ML algorithms. To address this problem, we propose a two-part statistical modeling approach for HTE datasets: classical analysis of variance of the experiment to identify systematic effects that impact reaction yield across the experiment followed by regression of individual effects using chemistry-informed features. To illustrate this methodology, we use our previously published alcohol deoxyfluorination dataset comprising 740 reactions to build a compact, interpretable generalized additive model that accounts for each significant effect observed in the dataset. We achieve a sizeable performance boost compared to our previously published random forest model, reducing mean absolute error from 18 to 13% and root-mean-squared error from 22 to 17% on a newly generated validation set. Finally, we demonstrate that this approach can facilitate the generation of new mechanistic hypotheses, which, when probed experimentally, can lead to a deeper understanding of chemical reactivity.
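The first step of the two-part approach described above, a classical analysis of variance that flags systematic effects (including interactions) before any feature-based regression, can be sketched for a small balanced factorial design. This is a minimal illustration under stated assumptions, not the authors' code: the two factors ("substrate" and "reagent"), their levels, the duplicate runs per cell, and the simulated yields with one planted substrate/reagent interaction are all hypothetical. It uses only the standard library, so it reports F ratios rather than p-values.

```python
"""Sketch: two-way ANOVA with an interaction term for a balanced,
full-factorial HTE-style design. Factor names and yields are hypothetical."""
from itertools import product
from statistics import mean


def two_way_anova(data):
    """data maps (a_level, b_level) -> list of replicate yields.
    Requires a full-factorial, balanced design with >= 2 replicates per cell."""
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    p, q = len(a_levels), len(b_levels)
    n = len(next(iter(data.values())))
    if len(data) != p * q or any(len(v) != n for v in data.values()) or n < 2:
        raise ValueError("design must be full factorial, balanced, replicated")

    grand = mean(y for ys in data.values() for y in ys)
    cell = {k: mean(v) for k, v in data.items()}          # cell means
    a_mean = {a: mean(cell[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for main effects, interaction, and pure error.
    ss_a = q * n * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = p * n * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a, b in product(a_levels, b_levels))
    ss_err = sum((y - cell[k]) ** 2 for k, ys in data.items() for y in ys)

    df = {"A": p - 1, "B": q - 1, "AxB": (p - 1) * (q - 1),
          "error": p * q * (n - 1)}
    ms_err = ss_err / df["error"]
    table = {}
    for name, ss in (("A", ss_a), ("B", ss_b), ("AxB", ss_ab)):
        table[name] = {"SS": ss, "df": df[name], "F": (ss / df[name]) / ms_err}
    table["error"] = {"SS": ss_err, "df": df["error"]}
    return table


# Hypothetical simulated data: additive main effects plus one interaction.
substrate_effect = {"S1": 0.0, "S2": 10.0, "S3": -10.0}
reagent_effect = {"R1": 0.0, "R2": 5.0, "R3": -5.0}


def simulated_yield(s, r, rep):
    y = 50.0 + substrate_effect[s] + reagent_effect[r]
    if (s, r) == ("S3", "R3"):
        y += 20.0                          # planted substrate x reagent effect
    return y + (1.0 if rep == 0 else -1.0)  # deterministic stand-in for noise


data = {(s, r): [simulated_yield(s, r, k) for k in range(2)]
        for s in substrate_effect for r in reagent_effect}
table = two_way_anova(data)
for name, row in table.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

A large F for the `AxB` term, relative to the F-distribution critical value for its degrees of freedom, flags an interaction that the second, chemistry-informed regression step would then need to model explicitly.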
