A Machine Learning Approach to Model Interaction Effects: Development and Application to Alcohol Deoxyfluorination.

Authors

Żurański Andrzej M, Gandhi Shivaani S, Doyle Abigail G

Affiliations

Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States.

Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States.

Publication information

J Am Chem Soc. 2023 Apr 12;145(14):7898-7909. doi: 10.1021/jacs.2c13093. Epub 2023 Mar 29.

DOI: 10.1021/jacs.2c13093
PMID: 36988153

Abstract

The application of machine learning (ML) techniques to model high-throughput experimentation (HTE) datasets has seen a recent rise in popularity. Nevertheless, the ability to model the interplay between reaction components, known as interaction effects, with ML remains an outstanding challenge. Using a simulated HTE dataset, we find that the presence of irrelevant features poses a strong obstacle to learning interaction effects with common ML algorithms. To address this problem, we propose a two-part statistical modeling approach for HTE datasets: classical analysis of variance of the experiment to identify systematic effects that impact reaction yield across the experiment followed by regression of individual effects using chemistry-informed features. To illustrate this methodology, we use our previously published alcohol deoxyfluorination dataset comprising 740 reactions to build a compact, interpretable generalized additive model that accounts for each significant effect observed in the dataset. We achieve a sizeable performance boost compared to our previously published random forest model, reducing mean absolute error from 18 to 13% and root-mean-squared error from 22 to 17% on a newly generated validation set. Finally, we demonstrate that this approach can facilitate the generation of new mechanistic hypotheses, which, when probed experimentally, can lead to a deeper understanding of chemical reactivity.
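The first step of the two-part approach described in the abstract, a classical analysis of variance, can be illustrated with a minimal sketch. It assumes a balanced two-factor table with one yield per cell; the factor roles (substrate by reagent), the function names, and the four yield values are hypothetical and are not taken from the deoxyfluorination dataset. The sketch only performs the sum-of-squares decomposition into main effects and an interaction term. It does not run significance tests, and it is not the authors' generalized additive model. The error metrics at the end are the two quoted in the abstract (mean absolute error and root-mean-squared error).

```python
import math


def two_way_decomposition(table):
    """Split a balanced two-factor yield table into main effects and interaction.

    table[i][j] is the yield for level i of factor A and level j of factor B.
    Returns the grand mean, both sets of main effects, the interaction
    residuals, and the sums of squares (SS_A, SS_B, SS_AB).
    """
    n_a, n_b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_a * n_b)
    # Main effect of each level = its marginal mean minus the grand mean.
    a_eff = [sum(row) / n_b - grand for row in table]
    b_eff = [sum(table[i][j] for i in range(n_a)) / n_a - grand
             for j in range(n_b)]
    # Interaction = what is left after removing the additive part.
    inter = [[table[i][j] - grand - a_eff[i] - b_eff[j] for j in range(n_b)]
             for i in range(n_a)]
    ss_a = n_b * sum(e ** 2 for e in a_eff)
    ss_b = n_a * sum(e ** 2 for e in b_eff)
    ss_ab = sum(e ** 2 for row in inter for e in row)
    return grand, a_eff, b_eff, inter, (ss_a, ss_b, ss_ab)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Hypothetical yields (%): rows = two substrates, columns = two reagents.
yields = [[80, 60],
          [40, 60]]

grand, a_eff, b_eff, inter, (ss_a, ss_b, ss_ab) = two_way_decomposition(yields)

# Predictions from a purely additive model (no interaction term).
additive = [grand + a + b for a in a_eff for b in b_eff]
observed = [y for row in yields for y in row]

print("SS_A, SS_B, SS_AB:", ss_a, ss_b, ss_ab)  # 400.0 0.0 400.0
print("additive MAE:", mae(observed, additive))   # 10.0
print("additive RMSE:", rmse(observed, additive))  # 10.0
```

In this toy table, half of the total variation sits in the interaction term, so a model restricted to additive substrate and reagent effects is off by 10 percentage points on every reaction. A decomposition of this kind is what flags which effects a downstream regression needs to account for.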

