Predicting reaction conditions from limited data through active transfer learning.

Authors

Shim Eunjae, Kammeraad Joshua A, Xu Ziping, Tewari Ambuj, Cernak Tim, Zimmerman Paul M

Affiliations

Department of Chemistry, University of Michigan, Ann Arbor, MI, USA.

Department of Statistics, University of Michigan, Ann Arbor, MI, USA.

Publication information

Chem Sci. 2022 May 11;13(22):6655-6668. doi: 10.1039/d1sc06932b. eCollection 2022 Jun 7.

DOI:10.1039/d1sc06932b

PMID:35756521

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9172577/

Abstract

Transfer and active learning have the potential to accelerate the development of new chemical reactions, using prior data and new experiments to inform models that adapt to the target area of interest. This article shows how specifically tuned machine learning models, based on random forest classifiers, can expand the applicability of Pd-catalyzed cross-coupling reactions to types of nucleophiles unknown to the model. First, model transfer is shown to be effective when reaction mechanisms and substrates are closely related, even when models are trained on relatively small numbers of data points. Then, a model simplification scheme is tested and found to provide comparative predictivity on reactions of new nucleophiles that include unseen reagent combinations. Lastly, for a challenging target where model transfer only provides a modest benefit over random selection, an active transfer learning strategy is introduced to improve model predictions. Simple models, composed of a small number of decision trees with limited depths, are crucial for securing generalizability, interpretability, and performance of active transfer learning.
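The workflow the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation, and every name, parameter, and number in it is hypothetical. It assumes binary success/failure labels, numeric reaction descriptors, bagged depth-1 trees (stumps) standing in for a small random forest classifier, greedy selection of the target reactions the current model rates most likely to succeed, and an update step that appends a few trees fitted on the newly labeled target data.

```python
import random

def fit_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample by minimizing training errors."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
    best = None
    for f in range(len(X[0])):
        for i in idx:
            t = X[i][f]
            for left in (0, 1):  # label predicted when x[f] <= t
                errors = sum((left if X[j][f] <= t else 1 - left) != y[j] for j in idx)
                if best is None or errors < best[0]:
                    best = (errors, f, t, left)
    return best[1:]  # (feature index, threshold, left-branch label)

def predict_proba(forest, x):
    """Fraction of trees that vote 'success' (label 1) for descriptor vector x."""
    votes = [left if x[f] <= t else 1 - left for f, t, left in forest]
    return sum(votes) / len(votes)

def active_transfer(source_X, source_y, pool_X, oracle,
                    n_source_trees=5, rounds=3, batch=2, trees_per_round=2, seed=0):
    """Start from a source-trained ensemble, then query the target pool in batches."""
    rng = random.Random(seed)
    # Transfer step: a small ensemble trained only on source-domain reactions.
    forest = [fit_stump(source_X, source_y, rng) for _ in range(n_source_trees)]
    remaining = list(range(len(pool_X)))
    tgt_X, tgt_y = [], []
    for _ in range(rounds):
        # Assumed acquisition rule: run the reactions predicted most likely to work.
        remaining.sort(key=lambda i: predict_proba(forest, pool_X[i]), reverse=True)
        picked, remaining = remaining[:batch], remaining[batch:]
        for i in picked:
            tgt_X.append(pool_X[i])
            tgt_y.append(oracle(i))  # stands in for running the experiment
        # Assumed update rule: append trees fitted on the target data collected so far.
        forest += [fit_stump(tgt_X, tgt_y, rng) for _ in range(trees_per_round)]
    return forest, tgt_X, tgt_y

# Synthetic demo: the target domain has a shifted success threshold.
data_rng = random.Random(1)
source_X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
source_y = [int(x[0] > 0.5) for x in source_X]
pool_X = [[data_rng.random(), data_rng.random()] for _ in range(20)]
pool_y = [int(x[0] > 0.7) for x in pool_X]

forest, tgt_X, tgt_y = active_transfer(source_X, source_y, pool_X, lambda i: pool_y[i])
print(len(forest), "trees;", sum(tgt_y), "of", len(tgt_y), "queried reactions succeeded")
```

Keeping each tree shallow and the ensemble small mirrors the abstract's point that simple models help generalizability and interpretability. The synthetic data here only exercises the loop and says nothing about real reaction outcomes.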
