Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction.

Authors

Viet Johansson Simon, Gummesson Svensson Hampus, Bjerrum Esben, Schliep Alexander, Haghir Chehreghani Morteza, Tyrchan Christian, Engkvist Ola

Affiliations

Molecular AI, Discovery Sciences, R&D, AstraZeneca, SE-431 83, Mölndal, Sweden.

Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, SE-412 96, Göteborg, Sweden.

Publication information

Mol Inform. 2022 Dec;41(12):e2200043. doi: 10.1002/minf.202200043. Epub 2022 Jul 14.

DOI:10.1002/minf.202200043

PMID:35732584

Abstract

Computer-aided synthesis planning, suggesting synthetic routes for molecules of interest, is a rapidly growing field. The machine learning methods used are often dependent on access to large datasets for training, but finite experimental budgets limit how much data can be obtained from experiments. This suggests the use of schemes for data collection such as active learning, which identifies the data points of highest impact for model accuracy, and which has been used in recent studies with success. However, little has been done to explore the robustness of the methods predicting reaction yield when used together with active learning to reduce the amount of experimental data needed for training. This study aims to investigate the influence of machine learning algorithms and the number of initial data points on reaction yield prediction for two public high-throughput experimentation datasets. Our results show that active learning based on output margin reached a pre-defined AUROC faster than random sampling on both datasets. Analysis of feature importance of the trained machine learning models suggests active learning had a larger influence on the model accuracy when only a few features were important for the model prediction.
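The comparison between margin-based active learning and random sampling can be illustrated with a minimal sketch. The abstract does not give the exact margin definition or batch sizes, so the code below assumes a binary "high yield" classifier and takes the margin to be the absolute difference between the two class probabilities; the function names and the example probabilities are hypothetical, not taken from the paper.

```python
import random


def margin(p_positive):
    """Binary output margin |P(y=1) - P(y=0)| = |2p - 1| (assumed definition)."""
    return abs(2.0 * p_positive - 1.0)


def select_by_margin(probabilities, batch_size):
    """Return indices of the pool items the model is least certain about,
    i.e. those with the smallest output margin."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: margin(probabilities[i]))
    return ranked[:batch_size]


def select_at_random(pool_size, batch_size, seed=0):
    """Baseline: pick pool indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), batch_size)


# Hypothetical predicted probabilities of "high yield" for six unlabeled reactions.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.80, 0.30]

# Margin sampling queries the reactions whose predictions sit closest to 0.5.
queried = select_by_margin(pool_probabilities, batch_size=2)
print(queried)  # [1, 3]

# Random sampling ignores the model's predictions entirely.
print(select_at_random(len(pool_probabilities), batch_size=2))
```

In a full active learning loop, the queried reactions would be run experimentally, added to the training set, and the model retrained before the next selection round; the random baseline follows the same loop with `select_at_random` in place of `select_by_margin`.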


Similar articles

Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction.

Mol Inform. 2022 Dec;41(12):e2200043. doi: 10.1002/minf.202200043. Epub 2022 Jul 14.

Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.

Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.

Error Tolerance of Machine Learning Algorithms across Contemporary Biological Targets.

Molecules. 2019 Jun 4;24(11):2115. doi: 10.3390/molecules24112115.

Machine learning models trained on synthetic datasets of multiple sample sizes for the use of predicting blood pressure from clinical data in a national dataset.

PLoS One. 2023 Mar 16;18(3):e0283094. doi: 10.1371/journal.pone.0283094. eCollection 2023.

Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.

Machine Learning in Computer-Aided Synthesis Planning.

Acc Chem Res. 2018 May 15;51(5):1281-1289. doi: 10.1021/acs.accounts.8b00087. Epub 2018 May 1.

Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective.

Comput Methods Programs Biomed. 2022 Jun;220:106773. doi: 10.1016/j.cmpb.2022.106773. Epub 2022 Mar 31.

Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation.

JMIR Med Inform. 2020 Jul 6;8(7):e17257. doi: 10.2196/17257.

Testing the applicability and performance of Auto ML for potential applications in diagnostic neuroradiology.

Sci Rep. 2022 Aug 11;12(1):13648. doi: 10.1038/s41598-022-18028-8.

Can Machine-learning Algorithms Predict Early Revision TKA in the Danish Knee Arthroplasty Registry?

Clin Orthop Relat Res. 2020 Sep;478(9):2088-2101. doi: 10.1097/CORR.0000000000001343.

Cited by

Advancing genetic engineering with active learning: theory, implementations and potential opportunities.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf286.

Automation and machine learning augmented by large language models in a catalysis study.

Chem Sci. 2024 Jun 26;15(31):12200-12233. doi: 10.1039/d3sc07012c. eCollection 2024 Aug 7.

Deep Kernel learning for reaction outcome prediction and optimization.

Commun Chem. 2024 Jun 14;7(1):136. doi: 10.1038/s42004-024-01219-x.

When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges.

J Chem Inf Model. 2024 Jan 8;64(1):42-56. doi: 10.1021/acs.jcim.3c01524. Epub 2023 Dec 20.

Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery.

Chem Sci. 2022 Nov 28;14(2):226-244. doi: 10.1039/d2sc05089g. eCollection 2023 Jan 4.

Predicting reaction conditions from limited data through active transfer learning.

Chem Sci. 2022 May 11;13(22):6655-6668. doi: 10.1039/d1sc06932b. eCollection 2022 Jun 7.
