Ruth Marcel, Gensch Tobias, Schreiner Peter R
Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392, Giessen, Germany.
Present address: Red Bull GmbH, Am Brunnen 1, 5330, Fuschl am See, Austria.
Angew Chem Int Ed Engl. 2024 Nov 25;63(48):e202410308. doi: 10.1002/anie.202410308. Epub 2024 Oct 15.
With the rise of machine learning (ML), the modeling of chemical systems has entered a new era and has the potential to revolutionize how we understand and predict chemical reactions. Here, we probe the historic reliance on enantiomeric excess (ee) as a target variable and discuss the benefits of using relative Gibbs free activation energies (ΔΔG), which are grounded firmly in transition-state theory. This perspective discusses best practices that enhance modeling efforts, especially for chemists with an experimental background in asymmetric catalysis who wish to explore modeling of their data. We outline how using ΔΔG improves modeling performance, escapes physical limitations, addresses temperature effects, manages non-linear error propagation, adjusts for data distributions, and handles unphysical predictions, in order to streamline modeling for the practical chemist and provide simple guidelines for strong statistical tools. For this endeavor, we gathered ten datasets from the literature covering very different reaction types and evaluated them using fingerprint-, descriptor-, and graph neural network-based models. Our results highlight how performance differs across model complexities depending on the target representation, emphasizing practical benefits for chemists.
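The relationship between the two target variables can be illustrated with a minimal sketch. It assumes the standard transition-state-theory relation for two competing enantiomeric pathways under kinetic control, ΔΔG = RT ln[(1 + ee)/(1 − ee)] = 2RT artanh(ee), with ee expressed as a signed fraction between −1 and 1. The function names, the default temperature of 298.15 K, and the kJ/mol unit are illustrative choices, not taken from the paper.

```python
import math

# Gas constant in kJ mol^-1 K^-1 (assumed unit choice for this sketch).
R_KJ = 8.314462618e-3


def ee_to_ddg(ee: float, temperature_k: float = 298.15) -> float:
    """Convert a signed ee fraction (-1 < ee < 1) to ddG in kJ/mol."""
    if not -1.0 < ee < 1.0:
        # ee = +/-1 maps to an infinite energy difference.
        raise ValueError("ee must lie strictly between -1 and 1")
    return 2.0 * R_KJ * temperature_k * math.atanh(ee)


def ddg_to_ee(ddg: float, temperature_k: float = 298.15) -> float:
    """Convert ddG in kJ/mol back to a signed ee fraction."""
    # tanh is bounded, so any predicted ddG yields a physical ee.
    return math.tanh(ddg / (2.0 * R_KJ * temperature_k))


def ddg_sensitivity(ee: float, temperature_k: float = 298.15) -> float:
    """d(ddG)/d(ee): how an error in ee propagates into ddG."""
    return 2.0 * R_KJ * temperature_k / (1.0 - ee * ee)


if __name__ == "__main__":
    for ee in (0.50, 0.90, 0.99):
        print(ee, ee_to_ddg(ee), ddg_sensitivity(ee))
```

Under these assumptions, a model trained on ΔΔG cannot return an ee outside the physical range after back-conversion, the temperature enters the target explicitly, and the sensitivity function shows why a fixed measurement error in ee corresponds to a much larger energy error near ee = ±1.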