Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties.

Affiliations

Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States.

Biosciences Center, National Renewable Energy Laboratory, 15103 Denver West Parkway, Golden, Colorado 80401, United States.

Publication information

Acc Chem Res. 2021 Feb 16;54(4):827-836. doi: 10.1021/acs.accounts.0c00745. Epub 2021 Feb 3.

DOI:10.1021/acs.accounts.0c00745

PMID:33534534

Abstract

Machine-readable chemical structure representations are foundational in all attempts to harness machine learning for the prediction of reactivities, selectivities, and chemical properties directly from molecular structure. The featurization of discrete chemical structures into a continuous vector space is a critical phase undertaken before model selection, and the development of new ways to quantitatively encode molecules is an active area of research. In this Account, we highlight the application and suitability of different representations, from expert-guided "engineered" descriptors to automatically "learned" features, in different prediction tasks relevant to organic and organometallic chemistry, where differing amounts of training data are available. These tasks include statistical models of stereo- and enantioselectivity, thermochemistry, and kinetics developed using experimental and quantum chemical data.

The use of expert-guided molecular descriptors provides an opportunity to incorporate chemical knowledge, domain expertise, and physical constraints into statistical modeling. In applications to stereoselective organic and organometallic catalysis, where data sets may be relatively small and 3D-geometries and conformations play an important role, mechanistically informed features can be used successfully to obtain predictive statistical models that are also chemically interpretable. We provide an overview of several recent applications of this approach to obtain quantitative models for reactivity and selectivity, where topological descriptors, quantum mechanical calculations of electronic and steric properties, along with conformational ensembles, all feature as essential ingredients of the molecular representations used.

Alternatively, more flexible, general-purpose molecular representations such as attributed molecular graphs can be used with machine learning approaches to learn the complex relationship between a structure and prediction target. This approach has the potential to out-perform more traditional representation methods such as "hand-crafted" molecular descriptors, particularly as data set sizes grow. One area where this is particularly relevant is in the use of large sets of quantum mechanical data to train quantitative structure-property relationships. A general approach toward curating useful data sets and training highly accurate graph neural network models is discussed in the context of organic bond dissociation enthalpies, where this strategy outperforms regression using precomputed descriptors.

Finally, we describe how graph neural network predictions can be incorporated into mechanistically informed statistical models of chemical reactivity and selectivity. Once trained, this approach avoids the expensive computational overhead associated with quantum mechanical calculations, while maintaining chemical interpretability. We illustrate examples for which fast predictions of bond dissociation enthalpy and of the identities of radicals formed through cleavage of a molecule's weakest bond are used in simple physical models of site-selectivity and reactivity.
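As a rough illustration of the ideas in the abstract, the sketch below encodes a molecule as an attributed molecular graph, applies one round of neighbor aggregation of the kind graph neural networks build on, and picks the weakest bond from a table of per-bond enthalpies. It is not the authors' implementation: the `Atom` and `Bond` classes, the feature layout, the `message_pass` and `weakest_bond` helpers, and the enthalpy numbers are all assumptions made up for this example, and a real model would learn its update functions from data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str
    n_hydrogens: int  # hydrogens attached to this heavy atom


@dataclass(frozen=True)
class Bond:
    i: int      # index of the first atom
    j: int      # index of the second atom
    order: int  # 1 = single, 2 = double, ...


# Heavy-atom graph of ethanol, CH3-CH2-OH (hypothetical hand-built example).
atoms = [Atom("C", 3), Atom("C", 2), Atom("O", 1)]
bonds = [Bond(0, 1, 1), Bond(1, 2, 1)]

ELEMENTS = ["C", "N", "O"]  # assumed element vocabulary for the one-hot code


def atom_features(atom):
    """Map a discrete atom to a numeric vector: element one-hot + H count."""
    one_hot = [1.0 if atom.element == e else 0.0 for e in ELEMENTS]
    return one_hot + [float(atom.n_hydrogens)]


def message_pass(features, bonds):
    """One round of sum aggregation: each atom adds its neighbors' vectors."""
    updated = [list(f) for f in features]
    for b in bonds:
        for src, dst in ((b.i, b.j), (b.j, b.i)):
            updated[dst] = [u + s for u, s in zip(updated[dst], features[src])]
    return updated


def weakest_bond(bde_by_bond):
    """Return the bond label with the lowest bond dissociation enthalpy."""
    return min(bde_by_bond, key=bde_by_bond.get)


x0 = [atom_features(a) for a in atoms]  # initial atom vectors
x1 = message_pass(x0, bonds)            # vectors after one aggregation step

# Placeholder enthalpies in kcal/mol, chosen only for illustration;
# they are not values from the paper or from any trained model.
placeholder_bde = {"C-C": 88.0, "C-O": 93.0, "O-H": 105.0}
print(x1)
print(weakest_bond(placeholder_bde))
```

After one aggregation step each atom vector already reflects its immediate neighborhood; stacking several such steps with learned weights is what lets a graph neural network relate local structure to a per-bond target such as a dissociation enthalpy.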

Similar articles

Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties.

Acc Chem Res. 2021 Feb 16;54(4):827-836. doi: 10.1021/acs.accounts.0c00745. Epub 2021 Feb 3.

A big data approach to the ultra-fast prediction of DFT-calculated bond energies.

J Cheminform. 2013 Jul 12;5:34. doi: 10.1186/1758-2946-5-34. eCollection 2013.

Molecular Machine Learning for Chemical Catalysis: Prospects and Challenges.

Acc Chem Res. 2023 Feb 7;56(3):402-412. doi: 10.1021/acs.accounts.2c00801. Epub 2023 Jan 30.

When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties?

J Am Chem Soc. 2024 Aug 21;146(33):23103-23120. doi: 10.1021/jacs.4c04670. Epub 2024 Aug 6.

Navigating Transition-Metal Chemical Space: Artificial Intelligence for First-Principles Design.

Acc Chem Res. 2021 Feb 2;54(3):532-545. doi: 10.1021/acs.accounts.0c00686. Epub 2021 Jan 22.

Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships.

J Phys Chem A. 2017 Nov 22;121(46):8939-8954. doi: 10.1021/acs.jpca.7b08750. Epub 2017 Nov 15.

Predicting Energetics Materials' Crystalline Density from Chemical Structure by Machine Learning.

J Chem Inf Model. 2021 May 24;61(5):2147-2158. doi: 10.1021/acs.jcim.0c01318. Epub 2021 Apr 26.

Many-Body Descriptors for Predicting Molecular Properties with Machine Learning: Analysis of Pairwise and Three-Body Interactions in Molecules.

J Chem Theory Comput. 2018 Jun 12;14(6):2991-3003. doi: 10.1021/acs.jctc.8b00110. Epub 2018 May 31.

Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations.

Chem Sci. 2018 Nov 19;10(6):1692-1701. doi: 10.1039/c8sc04175j. eCollection 2019 Feb 14.

Improving VAE based molecular representations for compound property prediction.

J Cheminform. 2022 Oct 14;14(1):69. doi: 10.1186/s13321-022-00648-x.

Cited by

Molecular Rotors as Reactivity Probes: Predicting Electrophilicity from the Speed of Rotation.

Angew Chem Int Ed Engl. 2025 Sep 1;64(36):e202510556. doi: 10.1002/anie.202510556. Epub 2025 Jul 29.

AI Approaches to Homogeneous Catalysis with Transition Metal Complexes.

ACS Catal. 2025 May 14;15(11):9089-9105. doi: 10.1021/acscatal.5c01202. eCollection 2025 Jun 6.

Quinoline Quest: Kynurenic Acid Strategies for Next-Generation Therapeutics via Rational Drug Design.
