Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties.

Affiliations

Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States.

Biosciences Center, National Renewable Energy Laboratory, 15103 Denver West Parkway, Golden, Colorado 80401, United States.

Publication information

Acc Chem Res. 2021 Feb 16;54(4):827-836. doi: 10.1021/acs.accounts.0c00745. Epub 2021 Feb 3.

Abstract

Machine-readable chemical structure representations are foundational in all attempts to harness machine learning for the prediction of reactivities, selectivities, and chemical properties directly from molecular structure. The featurization of discrete chemical structures into a continuous vector space is a critical phase undertaken before model selection, and the development of new ways to quantitatively encode molecules is an active area of research. In this Account, we highlight the application and suitability of different representations, from expert-guided "engineered" descriptors to automatically "learned" features, in different prediction tasks relevant to organic and organometallic chemistry, where differing amounts of training data are available. These tasks include statistical models of stereo- and enantioselectivity, thermochemistry, and kinetics developed using experimental and quantum chemical data.

The use of expert-guided molecular descriptors provides an opportunity to incorporate chemical knowledge, domain expertise, and physical constraints into statistical modeling. In applications to stereoselective organic and organometallic catalysis, where data sets may be relatively small and 3D geometries and conformations play an important role, mechanistically informed features can be used successfully to obtain predictive statistical models that are also chemically interpretable. We provide an overview of several recent applications of this approach to obtain quantitative models for reactivity and selectivity, in which topological descriptors, quantum mechanical calculations of electronic and steric properties, and conformational ensembles all feature as essential ingredients of the molecular representations used.

Alternatively, more flexible, general-purpose molecular representations such as attributed molecular graphs can be used with machine learning approaches to learn the complex relationship between a structure and a prediction target.
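
To make "attributed molecular graph" concrete, here is a minimal pure-Python sketch, not the authors' model. Atoms are nodes carrying a small attribute vector (a hypothetical scheme of one-hot element plus attached-hydrogen count), and bonds are edges. One round of sum-based message passing mixes each atom's vector with its neighbours', and a sum readout produces a fixed-length molecular vector. The names `atom_features`, `message_pass`, and `readout` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical node-attribute scheme: one-hot element symbol plus the number
# of attached hydrogens. Real featurizers use many more atom attributes.
ELEMENTS = ["C", "O"]


def atom_features(element: str, num_h: int) -> List[float]:
    """Encode one atom as a numeric attribute vector."""
    one_hot = [1.0 if element == e else 0.0 for e in ELEMENTS]
    return one_hot + [float(num_h)]


def message_pass(node_feats: List[List[float]],
                 edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One round of sum aggregation: each atom's new state is its own
    vector plus the sum of its bonded neighbours' (pre-update) vectors."""
    updated = [list(f) for f in node_feats]
    for i, j in edges:  # bonds are undirected, so messages flow both ways
        for k in range(len(node_feats[i])):
            updated[i][k] += node_feats[j][k]
            updated[j][k] += node_feats[i][k]
    return updated


def readout(node_feats: List[List[float]]) -> List[float]:
    """Sum over atoms: a fixed-length vector independent of atom ordering."""
    return [sum(column) for column in zip(*node_feats)]


# Ethanol as a heavy-atom graph C0-C1-O2, hydrogens folded into node attributes.
atoms = [("C", 3), ("C", 2), ("O", 1)]
bonds = [(0, 1), (1, 2)]

features = [atom_features(el, h) for el, h in atoms]
molecule_vector = readout(message_pass(features, bonds))
print(molecule_vector)  # [5.0, 2.0, 14.0]
```

In a trained graph neural network, each update would apply learned weight matrices and a nonlinearity, and several rounds would be stacked; the fixed sums here only keep the arithmetic traceable by hand. Because the readout sums over atoms, relabeling the atoms yields the same molecular vector.
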
This approach has the potential to outperform more traditional representation methods such as "hand-crafted" molecular descriptors, particularly as data set sizes grow. This is especially relevant when large sets of quantum mechanical data are used to train quantitative structure-property relationships. A general approach toward curating useful data sets and training highly accurate graph neural network models is discussed in the context of organic bond dissociation enthalpies, where this strategy outperforms regression using precomputed descriptors.

Finally, we describe how graph neural network predictions can be incorporated into mechanistically informed statistical models of chemical reactivity and selectivity. Once the network is trained, this approach avoids the expensive computational overhead associated with quantum mechanical calculations while maintaining chemical interpretability. We illustrate examples in which fast predictions of bond dissociation enthalpy, and of the identities of the radicals formed by cleaving a molecule's weakest bond, are used in simple physical models of site-selectivity and reactivity.
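
As an illustration of how predicted bond dissociation enthalpies (BDEs) might feed a simple physical model of site-selectivity, the sketch below shows one possible form. It is an assumption made here, not the model described in the Account. The predicted site is the bond with the lowest BDE. A softer estimate spreads reactivity over sites with Boltzmann weights, assuming barrier differences scale with BDE differences through an Evans-Polanyi-type coefficient `alpha`. The `predicted_bdes` dictionary stands in for graph neural network output: its site labels and values are hypothetical placeholders chosen only to exercise the code, and `alpha = 0.3` is not a fitted parameter.

```python
import math
from typing import Dict

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def weakest_site(bdes: Dict[str, float]) -> str:
    """Return the site whose bond has the lowest predicted BDE."""
    return min(bdes, key=bdes.get)


def site_fractions(bdes: Dict[str, float], alpha: float = 0.3,
                   temperature: float = 298.15) -> Dict[str, float]:
    """Boltzmann-weighted share of reaction at each site.

    Assumes (hypothetically) that barrier differences equal alpha times BDE
    differences, an Evans-Polanyi-type relation; alpha is a placeholder.
    """
    ref = min(bdes.values())  # only differences matter, so offset by the minimum
    weights = {site: math.exp(-alpha * (bde - ref) / (R_KCAL * temperature))
               for site, bde in bdes.items()}
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}


# Hypothetical placeholder BDEs in kcal/mol, standing in for model predictions.
predicted_bdes = {"C1-H": 98.0, "C2-H": 95.0, "C3-H": 101.0}
print(weakest_site(predicted_bdes))  # C2-H
print(site_fractions(predicted_bdes))
```

Because only BDE differences enter the exponent, a constant offset in the predicted BDEs leaves the fractions unchanged, and sites with equal BDEs receive equal shares.
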
