Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction.

Affiliation

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Publication information

J Chem Inf Model. 2022 May 9;62(9):2101-2110. doi: 10.1021/acs.jcim.1c00975. Epub 2021 Nov 4.

Abstract

The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.
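The abstract describes the CGR as a superposition of the reactant and product graphs. A minimal sketch of that idea, assuming each side of an atom-mapped reaction is given as a dictionary from atom-map-number pairs to bond orders, is shown below. The function name `condensed_graph`, the helper `bond`, and the toy reaction are hypothetical illustrations and are not taken from the authors' released model.

```python
from typing import Dict, FrozenSet, Tuple

# A bond is keyed by the (unordered) pair of atom-map numbers it connects.
Bonds = Dict[FrozenSet[int], float]


def bond(a: int, b: int) -> FrozenSet[int]:
    """Hypothetical helper: unordered key for the bond between atoms a and b."""
    return frozenset((a, b))


def condensed_graph(
    reactant_bonds: Bonds, product_bonds: Bonds
) -> Dict[FrozenSet[int], Tuple[float, float]]:
    """Superimpose reactant and product graphs on shared atom-map numbers.

    Each edge of the result carries (order in reactants, order in products);
    an order of 0.0 means the bond is absent on that side.
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        cgr[pair] = (
            reactant_bonds.get(pair, 0.0),
            product_bonds.get(pair, 0.0),
        )
    return cgr


# Toy example (assumed for illustration, heavy atoms only):
# Cl- + CH3-CH2-Br -> CH3-CH2-Cl + Br-
# Atom maps: 1 = C (methyl), 2 = C (methylene), 3 = Br, 4 = Cl
reactants = {bond(1, 2): 1.0, bond(2, 3): 1.0}
products = {bond(1, 2): 1.0, bond(2, 4): 1.0}

cgr = condensed_graph(reactants, products)

# Edges whose order differs between the two sides mark the reaction center.
changed = {pair for pair, (r, p) in cgr.items() if r != p}
```

In this sketch the C-C bond appears with identical orders on both sides, while the C-Br and C-Cl edges record a bond being broken and formed. A graph network operating on such a structure sees reactants and products in a single graph, which is the property the abstract highlights.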

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7042/9092344/8884fd8c9861/ci1c00975_0001.jpg
