Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability.

Affiliation

Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.

Publication information

J Chem Phys. 2022 Feb 28;156(8):084104. doi: 10.1063/5.0079574.

Abstract

There is a perceived dichotomy between structure-based and descriptor-based molecular representations used for predictive chemistry tasks. Here, we study the performance, generalizability, and explainability of the quantum mechanics-augmented graph neural network (ml-QM-GNN) architecture as applied to the prediction of regioselectivity (classification) and of activation energies (regression). In our hybrid QM-augmented model architecture, structure-based representations are first used to predict a set of atom- and bond-level reactivity descriptors derived from density functional theory calculations. These estimated reactivity descriptors are combined with the original structure-based representation to make the final reactivity prediction. We demonstrate that our model architecture leads to significant improvements over structure-based GNNs in not only overall accuracy but also in generalization to unseen compounds. Even when provided training sets of only a couple hundred labeled data points, the ml-QM-GNN outperforms other state-of-the-art structure-based architectures that have been applied to these tasks as well as descriptor-based (linear) regressions. As a primary contribution of this work, we demonstrate a bridge between data-driven predictions and conceptual frameworks commonly used to gain qualitative insights into reactivity phenomena, taking advantage of the fact that our models are grounded in (but not restricted to) QM descriptors. This effort results in a productive synergy between theory and data science, wherein QM-augmented models provide a data-driven confirmation of previous qualitative analyses, and these analyses in turn facilitate insights into the decision-making process occurring within ml-QM-GNNs.
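
The two-stage data flow described in the abstract (structure-based representation, then estimated descriptors, then both combined for the final prediction) can be illustrated with a toy sketch. This is not the authors' implementation: the single message-passing layer, the layer sizes, the random weights, and every name below (`message_pass`, `site_probabilities`, `w_gnn`, `w_desc`, `w_out`) are hypothetical. In particular, a random linear map stands in for a descriptor predictor that would be trained on density functional theory data.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights drawn from a standard normal."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def message_pass(atom_feats, adjacency, weight):
    """Toy GNN layer: sum each atom with its neighbors, project, apply ReLU."""
    n = len(adjacency)
    with_self = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    projected = matmul(matmul(with_self, atom_feats), weight)
    return [[max(value, 0.0) for value in row] for row in projected]


def site_probabilities(atom_feats, adjacency, params):
    # Stage 1: structure-based atom embeddings from the molecular graph.
    hidden = message_pass(atom_feats, adjacency, params["w_gnn"])
    # Stage 2: estimate atom-level descriptors from the structure alone
    # (assumption: a linear head stands in for a DFT-trained predictor).
    descriptors = matmul(hidden, params["w_desc"])
    # Stage 3: concatenate the estimated descriptors with the embeddings.
    combined = [h + d for h, d in zip(hidden, descriptors)]
    # Stage 4: score each atom as a candidate reactive site, then softmax
    # over atoms to mimic a regioselectivity classification.
    scores = [row[0] for row in matmul(combined, params["w_out"])]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
n_atoms, n_feats, n_hidden, n_desc = 4, 5, 8, 3
params = {
    "w_gnn": random_matrix(n_feats, n_hidden, rng),
    "w_desc": random_matrix(n_hidden, n_desc, rng),
    "w_out": random_matrix(n_hidden + n_desc, 1, rng),
}
atom_feats = random_matrix(n_atoms, n_feats, rng)
# Linear chain of four atoms: 0-1-2-3.
adjacency = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
probs = site_probabilities(atom_feats, adjacency, params)
```

With untrained weights the resulting probabilities carry no chemical meaning. The sketch only shows where the estimated descriptors enter the prediction, and a regression variant (for activation energies) would replace the softmax with a pooled scalar output.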
