When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties?

Author information

Li Shih-Cheng, Wu Haoyang, Menon Angiras, Spiekermann Kevin A, Li Yi-Pei, Green William H

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan.

Publication information

J Am Chem Soc. 2024 Aug 21;146(33):23103-23120. doi: 10.1021/jacs.4c04670. Epub 2024 Aug 6.

Abstract

Deep graph neural networks are extensively utilized to predict chemical reactivity and molecular properties. However, because of the complexity of chemical space, such models often have difficulty extrapolating beyond the chemistry contained in the training set. Augmenting the model with quantum mechanical (QM) descriptors is anticipated to improve its generalizability. However, obtaining QM descriptors often requires CPU-intensive computational chemistry calculations. To identify when QM descriptors help graph neural networks predict chemical properties, we conduct a systematic investigation of the impact of atom, bond, and molecular QM descriptors on the performance of directed message passing neural networks (D-MPNNs) for predicting 16 molecular properties. The analysis surveys computational and experimental targets, as well as classification and regression tasks, and varied data set sizes from several hundred to hundreds of thousands of data points. Our results indicate that QM descriptors are mostly beneficial for D-MPNN performance on small data sets, provided that the descriptors correlate well with the targets and can be readily computed with high accuracy. Otherwise, using QM descriptors can add cost without benefit or even introduce unwanted noise that can degrade model performance. Strategic integration of QM descriptors with D-MPNN unlocks potential for physics-informed, data-efficient modeling with some interpretability that can streamline drug and material designs. To facilitate the use of QM descriptors in machine learning workflows for chemistry, we provide a set of guidelines regarding when and how to best leverage QM descriptors, a high-throughput workflow to compute them, and an enhancement to Chemprop, a widely adopted open-source D-MPNN implementation for chemical property prediction.
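One of the conditions named in the abstract, that descriptors should correlate well with the target, can be checked cheaply before any descriptor is fed to a model. The following is a minimal sketch of such a screen, not the authors' workflow: the function names `pearson_r` and `screen_descriptors`, the use of Pearson correlation, and the 0.5 cutoff are all assumptions made for illustration, and the numbers in the usage example are toy values rather than data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear signal; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_descriptors(descriptors, target, min_abs_r=0.5):
    """Keep descriptors whose |r| with the target reaches min_abs_r.

    descriptors: dict mapping a descriptor name to one value per molecule.
    target: one target value per molecule, in the same order.
    min_abs_r: hypothetical cutoff chosen for illustration only.
    """
    kept = {}
    for name, values in descriptors.items():
        r = pearson_r(values, target)
        if abs(r) >= min_abs_r:
            kept[name] = r
    return kept


if __name__ == "__main__":
    # Toy values, invented purely to exercise the functions.
    toy_target = [2.0, 4.0, 6.0, 8.0]
    toy_descriptors = {
        "descriptor_a": [1.0, 2.0, 3.0, 4.0],  # tracks the target linearly
        "descriptor_b": [1.0, 1.0, 1.0, 1.0],  # constant, so uninformative
    }
    print(screen_descriptors(toy_descriptors, toy_target))
```

A linear correlation screen only addresses one of the abstract's conditions. It says nothing about data set size, or about whether the descriptor can be computed cheaply and accurately, and it can miss nonlinear relationships.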
