Process and Systems Engineering Center (PROSYS), Department of Chemical and Biochemical Engineering, Technical University of Denmark, Kgs. Lyngby DK-2800, Denmark.
J Chem Inf Model. 2023 Feb 13;63(3):725-744. doi: 10.1021/acs.jcim.2c01091. Epub 2023 Jan 30.
Quantitative structure-property relationships (QSPRs) are important tools for facilitating and accelerating the discovery of compounds with desired properties. While many QSPRs have been developed, they suffer from various shortcomings, such as limited generalizability and modest accuracy. Although various machine-learning and deep-learning techniques have been integrated into such models, another shortcoming has emerged: a lack of transparency and interpretability. In this work, two interpretable graph neural network (GNN) models, attentive group-contribution (AGC) and group-contribution-based graph attention (GroupGAT), are developed by integrating the fundamentals of the group-contribution (GC) concept. Interpretability is provided by the attention mechanism, which highlights the substructure with the highest attention weights in the latent representation of the molecules. The proposed models outperformed classical GC models, as well as various other GNN models, in describing the aqueous solubility, melting point, and enthalpies of formation, combustion, and fusion of organic compounds. The insights provided are consistent with those obtained from semiempirical GC models, confirming that the proposed framework can highlight the substructures of a molecule that are important for a specific property.
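The abstract does not specify the AGC or GroupGAT architectures. As a rough illustration of how attention over group-contribution substructures can yield an interpretable readout, the following is a minimal sketch that assumes each GC group in a molecule has already been embedded as a vector (e.g. by a GNN) and that a single learned query vector scores the groups by dot-product attention. The function name `attentive_group_readout`, the group labels, the embeddings, and the query are all hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

# Hypothetical input: one (group label, embedding) pair per GC group occurrence,
# so repeated groups (e.g. several CH3) each get their own entry.
Group = Tuple[str, List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_group_readout(
    groups: List[Group], query: List[float]
) -> Tuple[List[float], List[Tuple[str, float]], str]:
    """Pool group embeddings into one molecule vector with dot-product attention.

    Returns the pooled molecule vector, the attention weight of each group,
    and the label of the group with the highest weight (the "highlighted"
    substructure used for interpretation).
    """
    # Score each group by its dot product with the (assumed learned) query.
    scores = [sum(q * x for q, x in zip(query, emb)) for _, emb in groups]
    weights = softmax(scores)

    # Molecule representation: attention-weighted sum of group embeddings.
    dim = len(query)
    molecule = [
        sum(w * emb[k] for w, (_, emb) in zip(weights, groups)) for k in range(dim)
    ]

    attention = [(label, w) for (label, _), w in zip(groups, weights)]
    top_label = max(attention, key=lambda pair: pair[1])[0]
    return molecule, attention, top_label


if __name__ == "__main__":
    # Toy, made-up embeddings for three groups of a hypothetical alcohol.
    groups = [("CH3", [1.0, 0.0]), ("CH2", [0.5, 0.5]), ("OH", [0.0, 2.0])]
    mol, att, top = attentive_group_readout(groups, query=[0.0, 1.0])
    print(top)  # OH
```

Unlike a classical first-order GC model, where each group contributes a fixed, molecule-independent amount, the attention weights here depend on the group embeddings and therefore on the molecular context; a downstream regression layer (not shown) would map the pooled vector to the property.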