
"Amide - amine + alcohol = carboxylic acid." Chemical reactions as linear algebraic analogies in graph neural networks

Authors

El-Samman Amer Marwan, De Baerdemacker Stijn

Affiliations

University of New Brunswick, Department of Chemistry, 30 Dineen Dr, Fredericton, Canada

University of New Brunswick, Department of Mathematics and Statistics, 30 Dineen Dr, Fredericton, Canada

Publication

Chem Sci. 2025 Apr 23. doi: 10.1039/d4sc05655h.

Abstract

In deep learning methods, especially in the context of chemistry, there is an increasing urgency to uncover the hidden learning mechanisms often dubbed the "black box." In this work, we show that graph models built on computational chemical data behave similarly to natural language processing (NLP) models built on text data. Crucially, we show that atom-embeddings, i.e., atom-parsed graph neural activation patterns, exhibit arithmetic properties that represent valid reaction formulas. This is very similar to how word-embeddings can be combined to make word analogies that preserve the semantic meaning behind the words, as in the famous example "King" - "Man" + "Woman" = "Queen." For instance, we show how the reaction from an alcohol to a carbonyl is represented by a constant vector in the embedding space, implicitly representing "-H." This vector is independent of the particular alcohol reactant and carbonyl product and represents a consistent chemical transformation. Other directions in the embedding space are synonymous with distinct chemical changes (e.g., the tautomerization direction). In contrast to natural language processing, we can explain the observed chemical analogies using algebraic manipulations on the local chemical composition that surrounds each atom-embedding. Furthermore, the observations find applications in transfer learning, for instance in the formal structure and prediction of atomistic properties, such as H-NMR and C-NMR. This work is in line with the recent push for interpretable explanations of graph neural network models of chemistry and uncovers a latent model of chemistry that is highly structured, consistent, and analogous to chemical syntax.
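The analogy arithmetic described in the title and abstract can be sketched with a toy example. The fragment vectors and additive "molecule embeddings" below are hypothetical stand-ins for the learned atom-embeddings of the paper's graph neural network; they are constructed compositionally from local chemical fragments, mirroring the abstract's claim that the analogies can be explained by the local chemical composition around each embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fragment vectors standing in for learned embeddings
# (assumed names; not from the paper's actual model).
frag = {name: rng.normal(size=8) for name in ["C=O", "O-H", "N-H2", "R"]}

# Toy "molecule embeddings" built additively from local composition.
emb = {
    "amide":           frag["R"] + frag["C=O"] + frag["N-H2"],
    "amine":           frag["R"] + frag["N-H2"],
    "alcohol":         frag["R"] + frag["O-H"],
    "carboxylic acid": frag["R"] + frag["C=O"] + frag["O-H"],
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "amide - amine + alcohol" lands on "carboxylic acid": swapping the
# N-H2 fragment for O-H is exactly the amide -> acid transformation.
query = emb["amide"] - emb["amine"] + emb["alcohol"]
best = max(emb, key=lambda name: cosine(query, emb[name]))
print(best)  # carboxylic acid
```

Because the embeddings here are additive by construction, the analogy holds exactly; the paper's observation is that trained atom-embeddings exhibit this arithmetic approximately, just as word2vec-style word-embeddings do.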


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91e2/12175590/0a41e9a2259d/d4sc05655h-f1.jpg
