El-Samman Amer Marwan, De Baerdemacker Stijn
University of New Brunswick, Department of Chemistry. 30 Dineen Dr, Fredericton, Canada
University of New Brunswick, Department of Mathematics and Statistics. 30 Dineen Dr, Fredericton, Canada
Chem Sci. 2025 Apr 23. doi: 10.1039/d4sc05655h.
In deep learning methods, especially in the context of chemistry, there is an increasing urgency to uncover the hidden learning mechanisms often dubbed the "black box." In this work, we show that graph models built on computational chemical data behave similarly to natural language processing (NLP) models built on text data. Crucially, we show that atom-embeddings, i.e. atom-parsed graph neural activation patterns, exhibit arithmetic properties that represent valid reaction formulas. This is closely analogous to how word-embeddings can be combined to form word analogies that preserve the semantic meaning behind the words, as in the famous example "King" - "Man" + "Woman" = "Queen." For instance, we show how the reaction from an alcohol to a carbonyl is represented by a constant vector in the embedding space, implicitly representing "-H." This vector is independent of the particular alcohol reactant and carbonyl product and represents a consistent chemical transformation. Other directions in the embedding space correspond to distinct chemical changes (e.g. the tautomerization direction). In contrast to natural language processing, we can explain the observed chemical analogies using algebraic manipulations on the local chemical composition that surrounds each atom-embedding. Furthermore, the observations find applications in transfer learning, for instance in the prediction of atomistic properties such as ¹H-NMR and ¹³C-NMR chemical shifts. This work is in line with the recent push for interpretable explanations of graph neural network models of chemistry and uncovers a latent model of chemistry that is highly structured, consistent, and analogous to chemical syntax.
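The embedding-arithmetic claim can be illustrated with a minimal sketch. The code below is not the authors' implementation; it uses synthetic random vectors as hypothetical stand-ins for per-atom GNN embeddings, and simply demonstrates the test one would run on real embeddings: compute the product-minus-reactant difference vector for several alcohol/carbonyl pairs, check that the differences align with a single constant direction, and perform a "King - Man + Woman = Queen"-style nearest-neighbour completion.

```python
# Minimal sketch of the embedding-arithmetic test, assuming access to atom
# embeddings extracted from a trained graph model. All data here is synthetic
# and hypothetical; with real embeddings, `alcohol_embs` would hold activation
# vectors for the carbinol carbon of each alcohol and `carbonyl_embs` those of
# the carbonyl carbon in the corresponding oxidation product.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Synthetic stand-ins: a shared shift (the putative constant "-H"-like
# direction) plus small pair-specific noise.
shared_shift = rng.normal(size=dim)
alcohol_embs = rng.normal(size=(5, dim))
carbonyl_embs = alcohol_embs + shared_shift + 0.05 * rng.normal(size=(5, dim))

# Reaction vectors: product embedding minus reactant embedding, one per pair.
deltas = carbonyl_embs - alcohol_embs

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# If the transformation is a constant direction in embedding space, every
# reaction vector should align with the mean reaction vector (cosine near 1).
mean_delta = deltas.mean(axis=0)
alignment = [cosine(d, mean_delta) for d in deltas]
print("alignment of each reaction vector with the mean:", np.round(alignment, 3))

# Analogy-style completion: shift a new alcohol embedding by the mean reaction
# vector and retrieve the nearest carbonyl embedding by cosine similarity.
query = alcohol_embs[0] + mean_delta
nearest = max(range(len(carbonyl_embs)), key=lambda i: cosine(query, carbonyl_embs[i]))
print("nearest carbonyl to (alcohol_0 + mean reaction vector): index", nearest)
```

The same per-atom embeddings could, in principle, feed a simple linear probe (e.g. ridge regression) against measured ¹H or ¹³C chemical shifts to test the transfer-learning claim, though the specifics of the authors' setup are given in the paper itself.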