Accurate and Efficient Structure Elucidation from Routine One-Dimensional NMR Spectra Using Multitask Machine Learning.

Author information

Hu Frank, Chen Michael S, Rotskoff Grant M, Kanan Matthew W, Markland Thomas E

Affiliations

Department of Chemistry, Stanford University, Stanford, California 94305, United States.

Simons Center for Computational Physical Chemistry, Department of Chemistry, New York University, New York, New York 10003, United States.

Publication information

ACS Cent Sci. 2024 Nov 13;10(11):2162-2170. doi: 10.1021/acscentsci.4c01132. eCollection 2024 Nov 27.

DOI:10.1021/acscentsci.4c01132

PMID:39634219

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11613330/

Abstract

Rapid determination of molecular structures can greatly accelerate workflows across many chemical disciplines. However, elucidating structure using only one-dimensional (1D) NMR spectra, the most readily accessible data, remains an extremely challenging problem because the number of possible molecules grows combinatorially as the number of constituent atoms increases. Here, we introduce a multitask machine learning framework that predicts the molecular structure (formula and connectivity) of an unknown compound solely from its 1D ¹H and/or ¹³C NMR spectra. First, we show how a transformer architecture can be constructed to efficiently solve a task traditionally performed by chemists: assembling large numbers of molecular fragments into molecular structures. Integrating this capability with a convolutional neural network, we build a fast and accurate end-to-end model for predicting structure from spectra. We demonstrate the effectiveness of this framework on molecules with up to 19 heavy (non-hydrogen) atoms, a size for which there are trillions of possible structures. Without relying on any prior chemical knowledge such as the molecular formula, our approach ranks the exact molecule among its top 15 predictions 69.6% of the time, reducing the search space by up to 11 orders of magnitude.
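The abstract scores the model by whether the exact molecule appears among its first 15 ranked predictions, and states the benefit as orders of magnitude of search-space reduction. The following is a minimal illustrative sketch of how both quantities can be computed; it is not the authors' evaluation code. It assumes that predictions and targets are already canonical SMILES strings, so that string equality stands in for "same molecule". The function names, the toy molecules, and the 10^12 candidate count (a stand-in for "trillions") are hypothetical.

```python
import math
from typing import Sequence


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   targets: Sequence[str],
                   k: int = 15) -> float:
    """Fraction of targets found among the first k ranked predictions.

    Assumes every string is a canonical SMILES, so string equality
    stands in for "same molecule".
    """
    if len(predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets or k < 1:
        raise ValueError("need a non-empty evaluation set and k >= 1")
    hits = sum(1 for ranked, target in zip(predictions, targets)
               if target in ranked[:k])
    return hits / len(targets)


def orders_of_magnitude_reduction(n_candidates: float, k: int = 15) -> float:
    """log10 of the ratio between the full candidate space and a k-item shortlist."""
    return math.log10(n_candidates / k)


# Hypothetical toy data: one ranked list of candidate SMILES per unknown.
targets = ["CCO", "CC(C)O", "c1ccccc1"]
preds = [
    ["CCO", "COC"],              # correct at rank 1
    ["CCCO", "CC(C)O"],          # correct at rank 2
    ["C1CCCCC1", "C1=CCCCC1"],   # correct structure never proposed
]

print(top_k_accuracy(preds, targets, k=1))  # 0.3333333333333333
print(top_k_accuracy(preds, targets, k=2))  # 0.6666666666666666
# Assumed 1e12 candidates as a stand-in for "trillions" of structures.
print(round(orders_of_magnitude_reduction(1e12, k=15), 1))  # 10.8
```

Under this assumed candidate count, a 15-structure shortlist is about 10.8 orders of magnitude smaller than the full space, consistent with the "up to 11 orders of magnitude" stated in the abstract; the paper's own accounting of the candidate space may differ.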

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fb0/11613330/8a12840ae921/oc4c01132_0001.jpg
