Leveraging infrared spectroscopy for automated structure elucidation.

Authors

Marvin Alberts, Teodoro Laino, Alain C. Vaucher

Affiliations

IBM Research Europe, Rüschlikon, Switzerland.

University of Zürich, Department of Chemistry, Zürich, Switzerland.

Publication

Commun Chem. 2024 Nov 16;7(1):268. doi: 10.1038/s42004-024-01341-w.

DOI: 10.1038/s42004-024-01341-w
PMID: 39550488
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11569215/

Abstract

The application of machine learning models in chemistry has made remarkable strides in recent years. While analytical chemistry has received considerable interest from machine learning practitioners, its adoption into everyday use remains limited. Among the available analytical methods, Infrared (IR) spectroscopy stands out in terms of affordability, simplicity, and accessibility. However, its use has been limited to the identification of a selected few functional groups, as most peaks lie beyond human interpretation. We present a transformer model that enables chemists to leverage the complete information contained within an IR spectrum to directly predict the molecular structure. To cover a large chemical space, we pretrain the model using 634,585 simulated IR spectra and fine-tune it on 3,453 experimental spectra. Our approach achieves a top-1 accuracy of 44.4% and top-10 accuracy of 69.8% on compounds containing 6 to 13 heavy atoms. When solely predicting scaffolds, the model accurately predicts the top-1 scaffold in 84.5% and among the top-10 in 93.0% of cases.
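The top-1 and top-10 figures above are top-k accuracies: a prediction counts as correct when the true structure appears among the model's k highest-ranked candidates, so the score can only rise as k grows. Below is a minimal sketch of that metric in Python. It assumes that candidates and targets are already canonical SMILES strings, so string equality means the same molecule. The function name `top_k_accuracy` and the toy data are hypothetical and do not come from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of samples whose target appears among the first k predictions.

    Assumption: every string is already a canonical SMILES (or canonical
    scaffold SMILES), so plain string equality stands in for structural identity.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets:
        raise ValueError("targets must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Each membership test yields a bool; summing the bools counts the hits.
    hits = sum(
        target in list(preds)[:k]
        for preds, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Made-up ranked candidates for three spectra (illustrative, not from the paper).
predictions = [
    ["CCO", "COC", "CC=O"],
    ["c1ccccc1O", "c1ccccc1C", "c1ccncc1"],
    ["CC(=O)O", "CC(=O)OC", "OCC=O"],
]
truth = ["CCO", "c1ccccc1C", "CCCO"]

print(top_k_accuracy(predictions, truth, k=1))  # 1 hit out of 3
print(top_k_accuracy(predictions, truth, k=3))  # 2 hits out of 3
```

You can get scaffold-level accuracy from the same function if you first map every SMILES to its scaffold. This abstract does not say which scaffold definition the paper uses. RDKit's Bemis-Murcko scaffold utilities are one possible choice, but that choice is an assumption here.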

Figures
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/a562000261fe/42004_2024_1341_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/e4e445651a7a/42004_2024_1341_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/77db269ad0cb/42004_2024_1341_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/e2a49eb45be8/42004_2024_1341_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/be164b02ee54/42004_2024_1341_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/13f1b7f792ee/42004_2024_1341_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/6faa2ad02a7b/42004_2024_1341_Fig7_HTML.jpg
Fig. 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/58a84e2828b3/42004_2024_1341_Fig8_HTML.jpg
Fig. 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/e505c12f2f2d/42004_2024_1341_Fig9_HTML.jpg
