Leveraging infrared spectroscopy for automated structure elucidation.

Author information

Alberts Marvin, Laino Teodoro, Vaucher Alain C

Affiliations

IBM Research Europe, Rüschlikon, Switzerland.

University of Zürich, Department of Chemistry, Zürich, Switzerland.

Publication information

Commun Chem. 2024 Nov 16;7(1):268. doi: 10.1038/s42004-024-01341-w.

Abstract

The application of machine learning models in chemistry has made remarkable strides in recent years. While analytical chemistry has received considerable interest from machine learning practitioners, the adoption of these models into everyday use remains limited. Among the available analytical methods, infrared (IR) spectroscopy stands out in terms of affordability, simplicity, and accessibility. However, its use has been limited to the identification of a select few functional groups, as most peaks lie beyond human interpretation. We present a transformer model that enables chemists to leverage the complete information contained within an IR spectrum to directly predict the molecular structure. To cover a large chemical space, we pretrain the model using 634,585 simulated IR spectra and fine-tune it on 3,453 experimental spectra. Our approach achieves a top-1 accuracy of 44.4% and a top-10 accuracy of 69.8% on compounds containing 6 to 13 heavy atoms. When predicting scaffolds only, the model ranks the correct scaffold first in 84.5% of cases and among the top 10 in 93.0% of cases.
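The top-1 and top-10 figures above are top-k accuracies: the fraction of test spectra for which the correct structure appears among the model's k highest-ranked candidates. The sketch below illustrates how such a metric can be computed. It is a minimal illustration, not the authors' evaluation code: the function name `top_k_accuracy`, the toy SMILES strings, and the use of exact string matching are all assumptions made here. In practice, predicted and reference structures would need to be brought to a common canonical form before comparison.

```python
from typing import Sequence


def top_k_accuracy(
    targets: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Return the fraction of targets found among the first k ranked candidates.

    Hypothetical helper for illustration. Structures are compared as plain
    strings, which assumes they are already in one canonical representation.
    """
    if len(targets) != len(ranked_predictions):
        raise ValueError("targets and ranked_predictions must have equal length")
    if not targets:
        return 0.0
    hits = sum(
        target in candidates[:k]
        for target, candidates in zip(targets, ranked_predictions)
    )
    return hits / len(targets)


# Toy data invented for this example (not taken from the paper).
targets = ["CCO", "c1ccccc1", "CC(=O)O"]
ranked = [
    ["CCO", "COC"],               # correct structure ranked first
    ["C1CCCCC1", "c1ccccc1"],     # correct structure ranked second
    ["CCC(=O)O", "CC(=O)OC"],     # correct structure not proposed
]

top1 = top_k_accuracy(targets, ranked, k=1)
top10 = top_k_accuracy(targets, ranked, k=10)
print(f"top-1: {top1:.3f}, top-10: {top10:.3f}")
```

By construction, top-k accuracy cannot decrease as k grows, which is why the reported top-10 values are higher than the top-1 values.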

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66a/11569215/a562000261fe/42004_2024_1341_Fig1_HTML.jpg
