The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China.
Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China.
J Chem Inf Model. 2023 Aug 14;63(15):4615-4622. doi: 10.1021/acs.jcim.3c00749. Epub 2023 Aug 2.
Infrared (IR) spectroscopy is a powerful and versatile tool for analyzing functional groups in organic compounds. Interpreting large numbers of unknown spectra is complex and time-consuming, and it usually requires expertise in chemistry and spectroscopy. This paper presents a new deep learning method that transforms IR spectral features into intuitive image-like feature maps and predicts major functional groups. We obtained 8272 gas-phase IR spectra from the NIST Chemistry WebBook. Feature maps are constructed using the intrinsic correlation of spectral data, and prediction models are built on convolutional neural networks. Twenty-one major functional groups are successfully identified for each molecule using binary and multilabel models, without expert guidance or feature selection. The multilabel classification model produces all prediction results simultaneously for rapid characterization. Further analysis of detailed substructures indicates that our model can extract abundant structural information from IR spectra, supporting a more comprehensive investigation. Interpretation of the model reveals that the peaks of greatest interest to it are similar to those spectroscopists typically consider. In addition to demonstrating great potential for spectral identification, our method may contribute to the development of automated analyses in many fields.
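The difference between the binary and multilabel settings mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the network itself is omitted, and the function names, the 0.5 threshold, and the random logits are assumptions made purely for illustration. Only the count of 21 functional groups comes from the abstract. In a multilabel setup, a single model emits one logit per functional group, each logit passes through its own sigmoid, and every group receives an independent present/absent decision.

```python
import numpy as np

N_GROUPS = 21  # number of functional-group labels stated in the abstract


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def multilabel_predict(logits, threshold=0.5):
    """Turn per-group logits into independent 0/1 decisions.

    The threshold of 0.5 is an assumed default, not a value from the paper.
    """
    probs = sigmoid(np.asarray(logits, dtype=float))
    return (probs >= threshold).astype(int)


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over all samples and all labels.

    This is the usual loss for multilabel classification; whether the
    paper uses exactly this form is an assumption.
    """
    probs = np.clip(sigmoid(np.asarray(logits, dtype=float)), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=float)
    losses = targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)
    return float(-np.mean(losses))


# Stand-in for the output of a CNN on four spectra (random, illustrative only).
rng = np.random.default_rng(0)
demo_logits = rng.normal(size=(4, N_GROUPS))
demo_labels = multilabel_predict(demo_logits)  # shape (4, 21), entries 0 or 1
```

A set of binary models would instead train one such single-output classifier per functional group. The multilabel variant shares one forward pass across all 21 outputs, which is what allows all predictions to be produced at once.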