Li Bo, Schmidt Mikkel N, Alstrøm Tommy S
Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Lyngby, Denmark.
Analyst. 2022 May 17;147(10):2238-2246. doi: 10.1039/d2an00403h.
Raman spectroscopy is an important, low-cost, non-intrusive technique often used for chemical identification. Typical approaches identify a spectrum by comparing it with a reference database using supervised machine learning, which usually requires careful preprocessing and multiple spectra per analyte. We propose a new machine learning technique for spectrum identification based on contrastive representation learning. Our approach requires no preprocessing and works with as few as one reference spectrum per analyte. On two Raman spectral datasets and one SERS dataset, each consisting of single-component spectra, our method significantly improves on or matches the existing state-of-the-art analyte identification accuracy. On the SERS dataset, we further demonstrate that the identification accuracy can be increased by using conformal prediction to slightly enlarge the candidate set. Based on our findings, we believe contrastive representation learning is a promising alternative to existing methods for Raman spectrum matching.
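The matching and candidate-set steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder, the similarity measure, or the conformal variant, so the example assumes that spectra have already been mapped to embedding vectors (hand-made 3-dimensional vectors stand in for the output of a trained contrastive encoder), that cosine similarity is used for matching, and that split conformal prediction with the nonconformity score 1 - cosine similarity is used to build candidate sets. All function names are hypothetical.

```python
import numpy as np


def cosine_similarity(queries, references):
    """Pairwise cosine similarity; rows are embedding vectors."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    return q @ r.T


def match(queries, references):
    """Index of the most similar reference for each query (top-1 match)."""
    return np.argmax(cosine_similarity(queries, references), axis=1)


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested level.
    """
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(cal_scores)[k - 1]


def candidate_sets(queries, references, threshold):
    """All reference indices whose nonconformity score is within the threshold."""
    scores = 1.0 - cosine_similarity(queries, references)
    return [np.flatnonzero(row <= threshold) for row in scores]


# Toy data (assumption): one reference embedding per analyte.
references = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# Calibration embeddings with known analyte labels.
cal_embeddings = np.array([[0.9, 0.1, 0.0],
                           [0.1, 0.8, 0.1],
                           [0.0, 0.2, 0.9],
                           [0.8, 0.0, 0.2]])
cal_labels = np.array([0, 1, 2, 0])

# Nonconformity of each calibration sample with respect to its true analyte.
cal_sim = cosine_similarity(cal_embeddings, references)
cal_scores = 1.0 - cal_sim[np.arange(len(cal_labels)), cal_labels]
threshold = conformal_threshold(cal_scores, alpha=0.25)

# Query embeddings to identify.
queries = np.array([[1.0, 0.05, 0.0],
                    [0.1, 0.9, 0.05],
                    [0.0, 0.1, 1.0]])
predicted = match(queries, references)
sets = candidate_sets(queries, references, threshold)
```

In this sketch a smaller `alpha` raises the threshold, so each candidate set can only stay the same or grow; this is the trade-off between candidate set size and identification accuracy that the abstract refers to.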