Hu Jiaqi, Chen Gina Jinna, Xue Chenlong, Liang Pei, Xiang Yanqun, Zhang Chuanlun, Chi Xiaokeng, Liu Guoying, Ye Yanfang, Cui Dongyu, Zhang De, Yu Xiaojun, Dang Hong, Zhang Wen, Chen Junfan, Tang Quan, Guo Penglai, Ho Ho-Pui, Li Yuchao, Cong Longqing, Shum Perry Ping
State Key Laboratory of Optical Fiber and Cable Manufacture Technology, Guangdong Key Laboratory of Integrated Optoelectronics Intellisense, Department of EEE, Southern University of Science and Technology, Shenzhen, 518055, China.
College of Optical and Electronic Technology, China Jiliang University, Hangzhou, 310018, China.
Light Sci Appl. 2024 Feb 20;13(1):52. doi: 10.1038/s41377-024-01394-5.
Raman spectroscopy has tremendous potential for material analysis across many branches of science and technology, owing to its molecular fingerprinting capability. It is also an emerging omics technique for metabolic profiling in support of precision medicine. However, precisely attributing vibrational peaks is difficult when they are coupled with environmental, instrumental, and specimen-specific noise. Intelligent Raman spectral preprocessing that removes statistical bias noise and sample-related errors should therefore provide a powerful tool for extracting valuable information. Here, we propose a novel Raman spectral preprocessing scheme based on self-supervised learning (RSPSSL) with high capacity and spectral fidelity. It can preprocess arbitrary Raman spectra without further training or human intervention at a speed of ~1,900 spectra per second. In a preprocessing trial on experimental data, it demonstrated excellent capacity and signal fidelity, with an 88% reduction in root mean square error and a 60% reduction in infinity norm (L∞) compared with established techniques. With this advantage, it markedly enhanced various biomedical applications: it delivered a 400% accuracy elevation (ΔAUC) in cancer diagnosis and an average 38% (few-shot) and 242% accuracy improvement in paraquat concentration prediction, and it unlocked the chemical resolution of biomedical hyperspectral images, especially in the spectral fingerprint region. It precisely preprocessed Raman spectra from different spectroscopy devices, laboratories, and applications. This scheme will enable biomedical mechanism screening with label-free volumetric molecular imaging, as well as organism and disease metabolomics profiling, in scenarios that combine high throughput, multiple devices, varying analyte complexity, and diverse applications.
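The two fidelity metrics quoted above, root mean square error and the infinity norm of the residual, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names (`rmse`, `linf`, `relative_reduction`) and the toy intensity values are assumptions made up for illustration, and the abstract does not specify how the reference spectra were obtained.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equal-length spectra."""
    if len(reference) != len(estimate):
        raise ValueError("spectra must have the same length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def linf(reference, estimate):
    """Infinity norm of the residual: the largest absolute pointwise error."""
    if len(reference) != len(estimate):
        raise ValueError("spectra must have the same length")
    return max(abs(r - e) for r, e in zip(reference, estimate))


def relative_reduction(baseline_error, new_error):
    """Fractional reduction of an error metric relative to a baseline method."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical toy intensities, NOT data from the paper.
reference = [0.0, 1.0, 4.0, 1.0, 0.0]  # assumed ground-truth spectrum
baseline = [0.5, 1.5, 3.0, 1.5, 0.5]   # output of some established method
improved = [0.1, 1.1, 3.8, 1.1, 0.1]   # output of a better preprocessor

rmse_gain = relative_reduction(rmse(reference, baseline), rmse(reference, improved))
linf_gain = relative_reduction(linf(reference, baseline), linf(reference, improved))
print(f"RMSE reduction: {rmse_gain:.0%}, L-infinity reduction: {linf_gain:.0%}")
```

RMSE summarizes the average deviation over the whole spectrum, whereas the infinity norm reports only the single worst point, so a method can lower one much more than the other, as the 88% and 60% figures suggest.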