Yao Haiming, Luo Wei, Gao Ang, Zhou Tao, Wang Xue
State Key Laboratory of Precision Measurement Technology and Instruments, Tsinghua University, Beijing, 100084, China.
Anal Chim Acta. 2025 Oct 22;1372:344372. doi: 10.1016/j.aca.2025.344372. Epub 2025 Jul 22.
Raman spectroscopy has attracted significant attention in biochemical detection, especially for the rapid identification of pathogenic bacteria. Integrating this technology with deep learning to automate bacterial Raman spectroscopy diagnosis has become a key focus of recent research. However, the diagnostic performance of existing deep learning methods depends largely on a sufficiently large dataset; when Raman spectroscopy data are limited, they are inadequate for fully optimizing the numerous parameters of deep neural networks. To address this challenge, this paper proposes a data generation method that uses deep generative models to expand the data volume and improve the recognition accuracy of bacterial Raman spectra. Specifically, we introduce DiffRaman, a conditional latent denoising diffusion probabilistic model for Raman spectra generation. Our approach first applies a two-dimensional image transformation to the Raman spectral data. We then use the encoder of a Vector Quantized Variational Autoencoder (VQ-VAE) to compress the resulting Raman image into a lower-dimensional latent space, and construct a conditional Denoising Diffusion Probabilistic Model (DDPM) for representation learning and data augmentation. Finally, the VQ-VAE decoder reconstructs the spectrum from its low-dimensional latent representation. Experimental results show that synthetic bacterial Raman spectra generated by DiffRaman effectively mimic real spectra and improve diagnostic model performance, particularly in data-limited settings. Compared with existing models, DiffRaman improves both generation quality and computational efficiency. DiffRaman is thus well suited to automated bacterial Raman spectroscopy diagnosis in data-scarce scenarios, and it offers new insights into reducing the labor of spectroscopic measurements and improving the identification of rare bacteria.
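The stages named in the abstract (2-D transformation, vector quantization, diffusion in latent space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which 2-D transformation is used, so the zero-padded reshape below is a placeholder assumption, and the function names (`spectrum_to_image`, `quantize`, `forward_diffuse`) are hypothetical. The linear noise schedule is the standard DDPM default, not a setting confirmed for DiffRaman. The learned parts (VQ-VAE encoder and decoder networks, the class-conditioned denoiser) are omitted.

```python
import numpy as np


def spectrum_to_image(spectrum, side):
    """Fold a 1-D spectrum into a side x side array.

    ASSUMPTION: a zero-padded reshape stands in for the paper's
    unspecified two-dimensional transformation.
    """
    if spectrum.size > side * side:
        raise ValueError("spectrum does not fit in a side x side image")
    padded = np.zeros(side * side, dtype=np.float64)
    padded[: spectrum.size] = spectrum
    return padded.reshape(side, side)


def image_to_spectrum(image, length):
    """Invert spectrum_to_image by flattening and dropping the padding."""
    return image.reshape(-1)[:length]


def quantize(latents, codebook):
    """Vector quantization step of a VQ-VAE.

    latents:  (N, D) continuous encoder outputs
    codebook: (K, D) learned code vectors
    Returns the index of the nearest code for each latent and the
    corresponding quantized vectors.
    """
    # Squared Euclidean distance between every latent and every code: (N, K)
    dist2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dist2.argmin(axis=1)
    return indices, codebook[indices]


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear variance schedule (common DDPM default; assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(z0, t, alpha_bar, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a_bar_t) z_0, (1 - a_bar_t) I).

    Returns the noised latent and the noise that was added; the noise is
    the regression target when training a DDPM denoiser.
    """
    noise = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return zt, noise


# Cumulative product of (1 - beta_t) gives alpha_bar_t for each timestep.
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)
```

In a full pipeline, a denoising network conditioned on the bacterial class label would be trained to predict `noise` from `zt`, `t`, and the label; sampling would then run the reverse process in latent space and pass the result through the VQ-VAE decoder and `image_to_spectrum`.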