Sun Jiazheng, Xu Xuefang, Feng Songsong, Zhang Hanyu, Xu Lingfeng, Jiang Hong, Sun Baibing, Meng Yuyan, Chen Weizhou
College of Criminal Investigation, People's Public Security University of China, Beijing, 100038, PR China.
State Key Laboratory of Communicable Disease Prevention and Control, Institute for Communicable Disease Prevention and Control, Chinese Center for Disease Control and Prevention, Beijing, 102206, PR China.
Talanta. 2023 Feb 1;253:123807. doi: 10.1016/j.talanta.2022.123807. Epub 2022 Sep 8.
Foodborne illness is a widespread and escalating public health problem worldwide, and foodborne Salmonella infection is one of the most common causes of human illness. Raman spectroscopy was used to acquire spectral data for the three most pathogenic Salmonella serotypes. Because machine learning offers high efficiency and accuracy, we chose a convolutional neural network (CNN), which is well suited to multi-class classification problems, for in-depth mining and analysis of the Raman spectral data. To optimize the instrument parameters, we compared three laser wavelengths: 532, 638, and 785 nm. The 532 nm wavelength was ultimately chosen as the most effective for detecting Salmonella. A pre-processing step is necessary to remove interference from background noise in the Raman spectrum. We compared the effects of five spectral pre-processing methods, including Savitzky-Golay smoothing (SG), Multivariate Scatter Correction (MSC), Standard Normal Variate (SNV), and Hilbert Transform (HT), on the predictive power of the CNN models. Four machine learning evaluation metrics, accuracy (ACC), precision, recall, and F1-score, were used to evaluate model performance under the different pre-processing methods. SG combined with SNV was the most accurate spectral pre-processing method for predicting Salmonella serotypes from Raman spectra, achieving an accuracy of 98.7% on the training set and over 98.5% on the test set with the CNN model, higher than that obtained with the other methods. In conclusion, these results demonstrate that Raman spectroscopy combined with a CNN model enables rapid identification of three Salmonella serotypes at the single-cell level, and that the model has considerable potential for distinguishing between different serotypes of pathogenic bacteria and between closely related bacterial species. This is vital to preventing outbreaks of foodborne illness and the spread of foodborne pathogens.
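
As an illustration of the SG combined with SNV pre-processing named in the abstract, the following is a minimal Python sketch and not the authors' implementation. The function name sg_snv, the window length (11 points), the polynomial order (3), and the synthetic spectra are all assumptions made for the example; the abstract does not report these settings.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_snv(spectra, window_length=11, polyorder=3):
    """Savitzky-Golay smoothing followed by Standard Normal Variate scaling.

    spectra: array of shape (n_spectra, n_points), one Raman spectrum per row.
    window_length and polyorder are illustrative defaults (assumptions),
    not values taken from the abstract.
    """
    spectra = np.asarray(spectra, dtype=float)
    # SG: fit a local polynomial in a sliding window along each spectrum.
    smoothed = savgol_filter(
        spectra, window_length=window_length, polyorder=polyorder, axis=-1
    )
    # SNV: center each spectrum on its own mean and scale by its own
    # standard deviation, removing per-spectrum offset and intensity scale.
    mean = smoothed.mean(axis=-1, keepdims=True)
    std = smoothed.std(axis=-1, keepdims=True)
    return (smoothed - mean) / std


# Synthetic stand-in data (hypothetical): three noisy spectra sharing one
# peak but differing in intensity scale and baseline offset.
rng = np.random.default_rng(0)
shifts = np.linspace(400, 1800, 700)  # Raman shift axis in cm^-1 (assumed)
peak = np.exp(-((shifts - 1004) / 8) ** 2)
raw = np.stack(
    [
        (i + 1) * peak + 0.5 * i + 0.02 * rng.standard_normal(shifts.size)
        for i in range(3)
    ]
)
processed = sg_snv(raw)
```

After this step each row of processed has zero mean and unit standard deviation, so spectra that differ only in overall intensity or baseline offset become directly comparable before being passed to a classifier.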