Zhao Yaju, Lv Wei, Zhang Yinsheng, Tang Minmin, Wang Haiyan
Zhejiang Engineering Research Institute of Food & Drug Quality and Safety, Zhejiang Gongshang University, Hangzhou 310018, PR China.
Spectrochim Acta A Mol Biomol Spectrosc. 2024 Dec 15;323:124913. doi: 10.1016/j.saa.2024.124913. Epub 2024 Jul 31.
This study proposes a simple and accurate approach to identifying the geographical origin of raspberry samples that combines Raman spectral preprocessing, feature selection, and machine learning. A window function was combined with baseline removal to preprocess the Raman spectra, reducing the dimensionality of the raw data while preserving the quality of the processed data. The window parameter was optimized, and a binning window width of 5 yielded the highest accuracy. Of the three feature selection techniques compared, the information gain model performed best at extracting discriminative spectral features. Finally, ten machine learning algorithms were used to build predictive models, and the best-performing models were selected. Linear Support Vector Classifier (LinearSVC), Multi-Layer Perceptron Classifier (MLPClassifier), and Linear Discriminant Analysis (LDA) achieved accuracy, precision, recall, and F1 values above 0.96, while the Random Vector Functional Link Network Classifier (RVFLClassifier) exceeded 0.93 on the same metrics. These results show that the proposed approach identifies the origin of raspberry samples with high accuracy and robustness, providing a useful tool for agricultural product authentication and quality control.
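The preprocessing and feature-scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which baseline-removal method was used, in what order the two preprocessing steps were applied, or how information gain was computed. The polynomial baseline, the baseline-then-bin order, the equal-width discretization, and all function names (`bin_spectrum`, `remove_baseline`, `preprocess`, `information_gain`) are assumptions made for illustration. Only the window width of 5 comes from the abstract.

```python
import numpy as np


def bin_spectrum(intensities, width=5):
    """Average consecutive, non-overlapping windows of `width` points."""
    y = np.asarray(intensities, dtype=float)
    n_bins = y.size // width  # trailing points that do not fill a window are dropped
    return y[: n_bins * width].reshape(n_bins, width).mean(axis=1)


def remove_baseline(intensities, degree=3):
    """Subtract a least-squares polynomial baseline (assumed method)."""
    y = np.asarray(intensities, dtype=float)
    x = np.linspace(-1.0, 1.0, y.size)  # scaled axis keeps the fit well conditioned
    coeffs = np.polyfit(x, y, degree)
    return y - np.polyval(coeffs, x)


def preprocess(intensities, width=5, degree=3):
    """Baseline removal followed by binning (the order is an assumption)."""
    return bin_spectrum(remove_baseline(intensities, degree), width)


def entropy(labels):
    """Shannon entropy, in bits, of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, n_bins=10):
    """H(labels) - H(labels | feature), with the feature cut into equal-width bins."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])  # bin indices 0 .. n_bins - 1
    conditional = 0.0
    for b in np.unique(binned):
        mask = binned == b
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional


# Synthetic spectrum: one Gaussian peak on a sloping background plus noise.
rng = np.random.default_rng(0)
shift = np.linspace(200, 1800, 1000)
raw = (np.exp(-0.5 * ((shift - 1000) / 10) ** 2)
       + 0.002 * shift + 1
       + rng.normal(0, 0.01, shift.size))
processed = preprocess(raw)
print(raw.shape, processed.shape)  # (1000,) (200,)
```

With a width of 5, each spectrum shrinks to one fifth of its original length, which is the dimensionality reduction the abstract refers to. Scoring every binned channel with `information_gain` against the origin labels and keeping the top-ranked channels would then give the reduced feature set passed to a classifier such as LinearSVC.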