Public Foundational Courses Department, Nanjing Vocational University of Industry Technology, Nanjing, China.
Research and Development Department, Nanjing Changxingyang Intelligent Home Company Limited, Nanjing, China.
PLoS One. 2023 Feb 28;18(2):e0282429. doi: 10.1371/journal.pone.0282429. eCollection 2023.
Infrared spectroscopy can extract analytical information from samples quickly and non-destructively. It can be applied to authenticating Chinese herbal medicines, predicting the proportion of adulterants in a sample, and tracing geographic origin. In this paper, mid-infrared spectra of Cornus officinalis from 11 origins were used to establish an origin identification model. First, principal component analysis (PCA) was applied to the absorbance data of Cornus officinalis in the wavenumber range of 551 to 3998 cm⁻¹; the extracted principal components retain more than 99.8% of the information in the original data. Second, a support vector machine (SVM) was trained with the extracted principal components as input and the origin category as output. This combination is referred to as the PCA-SVM model. Finally, the generalization ability of the PCA-SVM model was evaluated on an external test set. Three indicators (accuracy, F1 score, and Kappa coefficient) were used to compare the model with other commonly used classifiers: naive Bayes, decision trees, linear discriminant analysis, radial basis function neural network, and partial least squares discriminant analysis. The PCA-SVM model outperformed these models on all three indicators. In addition, compared with an SVM trained on the full-spectrum data, the PCA-SVM model both reduces redundant variables and achieves higher accuracy. When used to identify the origin of Cornus officinalis, the model reaches an accuracy of 84.8%.
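The three evaluation indicators named in the abstract (accuracy, F1 score, and Kappa coefficient) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say how F1 is averaged across the 11 origin classes, so macro averaging is an assumption here, and the Kappa coefficient is assumed to be Cohen's kappa. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted origin matches the true origin."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when only one label ever occurs
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy labels standing in for origin classes; not data from the paper.
    y_true = ["A", "A", "B", "B", "C", "C"]
    y_pred = ["A", "A", "B", "C", "C", "C"]
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred))
    print("kappa:   ", cohen_kappa(y_true, y_pred))
```

Reporting Kappa alongside accuracy is useful in a multi-class setting such as this one because it discounts the agreement a classifier would reach by chance from the label frequencies alone.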