Sun Xue, Zhang Deng-Ting, Wang Hui, Zhou Cong, Yang Jian, Peng Dai-Yin, Zhang Xiao-Bo
State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China.
State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China; School of Pharmacy, Anhui University of Chinese Medicine, Hefei 230012, China.
Zhongguo Zhong Yao Za Zhi. 2023 Aug;48(16):4337-4346. doi: 10.19540/j.cnki.cjcmm.20230512.102.
To enable non-destructive, rapid, batch-wise discrimination of the geographical origin of Poria cocos, this study established an origin recognition model for P. cocos based on hyperspectral imaging combined with machine learning. P. cocos samples from Anhui, Fujian, Guangxi, Hubei, Hunan, Henan, and Yunnan were used as the research objects. Hyperspectral data were collected in the visible and near-infrared band (V-band, 410-990 nm) and the shortwave infrared band (S-band, 950-2500 nm), and the original spectral data were organized into three ranges: S-band, V-band, and full-band. Starting from the original data (RD) of each range, several pretreatments were applied, including multiplicative scatter correction (MSC), standard normal variate (SNV), Savitzky-Golay smoothing (SGS), first derivative (FD), and second derivative (SD). The samples were then labeled at three levels of origin: province, county, and batch. Origin identification models were built with partial least squares discriminant analysis (PLS-DA) and a linear support vector machine (LinearSVC). Finally, the optimal models were evaluated with confusion matrices, using the F1 score as the evaluation criterion. The models built with FD combined with LinearSVC achieved the highest prediction accuracy: 99.28% for the full-band range classified by province, 98.55% for the V-band range classified by county, and 97.45% for the full-band range classified by batch. The overall F1 scores of these three models were 99.16%, 98.59%, and 97.58%, respectively, indicating excellent performance. Hyperspectral imaging combined with LinearSVC can therefore achieve non-destructive, accurate, and rapid batch identification of P. cocos from different producing areas, which supports targeted research on and production of P. cocos.
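The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: SNV uses the sample standard deviation (n - 1), the first derivative is approximated by adjacent-band differences (the abstract does not say how FD was computed), and the "overall" F1 is taken as the unweighted macro-average of per-class F1 scores. The function names and the toy spectra and predictions are hypothetical. The classifier itself is omitted; in practice a linear SVM would be fitted between preprocessing and evaluation.

```python
from math import sqrt


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and standard deviation (n - 1 denominator, an assumption)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


def first_derivative(spectrum):
    """First derivative approximated by differences between adjacent bands
    (an assumption; the source does not specify the derivative method)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with rows = true class and columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def f1_scores(matrix):
    """Per-class F1 and their unweighted (macro) average."""
    k = len(matrix)
    per_class = []
    for i in range(k):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(k)) - tp  # predicted i, truly other
        fn = sum(matrix[i]) - tp                       # truly i, predicted other
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return per_class, sum(per_class) / k


if __name__ == "__main__":
    # Toy spectrum with three bands (illustrative values, not real data).
    print(snv([1.0, 2.0, 3.0]))
    print(first_derivative([1.0, 4.0, 9.0]))

    # Toy province-level labels and made-up predictions.
    labels = ["Anhui", "Hubei", "Yunnan"]
    y_true = ["Anhui", "Anhui", "Hubei", "Hubei", "Yunnan", "Yunnan"]
    y_pred = ["Anhui", "Hubei", "Hubei", "Hubei", "Yunnan", "Yunnan"]
    cm = confusion_matrix(y_true, y_pred, labels)
    per_class, macro = f1_scores(cm)
    print(cm, per_class, macro)
```

Reading the confusion matrix row by row shows which origins are mistaken for one another, while the F1 score balances precision and recall for each class. This matters when origin classes contain unequal numbers of samples.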