Chen Hui, Tan Chao, Lin Zan
Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China; Hospital, Yibin University, Yibin, Sichuan 644000, China.
Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2024 Jan 5;304:123315. doi: 10.1016/j.saa.2023.123315. Epub 2023 Sep 1.
Ginseng is a well-known traditional herbal medicine, and ginseng sold on the market may not actually come from the region claimed. Traditional methods of identifying the geographical origin of ginseng are subjective, time-consuming, or destructive, so a more efficient approach is desirable. The feasibility of combining near-infrared (NIR) spectroscopy with ensemble learning for discriminating the producing area of ginseng was explored. A total of 270 samples were collected and evenly partitioned into training and test sets. A random subspace ensemble (RSE) with a linear discriminant classifier (LDA) as the weak learner (abbreviated RSE-LDA) was used to construct predictive models. Two parameters, the size of the subspace and the number of learners in the ensemble, were optimized. The classic partial least squares (PLS) algorithm was applied to build the reference model. The sensitivity, specificity, and total accuracy of the final models were 97.8%, 100%, and 99.3% for RSE-LDA and 93.3%, 96.7%, and 95.6% for PLS, respectively. To study the impact of training set composition on the results, the samples were randomly divided 200 times and the algorithm was run repeatedly to statistically analyze the sensitivity and specificity on the test set; similar results were obtained. The effect of training set size was also investigated. The results indicate that the combination of NIR spectroscopy with the RSE algorithm is a potential tool for discriminating the origin of ginseng.
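The RSE-LDA scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses synthetic data from `make_classification` as a stand-in for NIR spectra, and picks the subspace size (20) and the number of learners (50) arbitrarily, since the abstract does not report the optimized values. scikit-learn's `BaggingClassifier` with `bootstrap=False` and an integer `max_features` trains every learner on all training samples but on a random subset of variables, which is the random subspace method; note that it combines LDA learners by averaging predicted probabilities, which may differ from the combination rule used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for NIR spectra (NOT the study's data):
# 270 samples, 200 hypothetical wavelength variables, two classes.
X, y = make_classification(n_samples=270, n_features=200, n_informative=15,
                           n_redundant=30, random_state=0)

# Even split into training and test sets (135 samples each).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# The two tunable parameters named in the abstract; values are assumptions.
SUBSPACE_SIZE = 20  # variables seen by each weak learner
N_LEARNERS = 50     # number of LDA learners in the ensemble

# Random subspace ensemble: no sample bootstrapping, random feature subsets.
rse_lda = BaggingClassifier(
    LinearDiscriminantAnalysis(),  # weak learner
    n_estimators=N_LEARNERS,
    max_features=SUBSPACE_SIZE,
    bootstrap=False,               # every learner sees all training samples
    bootstrap_features=False,      # features drawn without replacement
    random_state=0,
)
rse_lda.fit(X_train, y_train)
y_pred = rse_lda.predict(X_test)

# Sensitivity, specificity, and total accuracy on the test set.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} accuracy={accuracy:.3f}")
```

The printed figures describe only the synthetic data and are not comparable to the reported results. One plausible way to choose `SUBSPACE_SIZE` and `N_LEARNERS` would be a grid search with cross-validation on the training set, although the abstract does not state how the optimization was performed.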