Wang Li, Wang Qinqin, Wang Yuanzhong, Wang Yunmei
Quality Standards and Testing Technology Research Institute, Yunnan Academy of Agricultural Sciences, Kunming 650205, China.
College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming 650201, China.
J Anal Methods Chem. 2021 Jul 21;2021:5818999. doi: 10.1155/2021/5818999. eCollection 2021.
Poria, derived from a dried fungal sclerotium, is an edible traditional Chinese medicine with high economic value. Because wild and cultivated Poria differ significantly in quality, this study aimed to trace the origin of the fungus separately for wild and cultivated samples. Few studies have applied or discussed data fusion, a potentially useful strategy, in the geographical traceability of Poria. We therefore traced the origin of wild and cultivated Poria using multiple data fusion approaches. Supervised pattern recognition techniques, such as partial least squares discriminant analysis (PLS-DA) and random forest, were employed. Five types of data fusion covering low-, mid-, and high-level strategies were performed. Two feature extraction approaches were considered in data fusion: variable selection with Boruta, a random forest-based algorithm, and dimension reduction with principal component analysis (PCA), which produces principal components. The results indicate the following: (1) Wild and cultivated samples differed in the contents of vital chemical components and in their fingerprints. (2) Wild samples required data fusion to achieve origin traceability, and the accuracy of the validation set was 95.24%. (3) Boruta outperformed PCA in feature extraction. (4) The mid-level Boruta PLS-DA model took full advantage of information synergy and showed the best performance. This study showed that both the geographical traceability and the optimal identification methods differ between cultivated and wild samples, and that data fusion is a promising technique for geographical identification.
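The difference between low-level and mid-level data fusion can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the two data blocks are synthetic stand-ins for two hypothetical analytical techniques, the helper names (`make_block`, `low_level_fusion`, `mid_level_fusion`) are invented for illustration, PCA is the only feature extractor shown (Boruta is omitted), and a scikit-learn random forest stands in for the classifier.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_block(labels, n_vars, rng):
    """Synthetic data block (assumption): noise plus a class-dependent trend."""
    signal = np.outer(labels, np.linspace(0.0, 1.0, n_vars))
    return signal + rng.normal(scale=1.0, size=(labels.size, n_vars))


def low_level_fusion(blocks):
    """Low-level fusion: concatenate the full variable sets of all blocks."""
    return np.hstack(blocks)


def mid_level_fusion(train_blocks, test_blocks, n_components=5):
    """Mid-level fusion: extract features per block, then concatenate them.

    Scaling and PCA are fitted on the training rows only, so no
    information from the validation rows leaks into the features.
    """
    train_parts, test_parts = [], []
    for train, test in zip(train_blocks, test_blocks):
        scaler = StandardScaler().fit(train)
        pca = PCA(n_components=n_components).fit(scaler.transform(train))
        train_parts.append(pca.transform(scaler.transform(train)))
        test_parts.append(pca.transform(scaler.transform(test)))
    return np.hstack(train_parts), np.hstack(test_parts)


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 60)          # two hypothetical origin classes
block_a = make_block(labels, 200, rng)  # e.g. one spectral technique
block_b = make_block(labels, 150, rng)  # e.g. a second technique

idx_train, idx_test = train_test_split(
    np.arange(labels.size), test_size=0.3, stratify=labels, random_state=0
)

# Low-level: one wide matrix holding every original variable.
X_low = low_level_fusion([block_a, block_b])

# Mid-level: a compact matrix of per-block principal component scores.
X_mid_train, X_mid_test = mid_level_fusion(
    [block_a[idx_train], block_b[idx_train]],
    [block_a[idx_test], block_b[idx_test]],
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_mid_train, labels[idx_train])
accuracy = model.score(X_mid_test, labels[idx_test])
print(f"low-level variables: {X_low.shape[1]}")
print(f"mid-level features:  {X_mid_train.shape[1]}")
print(f"validation accuracy on synthetic data: {accuracy:.3f}")
```

High-level fusion, not shown here, would instead train one model per block and combine their predicted classes or probabilities. The accuracy printed by this sketch describes only the synthetic data and has no bearing on the reported results.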