School of Pharmacy, Xinjiang Medical University, Xinyi Road, Urumqi 830011, China.
Key Laboratory of Active Components of Xinjiang Natural Medicine and Drug Release Technology, Xinyi Road, Urumqi 830011, China.
J AOAC Int. 2023 Jul 17;106(4):1118-1125. doi: 10.1093/jaoacint/qsac144.
Cistanche tubulosa, a plant used as both medicine and food (medicine-food homology), not only has unique medicinal value but is also widely used in healthcare products. Polysaccharide content is one of its important quality indicators.
In this study, an analytical model based on near-infrared (NIR) spectroscopy combined with machine learning was established to predict the polysaccharide content of C. tubulosa.
The polysaccharide content of the samples, determined by the phenol-sulfuric acid method, was used as the reference value, and machine learning was applied to relate the spectral information to this reference value. The samples were divided into a calibration set and a prediction set using the Kennard-Stone algorithm. The model was optimized by testing various spectral preprocessing methods, including Savitzky-Golay (SG) smoothing, standard normal variate (SNV), multiplicative scatter correction (MSC), first-order derivative (FD), second-order derivative (SD), and combinations of them. Variable selection was performed with the successive projections algorithm (SPA) and stability competitive adaptive reweighted sampling (sCARS). Four machine learning methods were used to build quantitative models: random forest (RF), partial least-squares (PLS), principal component regression (PCR), and support vector machine (SVM). Model performance was evaluated using the coefficient of determination (R2), root-mean-square error (RMSE), and residual prediction deviation (RPD).
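To make the sample-splitting and scatter-correction steps above more concrete, here is a minimal sketch of SNV, MSC, and the Kennard-Stone split in plain Python. It is not the authors' code: the function names are hypothetical, SNV is assumed to use the sample standard deviation, MSC is assumed to use the mean spectrum as its reference, and Kennard-Stone is assumed to use Euclidean distance in spectral space.

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own (sample) standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction: fit each spectrum as
    a + b * reference by least squares, then return (spectrum - a) / b.
    The reference defaults to the mean spectrum (an assumption)."""
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spec in spectra:
        spec_mean = mean(spec)
        # least-squares slope (b) and intercept (a) against the reference
        b = sum((r - ref_mean) * (x - spec_mean)
                for r, x in zip(reference, spec)) / sxx
        a = spec_mean - b * ref_mean
        corrected.append([(x - a) / b for x in spec])
    return corrected


def kennard_stone(samples, n_calibration):
    """Kennard-Stone split: return (calibration_indices, prediction_indices).
    Requires n_calibration >= 2."""
    n = len(samples)
    dist = [[math.dist(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]
    # start with the two samples that are farthest apart
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_calibration:
        # add the sample whose nearest already-selected neighbour is farthest
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining
```

For example, `kennard_stone([[0.0], [1.0], [2.0], [10.0]], 3)` first picks the two extreme points and then the point that best fills the remaining gap, leaving the most "interior" sample for prediction. Because the split spans the spectral space, the calibration set covers the range of the data and the prediction set falls within it.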
RF performed best among the four machine learning models. For the calibration set, the coefficient of determination (R2c) and root-mean-square error (RMSEC, %) were 0.9763 and 0.3527, respectively. For the prediction set, the coefficient of determination (R2p), root-mean-square error (RMSEP, %), and RPD were 0.9230, 0.5130, and 3.33, respectively.
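The evaluation indexes reported above can be computed as in the following sketch. It assumes the common definitions R2 = 1 - SS_res/SS_tot, RMSE as the square root of the mean squared residual, and RPD as the standard deviation of the reference values divided by the RMSE of the same set; the abstract does not state which variants the authors used, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true, y_pred):
    """Residual prediction deviation: SD of the reference values / RMSE."""
    return stdev(y_true) / rmse(y_true, y_pred)


# Applied to calibration-set predictions these give R2c and RMSEC;
# applied to prediction-set predictions they give R2p, RMSEP, and RPD.
```

Under this definition, a larger RPD means the prediction error is small relative to the natural spread of polysaccharide content in the samples.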
The results indicate that NIR spectroscopy combined with RF is an effective method for evaluating the polysaccharide quality of C. tubulosa.
Four quantitative models were developed to predict the polysaccharide content of C. tubulosa, and all gave good results. The characteristic variables were mainly identified by the sCARS algorithm, and the corresponding characteristic groups were analyzed.