Zareef Muhammad, Arslan Muhammad, Hassan Md Mehedi, Ahmad Waqas, Chen Quansheng
School of Food and Biological Engineering, Jiangsu University, Zhenjiang, People's Republic of China.
College of Food and Biological Engineering, Jimei University, Xiamen, People's Republic of China.
J Sci Food Agric. 2023 Dec;103(15):7914-7920. doi: 10.1002/jsfa.12880. Epub 2023 Aug 7.
The objective of this study was to compare two machine learning approaches for quantifying total polyphenols, using the synergy interval partial least squares (Si-PLS) model to choose the optimal spectral intervals. To improve the robustness of the resulting models, the genetic algorithm (GA) and competitive adaptive reweighted sampling (CARS) were each applied to the selected subset of variables.
The collected spectral data were divided into 19 sub-intervals; the selected intervals, totaling 246 variables, yielded the lowest root mean square error of cross-validation (RMSECV). Model performance was evaluated using the correlation coefficients for calibration (Rc) and prediction (Rp), RMSECV, root mean square error of prediction (RMSEP) and residual predictive deviation (RPD). The Si-GA-PLS model produced the following results: PCs = 9; Rc = 0.915; RMSECV = 1.39; Rp = 0.8878; RMSEP = 1.62; and RPD = 2.32. The Si-CARS-PLS model performed best at PCs = 10, with Rc = 0.9723, RMSECV = 0.81, Rp = 0.9114, RMSEP = 1.45 and RPD = 2.59.
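The evaluation metrics named above can be sketched in a few lines of standard-library Python. The definitions used here are the common chemometric ones (RMSE as the root of the mean squared residual, RPD as the standard deviation of the reference values divided by the RMSE); the abstract does not spell out its formulas, so treat these as assumptions. The sample values `y_ref` and `y_hat` are made up for illustration.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    mt, mp = statistics.fmean(y_true), statistics.fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values over RMSE."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Made-up reference and predicted values, for illustration only.
y_ref = [10.0, 12.0, 14.0, 16.0, 18.0]
y_hat = [11.0, 11.0, 15.0, 15.0, 19.0]
print(rmse(y_ref, y_hat), pearson_r(y_ref, y_hat), rpd(y_ref, y_hat))

# Consistency check on the reported figures: under RPD = SD / RMSEP, both
# models should imply the same prediction-set standard deviation.
reported = {"Si-GA-PLS": (1.62, 2.32), "Si-CARS-PLS": (1.45, 2.59)}
implied_sd = {name: rmsep * rpd_value for name, (rmsep, rpd_value) in reported.items()}
print(implied_sd)
```

Under this definition, the two reported RMSEP and RPD pairs imply nearly the same standard deviation for the prediction set (about 3.76 in each case), which is what one would expect if both models were evaluated on the same samples.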
The prediction ability of the built models improved in the order PLS < Si-PLS < CARS-PLS when the full spectroscopic data were used, and Si-PLS < Si-GA-PLS < Si-CARS-PLS when interval selection was performed with the Si-PLS model. Finally, the developed method was successfully used to quantify total polyphenols in tea. © 2023 Society of Chemical Industry.