Key Laboratory of Electronics Engineering, College of Heilongjiang Province, Heilongjiang University, Harbin, China; Room 503, Building A8, Heilongjiang University, No. 74 Xuefu Road, Nangang District, Harbin 150080, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2018 Jan 15;189:463-472. doi: 10.1016/j.saa.2017.08.055. Epub 2017 Aug 20.
A stable and accurate model for quality detection depends on precise data and appropriate analytical methods. In this study, the authors applied partial least squares regression (PLSR) to build a model from measured spectral data that predicts the protein content of wheat, and proposed a new method, the global search method, for selecting PLSR components. To select representative and universal samples for modeling, Monte Carlo cross validation (MCCV) was used to detect outliers, and 4 outlier samples were identified. Improved simulated annealing (ISA) combined with PLSR was then employed to select the most effective variables from the spectral data: the dimensionality of the data was reduced from 100 to 57, the standard error of prediction (SEP) for the prediction set decreased from 0.0716 to 0.0565, and the correlation coefficient (R) between the predicted and actual protein content of wheat increased from 0.9989 to 0.9994. To reduce the dimensionality further, the successive projections algorithm (SPA) was then applied; the combination of these two methods is called ISA-SPA. Considering the accuracy, robustness, and complexity of the models together, the calibration model built with ISA-SPA on 14 effective variables achieved the best performance for predicting the protein content of wheat compared with the other PLSR models developed (ISA or SPA). With ISA-SPA, the coefficient of determination increased to 0.9986 and the SEP decreased to 0.0528.
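The abstract does not spell out how MCCV flags outliers. The following is a minimal sketch of one common scheme, written under stated assumptions and not taken from the paper: the data are repeatedly split at random into training and test sets, a PLS model is fitted on each training set, and each sample's absolute prediction error is averaged over the splits in which it was held out. The helper names (`pls1_fit`, `pls1_predict`, `mccv_outliers`), the NIPALS PLS1 implementation, the 30% test fraction, and the "k times the median error" flagging rule are all hypothetical choices made for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Single-response PLS (NIPALS); returns coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    W, P, q = np.zeros((p, n_comp)), np.zeros((p, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)            # unit weight vector
        t = Xc @ w                        # score vector
        tt = t @ t
        P[:, a] = Xc.T @ t / tt           # X loadings
        q[a] = yc @ t / tt                # y loading
        W[:, a] = w
        Xc = Xc - np.outer(t, P[:, a])    # deflate X
        yc = yc - q[a] * t                # deflate y
    B = W @ np.linalg.solve(P.T @ W, q)   # coefficients in original X space
    return B, x_mean, y_mean

def pls1_predict(model, X):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean

def mccv_outliers(X, y, n_comp=3, n_iter=500, test_frac=0.3, k=3.0, seed=0):
    """Flag samples whose mean held-out |error| exceeds k * median (hypothetical rule)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    err_sum, counts = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        perm = rng.permutation(n)         # random train/test split
        test, train = perm[:n_test], perm[n_test:]
        model = pls1_fit(X[train], y[train], n_comp)
        err_sum[test] += np.abs(y[test] - pls1_predict(model, X[test]))
        counts[test] += 1
    mean_err = err_sum / np.maximum(counts, 1)
    flagged = np.flatnonzero(mean_err > k * np.median(mean_err))
    return flagged, mean_err
```

Averaging over many random splits lets every sample be predicted by many different models, so a sample whose error stays large regardless of the training subset is a reasonable outlier candidate. A real study would normally also inspect the spread of each sample's errors before discarding it.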
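The abstract reports SEP, the correlation coefficient R, and the coefficient of determination, but it does not give their formulas. The sketch below uses a common chemometric convention in which SEP is the bias-corrected standard deviation of the prediction residuals. That definition, together with the helper names, is an assumption, and the paper may define SEP differently (for example, as the root mean square error of prediction, included here for comparison). Note that R² computed as 1 − SS_res/SS_tot on a prediction set is not in general equal to the square of the Pearson R.

```python
import numpy as np

def sep(y_true, y_pred):
    """Bias-corrected standard error of prediction (assumed definition)."""
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    bias = resid.mean()
    return float(np.sqrt(np.sum((resid - bias) ** 2) / (resid.size - 1)))

def rmsep(y_true, y_pred):
    """Root mean square error of prediction (no bias correction)."""
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(resid ** 2)))

def pearson_r(y_true, y_pred):
    """Correlation coefficient R between actual and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - np.asarray(y_pred, dtype=float)) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For example, predictions that are all offset by a constant 0.1 give a bias-corrected SEP of about 0 and R of about 1, while RMSEP is 0.1 and R² falls below 1. This is one reason reports usually state which definition they use.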
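SPA, mentioned above as the second stage of ISA-SPA, builds a chain of wavelengths with minimal collinearity. Starting from a chosen column, it repeatedly projects the remaining columns onto the subspace orthogonal to the last selected one and picks the column with the largest projected norm. The following is a minimal NumPy sketch of that projection chain only. The function name `spa_chain` is hypothetical, and the full procedure (choosing the start column and chain length, typically by validation error, and the exact way the paper applied SPA after ISA) is not shown.

```python
import numpy as np

def spa_chain(X, k0, n_vars):
    """Return n_vars column indices chosen by successive projections from column k0."""
    Xp = np.asarray(X, dtype=float).copy()
    selected = [k0]
    for _ in range(n_vars - 1):
        xk = Xp[:, selected[-1]].copy()
        # Project every column onto the orthogonal complement of the last pick.
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0            # never reselect a column
        selected.append(int(np.argmax(norms)))
    return selected
```

Because each new column is chosen from what remains after removing the directions already selected, a column that is nearly a scaled copy of an earlier pick ends up with a near-zero projected norm and is skipped. This is how SPA reduces redundancy among neighbouring wavelengths.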