Tang Rongnian, Chen Xupeng, Li Chuang
School of Mechanical and Electrical Engineering, Hainan University, Haikou, China.
Appl Spectrosc. 2018 May;72(5):740-749. doi: 10.1177/0003702818755142. Epub 2018 Apr 4.
Near-infrared spectroscopy is an efficient, low-cost technology with potential as an accurate method for detecting the nitrogen content of natural rubber leaves. The successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration that uses projection operations to select a variable subset with minimal multicollinearity. However, because the correlation between variables fluctuates across the spectrum, high collinearity may still exist between non-adjacent variables in a subset obtained by basic SPA. Based on an analysis of the correlation matrix of the spectral data, this paper proposes a correlation-based SPA (CB-SPA), which applies the successive projections algorithm within regions of consistent correlation. The results show that CB-SPA selects variable subsets with more valuable variables and less multicollinearity. Models built on the CB-SPA subset also outperform those built on basic SPA subsets in predicting nitrogen content, in both cross-validation and external prediction. Moreover, CB-SPA is more efficient: the time cost of its selection procedure is one-twelfth that of basic SPA.
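The projection step that basic SPA relies on can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the function name `spa_select`, its parameters, and the NumPy-based formulation are hypothetical, and the code covers only the projection phase as it is commonly described for SPA (the full method also compares starting variables and subset sizes by validation error). It does not implement CB-SPA, whose region-splitting rule is not specified in the abstract.

```python
import numpy as np

def spa_select(X, start, n_select):
    """Pick n_select column indices of X by successive projections.

    X        : (n_samples, n_variables) spectra matrix (assumed layout)
    start    : index of the first selected variable (assumed given)
    n_select : number of variables to select
    """
    Xp = np.asarray(X, dtype=float).copy()  # working copy; columns get projected
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]  # most recently selected (already projected) column
        # Project every column onto the subspace orthogonal to v.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-pick an already selected column
        # The column with the largest remaining norm is the least
        # collinear with the variables chosen so far.
        selected.append(int(np.argmax(norms)))
    return selected
```

In this sketch, a column that is an exact multiple of an already selected one projects to zero and is therefore never chosen, which is the sense in which the projections reduce multicollinearity. A correlation-based variant in the spirit of CB-SPA would presumably run such a selection separately inside each region of consistent correlation, but how those regions are delimited is not stated here.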