Improved variable reduction in partial least squares modelling based on predictive-property-ranked variables and adaptation of partial least squares complexity.
Author information
Department of Life Sciences, Avans Hogeschool, University of Professional Education, P.O. Box 90116, 4800 RA Breda, The Netherlands.
Publication information
Anal Chim Acta. 2011 Oct 31;705(1-2):292-305. doi: 10.1016/j.aca.2011.06.037. Epub 2011 Jun 29.
The calibration performance of partial least squares for one response variable (PLS1) can be improved by eliminating uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters and which may change during the variable-reduction process. In these methods, variables are reduced after being ranked in descending order of a given variable property. The methods start with full-spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated, a new PLS model is calculated, and the variables are re-ranked. Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted SVR-PPRV. In existing SVR-PPRV methods, the PLS-model complexity is kept constant during the variable-reduction process. In this study, three new SVR-PPRV methods are proposed in which the possibility of decreasing the PLS-model complexity during variable reduction is built in. We therefore denote our methods PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity-Adapted Models). The selective and predictive abilities of the new methods were investigated and tested using the absolute PLS regression coefficients as the predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS-model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or interval PLS (UVE-iPLS). The performance of the methods was investigated on two near-infrared (NIR) data sets and one simulated set. The selective and predictive performances of the variable-reduction methods were compared statistically using the Wilcoxon signed-rank test.
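The stepwise loop described above (fit PLS1, rank variables by a predictive property, drop the worst, re-fit and re-rank) can be sketched as follows. This is an illustrative reconstruction from the abstract only, not the authors' implementation: the PLS1 fit uses a plain NIPALS algorithm, the predictive property is the absolute regression coefficient as in the study, and the complexity adaptation is a crude stand-in (capping the number of latent variables by the number of remaining variables) rather than whatever model-selection criterion the paper actually uses. The function names `pls1_coefficients` and `pprv_reduction` are invented for this sketch.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """PLS1 regression coefficients for mean-centred data, via NIPALS."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight vector from X-y covariance
        w /= np.linalg.norm(w)
        t = Xk @ w                         # score vector
        tt = t @ t
        p = Xk.T @ t / tt                  # X loading
        q = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - q * t                    # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # regression coefficients: B = W (P'W)^{-1} q
    return W @ np.linalg.solve(P.T @ W, Q)

def pprv_reduction(X, y, n_keep, n_components, adapt_complexity=False):
    """Stepwise variable reduction on predictive-property-ranked variables.

    Repeatedly fits PLS1 on the active variables, ranks them by |b|
    (the absolute regression coefficient), and deletes the worst one.
    With adapt_complexity=True the number of latent variables may
    decrease as variables are removed (crude stand-in for the paper's
    complexity adaptation).
    """
    active = list(range(X.shape[1]))
    A = n_components
    while len(active) > n_keep:
        if adapt_complexity:
            A = min(A, len(active) - 1)    # never more LVs than variables - 1
        b = pls1_coefficients(X[:, active], y, A)
        worst = int(np.argmin(np.abs(b)))  # smallest predictive property
        del active[worst]
    return active
```

On synthetic data where only a couple of columns carry signal, the loop reliably discards the pure-noise columns first, which is the behaviour the abstract claims for property-ranked reduction.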
The three newly developed PPRVR-CAM methods retained significantly smaller numbers of informative variables than the existing SVR-PPRV, UVE-GA-PLS and UVE-iPLS methods, without loss of prediction ability. Unlike UVE-GA-PLS and UVE-iPLS, the PPRV(R) methods show no variability in the number of retained variables. Renewed variable ranking after deletion of a variable, followed by remodelling, combined with the possibility of decreasing the PLS-model complexity, is beneficial. A preferred PPRVR-CAM method is proposed.