Radboud University Nijmegen, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands.
Anal Chim Acta. 2011 Oct 31;705(1-2):123-34. doi: 10.1016/j.aca.2011.04.025. Epub 2011 Apr 22.
Kernel partial least squares (KPLS) and support vector regression (SVR) have become popular techniques for regression of complex non-linear data sets. Both build the model by mapping the data into a higher-dimensional feature space through a kernel transformation. The disadvantage of this transformation, however, is that information about the contribution of the original variables to the regression is lost. In this paper we introduce a method that retrieves and visualizes both which variables contribute to the regression model and how they contribute, for complex data sets. The method is based on visualizing trajectories of so-called pseudo samples, which represent the original variables in the data. We test and illustrate the proposed method on several synthetic and real benchmark data sets. The results show that, for both linear and non-linear regression models, the important variables are identified, with correspondingly linear or non-linear trajectories. The results were verified by comparison with ordinary PLS regression, and by selecting the variables indicated as important and rebuilding a model with only those variables.
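The pseudo-sample idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: kernel ridge regression with an RBF kernel stands in for KPLS and SVR, each pseudo sample varies one variable over a grid while the others are held at zero (taken here as the approximate mean of the data), and the spread of the predicted response along a trajectory serves as a rough importance indicator. All function names, the synthetic data, and the parameter values (`gamma`, `lam`) are hypothetical.

```python
import math
import random


def rbf(a, b, gamma):
    """RBF kernel value between two samples."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_kernel_ridge(X, y, gamma, lam):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def predict(X_train, alpha, gamma, x_new):
    """Predict the (centered) response for one new sample."""
    return sum(a * rbf(xi, x_new, gamma) for a, xi in zip(alpha, X_train))


def pseudo_sample_trajectory(X_train, alpha, gamma, var, grid):
    """Predicted response along pseudo samples that vary only variable `var`."""
    n_vars = len(X_train[0])
    trajectory = []
    for value in grid:
        pseudo = [0.0] * n_vars   # assumption: other variables held at zero
        pseudo[var] = value
        trajectory.append(predict(X_train, alpha, gamma, pseudo))
    return trajectory


# Hypothetical synthetic data: variable 0 acts quadratically, variable 1
# linearly, and variable 2 does not influence the response at all.
random.seed(0)
X = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(60)]
y = [x[0] ** 2 + 0.5 * x[1] for x in X]
y_mean = sum(y) / len(y)
y_centered = [v - y_mean for v in y]

gamma, lam = 0.5, 1e-2            # illustrative values, not tuned
alpha = fit_kernel_ridge(X, y_centered, gamma, lam)

grid = [-1.0 + 0.25 * i for i in range(9)]
trajectories = [pseudo_sample_trajectory(X, alpha, gamma, j, grid)
                for j in range(3)]
ranges = [max(t) - min(t) for t in trajectories]

for j, t in enumerate(trajectories):
    print("variable", j, "range", round(ranges[j], 3),
          "trajectory", [round(v, 2) for v in t])
```

Plotting each trajectory against `grid` would give the kind of picture the abstract describes: a curved trajectory for a variable that enters non-linearly, a roughly straight one for a variable that enters linearly, and a nearly flat one for a variable that does not contribute.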