Singh Kunwar P, Malik Amrita, Basant Nikita, Saxena Puneet
Environmental Chemistry Division, Industrial Toxicology Research Centre, Post Box 80, MG Marg, Lucknow 226001, India.
Anal Chim Acta. 2007 Feb 19;584(2):385-96. doi: 10.1016/j.aca.2006.11.038. Epub 2006 Nov 19.
A 10-year surface water quality data set for a polluted river was analyzed using partial least squares (PLS) regression models. Both the unfold-PLS and N-PLS (tri-PLS and quadri-PLS) models were calibrated using leave-one-out cross-validation. The models were applied to the multivariate, multi-way data array to assess and compare their ability to predict the biochemical oxygen demand (BOD) of river water in terms of their relative mean square errors of cross-validation and prediction and the variance captured. The sums of squares of residuals and the leverages were computed and analyzed to identify the sites, variables, years and months that may influence the constructed model. Both the tri- and quadri-PLS models yielded relatively low validation errors compared with unfold-PLS and captured high variance in the model. Moreover, both methods produced acceptable model precision and accuracy. For tri-PLS, the root mean square errors were 1.65 and 2.17 for calibration and prediction, respectively, whereas for quadri-PLS they were 2.58 and 1.09. At a preliminary level, it seems that BOD can be predicted, but a different data arrangement is needed. Moreover, analysis of the scores and loadings plots of the N-PLS models could provide information on the time evolution of the river water quality.
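The unfold-PLS step with leave-one-out cross-validation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the array layout (samples × variables × months) is hypothetical, and the regression is a textbook single-response PLS (PLS1, NIPALS-style deflation) rather than the authors' software. The N-PLS (tri- and quadri-PLS) models are not covered here.

```python
import math
import random

def unfold(cube):
    """Flatten each sample's (variables x months) slab into one row."""
    return [[v for slab in sample for v in slab] for sample in cube]

def pls1_fit(X, y, n_comp):
    """Fit PLS1 by successive deflation; returns means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        # Weight vector: direction in X most covariant with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(yc[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate X and y before extracting the next component.
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict y for one unfolded sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    r = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(r, w))
        y_hat += q * t
        r = [a - t * b for a, b in zip(r, p)]
    return y_hat

def rmsecv_loo(X, y, n_comp):
    """Root mean square error of leave-one-out cross-validation."""
    sq = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        sq += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq / len(X))

# Synthetic three-way data (hypothetical): two latent factors drive both the
# measured variables and the response that stands in for BOD.
random.seed(0)
n_samples, n_vars, n_months = 20, 3, 4
load_a = [[random.gauss(0, 1) for _ in range(n_months)] for _ in range(n_vars)]
load_b = [[random.gauss(0, 1) for _ in range(n_months)] for _ in range(n_vars)]
cube, y = [], []
for _ in range(n_samples):
    t1, t2 = random.gauss(0, 1), random.gauss(0, 1)
    cube.append([[t1 * load_a[j][k] + t2 * load_b[j][k] + random.gauss(0, 0.05)
                  for k in range(n_months)] for j in range(n_vars)])
    y.append(3.0 + 2.0 * t1 - t2 + random.gauss(0, 0.05))

X = unfold(cube)  # 20 rows, 3 * 4 = 12 columns
for n_comp in range(4):
    # n_comp = 0 is the mean-only baseline.
    print(n_comp, round(rmsecv_loo(X, y, n_comp), 3))
```

Comparing the cross-validated error across component counts, with the zero-component (mean-only) model as a baseline, is one common way to choose model complexity. Unfolding discards the multi-way structure that N-PLS models retain, which is the contrast the abstract draws.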