Wu Xiao-Li, Li Yan-Jun, Wu Tie-Jun
Zhejiang University of Science and Technology, Hangzhou 310023, China. wuxiaoli@zust
Guang Pu Xue Yu Guang Pu Fen Xi. 2010 Apr;30(4):996-1001.
This paper proposes a selective model combination method to improve the precision of water quality analysis based on three-dimensional fluorescence spectra. A correlation coefficient criterion was designed to select the effective excitation wavelengths for sub-model building, and ridge regression was then used to combine the selected sub-models into a stacked model. Thirty-two samples of surface water and urban wastewater were studied, with total organic carbon (TOC) values from 3.41 to 125.35 mg·L⁻¹ and chemical oxygen demand (COD) values from 22.80 to 330.60 mg·L⁻¹; 10 excitation wavelengths in the range 220-400 nm were used to generate the three-dimensional fluorescence spectra. Under the proposed criterion, the excitation wavelengths 260, 280 and 400 nm were selected for TOC analysis and 220, 280 and 400 nm for COD analysis. Two stacked models were then built, using partial least squares regression to build the sub-models and ridge regression to combine them. The experimental results show that, compared with the single sub-model with the best prediction precision, the root mean square error of prediction (RMSEP) of the stacked models decreased by 15.4% for TOC analysis and 17.5% for COD analysis; compared with the models built without sub-model selection, the RMSEP of the stacked models decreased by 6.1% for TOC analysis and 10.9% for COD analysis.
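The selection and stacking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the correlation coefficient criterion, so the sketch assumes a sub-model is kept when the Pearson correlation between its predictions and the reference values reaches a hypothetical threshold. The per-wavelength predictions are made-up numbers standing in for the outputs of partial least squares sub-models, the ridge penalty `lam` is an arbitrary choice, and the ridge fit has no intercept. The error is computed on the same synthetic data for brevity, whereas a real RMSEP would use a separate prediction set.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (t - mb) for x, t in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((t - mb) ** 2 for t in b)
    return cov / math.sqrt(va * vb)


def select_wavelengths(preds, y, threshold):
    """Keep wavelengths whose sub-model predictions correlate with y.

    Assumed criterion (not taken from the paper): Pearson r >= threshold.
    """
    return [wl for wl in sorted(preds) if pearson(preds[wl], y) >= threshold]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_weights(columns, y, lam):
    """Ridge solution w = (P^T P + lam * I)^(-1) P^T y, without intercept."""
    k = len(columns)
    A = [[sum(a * b for a, b in zip(columns[i], columns[j]))
          + (lam if i == j else 0.0) for j in range(k)] for i in range(k)]
    rhs = [sum(a * t for a, t in zip(columns[i], y)) for i in range(k)]
    return solve(A, rhs)


def rmsep(pred, y):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


# Synthetic reference values and sub-model predictions (illustrative only).
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
preds = {
    260: [12.0, 18.0, 33.0, 38.0, 52.0, 57.0],
    280: [8.0, 23.0, 28.0, 43.0, 47.0, 62.0],
    340: [35.0, 30.0, 40.0, 32.0, 38.0, 34.0],  # weakly related to y
}

selected = select_wavelengths(preds, y, threshold=0.9)
weights = ridge_weights([preds[wl] for wl in selected], y, lam=1.0)

# Stacked prediction: weighted sum of the selected sub-model predictions.
stacked = [sum(w * preds[wl][i] for w, wl in zip(weights, selected))
           for i in range(len(y))]
stacked_rmsep = rmsep(stacked, y)

print("selected wavelengths:", selected)
print("ridge weights:", weights)
print("stacked error:", stacked_rmsep)
```

In this toy data the two retained sub-models have errors of opposite sign on most samples, so the weighted combination has a lower error than either one alone, which is the effect the stacked models in the abstract rely on.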