Shaffer R E, Small G W, Arnold M A
Department of Chemistry, Ohio University, Athens 45701, USA.
Anal Chem. 1996 Aug 1;68(15):2663-75. doi: 10.1021/ac960049g.
A multivariate calibration procedure is described that is based on the use of a genetic algorithm (GA) to guide the coupling of bandpass digital filtering and partial least-squares (PLS) regression. The measurement of glucose in three different biological matrices with near-infrared spectroscopy is employed to develop this protocol. The GA is employed to optimize the position and width of the bandpass digital filter, the spectral range for PLS regression, and the number of PLS factors used in building the calibration model. The optimization of these variables is difficult because the values of the variables employ different units, resulting in a tendency for local optima to occur on the response surface of the optimization. Two issues are found to be critical to the success of the optimization: the configuration of the GA and the development of an appropriate fitness function. An integer representation for the GA is employed to overcome the difficulty in optimizing variables that are dissimilar, and the optimal GA configuration is found through experimental design methods. Three fitness function calculations are compared for their ability to lead the GA to better calibration models. A fitness function based on the combination of the mean-squared error in the calibration set data, the mean-squared error in the monitoring set data, and the number of PLS factors raised to a weighting factor is found to perform best. Multiple random drawings of the calibration and monitoring sets are also found to improve the optimization performance. Using this fitness function and three random drawings of the calibration and monitoring sets, the GA found calibration models that required fewer PLS factors yet had similar or better prediction abilities compared to calibration models found through an optimization protocol based on a grid search method.
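The abstract names the ingredients of the best-performing fitness function (calibration-set mean-squared error, monitoring-set mean-squared error, and the number of PLS factors raised to a weighting factor) but does not give the formula. The following is a minimal sketch of one way such terms could be combined, not the authors' expression. The reciprocal-product form, the default weight of 0.5, and the names `fitness` and `mean_fitness` are all assumptions made for illustration.

```python
from statistics import fmean


def fitness(mse_cal, mse_mon, n_factors, weight=0.5):
    """Hypothetical fitness score; larger is better.

    ASSUMPTION: the abstract does not state how the three terms are
    combined. This sketch sums the two errors and multiplies by
    n_factors ** weight, so models with more PLS factors are penalized.
    """
    if n_factors < 1:
        raise ValueError("n_factors must be at least 1")
    if mse_cal + mse_mon <= 0:
        raise ValueError("summed mean-squared error must be positive")
    penalty = n_factors ** weight
    return 1.0 / ((mse_cal + mse_mon) * penalty)


def mean_fitness(draws, n_factors, weight=0.5):
    """Average the score over several random calibration/monitoring splits.

    `draws` is an iterable of (mse_cal, mse_mon) pairs, one pair per
    random drawing of the two sets (the abstract reports using three).
    """
    return fmean(fitness(c, m, n_factors, weight) for c, m in draws)
```

Under this assumed form, two candidate models with identical errors are ranked by factor count, so the GA is steered toward the more parsimonious one. Averaging over several drawings makes the score less dependent on any single split of the data.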