Shinzawa Hideyuki, Li Boyan, Nakagawa Takehiro, Maruo Katsuhiko, Ozaki Yukihiro
Department of Chemistry and Research Center for Near Infrared Spectroscopy, School of Science and Technology, Kwansei-Gakuin University, Hyogo 669-1337, Japan.
Appl Spectrosc. 2006 Jun;60(6):631-40. doi: 10.1366/000370206777670576.
In this study, multi-objective genetic algorithms (GAs) are introduced to partial least squares (PLS) model building. The method aims to improve the performance and robustness of a PLS model by removing samples with systematic errors, including outliers, from the original data. The multi-objective GA optimizes the combination of samples to be removed. Both training and validation sets are used so that the multi-objective GA does not over-fit the training set, and this reduction in over-fitting leads to accurate and robust PLS models. To visualize the factors behind the systematic errors clearly, an index defined from the original PLS model and a specific Pareto-optimal solution is also introduced. The method is applied to three kinds of near-infrared (NIR) spectra to build PLS models. The results demonstrate that the multi-objective GA significantly improves the performance of the PLS models. They also show that sample selection by the multi-objective GA enhances the ability of the PLS models to detect samples with systematic errors.
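The idea of a Pareto-optimal solution in this setting can be illustrated with a minimal sketch. It assumes that each candidate is a set of samples to remove and that the two objectives are the training-set and validation-set prediction errors, both minimized. The abstract implies these objectives but does not state them, and the function names, sample indices, and error values below are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumption: each candidate is scored by (training error, validation error),
# and both objectives are to be minimized.
Objectives = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Objectives]) -> List[int]:
    """Return the indices of the candidates that no other candidate dominates."""
    front = []
    for i, si in enumerate(scores):
        dominated = any(dominates(sj, si) for j, sj in enumerate(scores) if j != i)
        if not dominated:
            front.append(i)
    return front


# Hypothetical candidates: indices of samples removed before fitting the model.
removed_samples = [[3], [3, 7], [7], [], [3, 12]]
# Hypothetical (training error, validation error) pairs for those candidates.
scores = [(0.30, 0.50), (0.25, 0.55), (0.35, 0.45), (0.40, 0.60), (0.25, 0.50)]

front = pareto_front(scores)
for i in front:
    print(removed_samples[i], scores[i])
```

In a full multi-objective GA, a filter of this kind would be applied to each generation of candidate removal sets. The selection, crossover, and mutation steps, and the PLS fitting that produces the error values, are omitted here.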