Cao Hui, Zhou Yan
School of Electrical Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
Guang Pu Xue Yu Guang Pu Fen Xi. 2011 Jul;31(7):1847-51.
This paper proposes an outlier detection method for spectral analysis based on a multi-population elitist-sharing genetic algorithm. The method was applied to a near-infrared (NIR) data set to remove outliers, and partial least squares (PLS) regression was combined with it to build prediction models. Compared with outlier detection by Monte Carlo cross-validation, leave-one-out cross-validation, Mahalanobis distance and a traditional genetic algorithm, the proposed method reduced the prediction residual error sum of squares (PRESS) of the moisture prediction model by 72.4%, 39.5%, 39.5% and 14.5%, respectively; the PRESS of the fat prediction model by 86.2%, 75.9%, 84.9% and 19.9%, respectively; and the PRESS of the protein prediction model by 56.5%, 35.7%, 35.7% and 18.2%, respectively. The results indicate that the method is applicable to spectral outlier detection for different species, and that models built on the data set after outlier removal are more accurate and robust.
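The abstract reports its results as percentage decreases in PRESS relative to each baseline outlier-detection method. The sketch below shows that arithmetic. It assumes PRESS is the plain sum of squared prediction residuals and that "decrease" means the relative drop against the baseline's PRESS. The function names `press` and `percent_reduction` and all sample values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def press(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Prediction residual error sum of squares: sum of (y_i - yhat_i)^2.

    Assumption: PRESS is the plain sum of squared residuals over the
    predicted samples (no normalisation by sample count).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def percent_reduction(press_baseline: float, press_new: float) -> float:
    """Relative decrease of PRESS versus a baseline method, in percent.

    Assumption: the reported "decrease" is computed relative to the baseline.
    """
    if press_baseline <= 0:
        raise ValueError("baseline PRESS must be positive")
    return 100.0 * (press_baseline - press_new) / press_baseline


if __name__ == "__main__":
    # Hypothetical reference values and predictions, for illustration only
    y_ref = [10.0, 12.0, 15.0, 11.0]
    baseline_pred = [11.0, 10.0, 16.0, 12.0]  # squared residuals 1, 4, 1, 1
    proposed_pred = [10.5, 11.0, 15.5, 11.0]  # squared residuals 0.25, 1, 0.25, 0
    p_base = press(y_ref, baseline_pred)
    p_new = press(y_ref, proposed_pred)
    print(p_base, p_new, round(percent_reduction(p_base, p_new), 1))  # 7.0 1.5 78.6
```

Under this reading, a 72.4% decrease means the proposed model's PRESS is about 0.276 times the PRESS obtained with the baseline method.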