Zhang Huai-zhu, Lin Jun, Zhang Huai-Zhu
Guang Pu Xue Yu Guang Pu Fen Xi. 2014 Jun;34(6):1707-10.
In the present paper, outlier detection methods for the determination of oil yield in oil shale by near-infrared (NIR) diffuse reflection spectroscopy were studied. In quantitative NIR analysis, both environmental changes and operator error can produce outliers. Outliers distort the overall distribution trend of the samples and reduce the predictive capability of the model, so detecting them is important for constructing high-quality calibration models. Two methods, principal component analysis-Mahalanobis distance (PCA-MD) and resampling by half-means (RHM), were applied to identify and eliminate outliers in this work. The MD threshold and the RHM confidence level were each optimized according to the performance of partial least squares (PLS) models constructed after the outliers had been eliminated. Compared with the model constructed with the full spectral data, the root mean square error of prediction (RMSEP) was reduced by 48.3% for the model built after PCA-MD with a threshold equal to the sum of the mean and standard deviation of MD, by 27.5% for the model built after RHM with a confidence level of 85%, and by 44.8% for the model built after the combination of PCA-MD and RHM. Eliminating outliers thus effectively improved the predictive ability of the calibration model.
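The PCA-MD screening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pca_md_outliers`, the use of NumPy, and the default of three principal components are all assumptions made for illustration (the abstract does not state how many components were retained). The only detail taken from the abstract is the threshold, set to the mean of the Mahalanobis distances plus one standard deviation.

```python
import numpy as np


def pca_md_outliers(spectra, n_components=3):
    """Flag samples whose Mahalanobis distance in PCA score space
    exceeds mean(MD) + std(MD).

    spectra: array of shape (n_samples, n_wavelengths).
    n_components: number of principal components kept (an assumption;
    the abstract does not give this value).
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)  # mean-center each wavelength

    # PCA via SVD of the centered data: scores = U * S
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]

    # PCA scores are uncorrelated, so their covariance matrix is diagonal
    # and the Mahalanobis distance reduces to a variance-scaled norm.
    variances = scores.var(axis=0, ddof=1)
    md = np.sqrt((scores ** 2 / variances).sum(axis=1))

    # Threshold from the abstract: mean plus standard deviation of MD
    threshold = md.mean() + md.std(ddof=1)
    return md, threshold, md > threshold


if __name__ == "__main__":
    # Synthetic stand-in for NIR spectra: 40 samples, 50 wavelengths,
    # with sample 5 shifted to act as an obvious outlier.
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(40, 50))
    demo[5] += 10.0
    md, threshold, flags = pca_md_outliers(demo)
    print("threshold:", threshold)
    print("flagged samples:", np.flatnonzero(flags))
```

In a workflow like the one the abstract describes, the flagged samples would be removed before fitting the PLS calibration model, and the RMSEP of that model would then be compared with the RMSEP obtained from the full data set.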