Stanimirova I, Walczak B
Department of Chemometrics, Institute of Chemistry, Silesian University, 9 Szkolna Street, 40-006 Katowice, Poland.
Talanta. 2008 Jul 30;76(3):602-9. doi: 10.1016/j.talanta.2008.03.049. Epub 2008 Apr 8.
Missing elements and outliers often occur in experimental data. Outliers make it difficult to estimate the parameters of any least squares model, while missing values in turn affect the proper identification of outliers. Approaches that can handle incomplete data containing outliers are therefore highly valued. In this paper, we present the expectation-maximization robust soft independent modeling of class analogy approach (EM-S-SIMCA), which is based on the recently introduced spherical SIMCA method. Several important issues are discussed: choosing the complexity of the model with the leverage correction procedure, selecting training and test sets for incomplete data using methods of uniform design, and predicting new samples that contain missing elements. A comparison study showed that EM-S-SIMCA outperforms the classic expectation-maximization SIMCA method. The performance of the method was illustrated on simulated and real data sets and led to satisfactory results.
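The general idea of combining expectation-maximization imputation with a robust principal component model can be illustrated with a minimal sketch. This is not the authors' EM-S-SIMCA implementation. It assumes one common formulation of spherical PCA (center the data robustly, project each sample onto the unit sphere, then take the leading singular vectors) and a simple iterative scheme that replaces missing entries with their model reconstruction until the estimates stabilize. The function names, the use of the coordinate-wise median as the robust center, and the convergence settings are all assumptions made for illustration.

```python
import numpy as np

def spherical_pca_loadings(X, n_components):
    """Robust loadings from sphered data (illustrative formulation).

    Assumption: the coordinate-wise median is used as the robust center;
    other robust centers (e.g. the L1-median) could be used instead.
    """
    center = np.median(X, axis=0)
    Xc = X - center
    # Project every sample onto the unit sphere so that distant
    # samples (potential outliers) cannot dominate the directions.
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave samples at the center untouched
    Xs = Xc / norms
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return center, Vt[:n_components].T

def em_spherical_impute(X, n_components, max_iter=500, tol=1e-8):
    """Iteratively impute NaN entries from a spherical PCA model."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Initial guess: column medians computed from the observed values.
    col_medians = np.nanmedian(X, axis=0)
    filled[missing] = np.take(col_medians, np.where(missing)[1])
    for _ in range(max_iter):
        # Fit the robust model to the currently completed data.
        center, P = spherical_pca_loadings(filled, n_components)
        # Reconstruct the data from the model with n_components factors.
        recon = (filled - center) @ P @ P.T + center
        # Update only the missing entries; observed values stay fixed.
        change = np.sum((filled[missing] - recon[missing]) ** 2)
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled

# Hypothetical usage on a small simulated data set with two missing values.
rng = np.random.default_rng(0)
scores = rng.normal(size=(30, 1))
X_true = scores @ np.array([[1.0, 2.0, -1.0, 0.5]]) + 5.0
X_missing = X_true.copy()
X_missing[2, 1] = np.nan
X_missing[10, 3] = np.nan
X_filled = em_spherical_impute(X_missing, n_components=1)
```

In this sketch the number of components is fixed by the caller; the abstract indicates that the model complexity is chosen with a leverage correction procedure, which is not reproduced here.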