Dufrenois Franck, Colliez Johan, Hamad Denis
Université du Littoral, Calais 62228, France.
IEEE Trans Neural Netw. 2009 Nov;20(11):1689-706. doi: 10.1109/TNN.2009.2024202. Epub 2009 Sep 22.
Support vector regression (SVR) is now a well-established method for estimating real-valued functions. However, standard SVR does not cope well with severe outlier contamination of both the response and the predictor variables, a situation commonly encountered in real applications. In this paper, we present a bounded-influence SVR that downweights the influence of outliers in all the regression variables. The proposed approach adopts an adaptive weighting strategy based on two components: a robust adaptive scale estimator, which handles large regression residuals, and the statistics of a "kernelized" hat matrix, which handle leverage points. As a result, our algorithm can accurately extract the dominant subset in corrupted data sets. Experiments on simulated linear and nonlinear data sets show the robustness of our algorithm against outliers. Finally, chemical and astronomical data sets that exhibit severe outlier contamination are used to demonstrate the performance of the proposed approach in real situations.
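The weighting idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no formulas, so the code below swaps in an iteratively reweighted kernel ridge regression for the SVR solver, uses the scaled median absolute deviation (MAD) as the robust scale, and uses Huber-type weights. The function names (`rbf_kernel`, `mad_scale`, `robust_kernel_fit`, `make_demo_data`) and all parameter values (`gamma`, `lam`, `c`, `n_iter`) are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def mad_scale(v):
    """Median absolute deviation, scaled to match a Gaussian std."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def robust_kernel_fit(X, y, gamma=1.0, lam=0.1, c=2.5, n_iter=20):
    """Illustrative stand-in: reweighted kernel ridge regression, not SVR."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    I = np.eye(n)
    # "Kernelized" hat matrix of the ridge smoother (y_hat = H y);
    # its diagonal measures the leverage of each sample.
    H = K @ np.linalg.inv(K + lam * I)
    h = np.maximum(np.diag(H), 1e-12)
    # Assumed rule: downweight leverages far above the typical value.
    cutoff = np.median(h) + c * mad_scale(h)
    w_lev = np.minimum(1.0, cutoff / h)
    w = w_lev.copy()
    for _ in range(n_iter):
        # Weighted kernel ridge solution: (W K + lam I) alpha = W y.
        alpha = np.linalg.solve(w[:, None] * K + lam * I, w * y)
        r = y - K @ alpha
        s = max(mad_scale(r), 1e-12)  # robust residual scale
        # Huber-type weights: 1 for small residuals, decaying for large ones.
        w_res = np.minimum(1.0, c * s / np.maximum(np.abs(r), 1e-12))
        w = w_lev * w_res  # combine leverage and residual weights
    alpha = np.linalg.solve(w[:, None] * K + lam * I, w * y)
    return alpha, w

def make_demo_data(seed=0):
    """Noisy sine with three response outliers and one leverage point."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, 40)
    y = np.sin(x) + 0.05 * rng.standard_normal(40)
    y[[10, 20, 30]] += 8.0   # outliers in the response variable
    x = np.append(x, 20.0)   # outlier in the predictor variable
    y = np.append(y, 5.0)
    return x[:, None], y

X, y = make_demo_data()
alpha, w = robust_kernel_fit(X, y)
print("weights of contaminated samples:", np.round(w[[10, 20, 30, 40]], 3))
print("median weight over all samples:", np.round(np.median(w), 3))
```

In this sketch the final weight of each sample is the product of a leverage weight and a residual weight, so a point can be downweighted for being unusual in the predictors, in the response, or in both. A bounded-influence SVR in the sense of the abstract would presumably feed comparable weights into the SVR loss instead of a ridge loss.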