Takeuchi Ichiro, Bengio Yoshua, Kanamori Takafumi
Department of Information Engineering, Mie University, Tsu 514-8507, Japan.
Neural Comput. 2002 Oct;14(10):2469-96. doi: 10.1162/08997660260293300.
In the presence of a heavy-tailed noise distribution, regression becomes much more difficult. Traditional robust regression methods assume that the noise distribution is symmetric, and they downweight the influence of so-called outliers. When the noise distribution is asymmetric, these methods yield biased regression estimators. Motivated by data-mining problems in the insurance industry, we propose a new approach to robust regression tailored to asymmetric noise distributions. The main idea is to learn most of the parameters of the model using conditional quantile estimators (which are biased but robust estimators of the regression) and to learn a few remaining parameters that combine and correct these estimators so as to minimize the average squared error in an unbiased way. Theoretical analysis and experiments show the clear advantages of the approach. Results are reported on artificial data as well as insurance data, using both linear and neural network predictors.
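The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; every specific in it is an assumption made for illustration: the synthetic log-normal noise, the choice of quantile levels, the subgradient-descent fitting routine, and the helper names `pinball_loss`, `fit_linear_quantile`, and `fit_combiner`. Stage one fits linear conditional quantile estimators (six parameters each); stage two fits only four parameters by least squares to combine them.

```python
import numpy as np

def pinball_loss(residual, tau):
    """Quantile (pinball) loss for residual = y - prediction."""
    return np.mean(np.maximum(tau * residual, (tau - 1.0) * residual))

def fit_linear_quantile(X, y, tau, steps=2000, lr=0.1):
    """Fit w to reduce the pinball loss of y - X @ w by subgradient descent.

    A simple illustrative optimizer (assumption); it is not tuned for accuracy.
    """
    w = np.zeros(X.shape[1])
    for t in range(steps):
        residual = y - X @ w
        # Subgradient of the mean pinball loss with respect to w.
        grad = -X.T @ (tau - (residual < 0)) / len(y)
        w -= lr / np.sqrt(t + 1.0) * grad
    return w

def fit_combiner(Q, y):
    """Least-squares weights (with intercept) mapping quantile predictions to y."""
    A = np.column_stack([np.ones(len(y)), Q])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic data (assumption): right-skewed, heavy-tailed log-normal noise
# whose scale depends on the inputs.
rng = np.random.default_rng(0)
n, d = 2000, 5
features = rng.uniform(0.0, 1.0, size=(n, d))
X = np.column_stack([np.ones(n), features])
location = 1.0 + features @ np.array([2.0, -1.0, 0.5, 0.0, 1.5])
scale = 1.0 + features @ np.array([0.5, 0.5, 0.0, 1.0, 0.0])
y = location + scale * rng.lognormal(0.0, 1.0, size=n)

# Stage 1: robust (but biased) conditional quantile estimators.
taus = [0.25, 0.5, 0.75]
W = np.column_stack([fit_linear_quantile(X, y, tau) for tau in taus])
Q = X @ W  # one column of predictions per quantile level

# Stage 2: a few parameters combine and correct the quantile predictions
# by minimizing the average squared error.
coef = fit_combiner(Q, y)
y_hat = np.column_stack([np.ones(n), Q]) @ coef

median_pred = Q[:, 1]
print("mean residual, median predictor:  ", np.mean(y - median_pred))
print("mean residual, combined predictor:", np.mean(y - y_hat))
```

Because the combiner includes an intercept, its in-sample mean residual is zero up to floating-point error, whereas a median predictor on right-skewed noise would be expected to sit below the conditional mean. In this sketch the second stage is plain least squares, which is itself sensitive to heavy tails; it only shows the division of parameters between the two stages.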