Park Seo Young, Liu Yufeng
Department of Health Studies, Chicago, IL 60615, USA.
Can J Stat. 2011 Jun 1;39(2):300-323. doi: 10.1002/cjs.10105.
Penalized logistic regression (PLR) is a powerful statistical tool for classification and has been used in many practical problems. Despite this success, the loss function of the PLR is unbounded, so the resulting classifiers can be sensitive to outliers. To build more robust classifiers, we propose the robust PLR (RPLR), which uses truncated logistic loss functions, and suggest three schemes to estimate conditional class probabilities. We discuss connections between the RPLR and other existing work on robust logistic regression. Our theoretical results indicate that the RPLR is Fisher consistent and more robust to outliers. Moreover, we develop estimated generalized approximate cross validation (EGACV) for tuning parameter selection. Through numerical examples, we demonstrate that truncating the loss function indeed yields better performance in terms of classification accuracy and class probability estimation.
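The key idea of truncating the logistic loss can be sketched as follows: cap the loss at its value at some margin threshold, so that badly misclassified points (potential outliers) contribute a bounded amount to the objective. This is a minimal illustrative sketch; the truncation location `s = -1.0` is an assumed example value, not a recommendation from the paper.

```python
import numpy as np

def logistic_loss(u):
    # Standard logistic loss on the margin u = y * f(x):
    # log(1 + exp(-u)), computed stably via logaddexp.
    return np.logaddexp(0.0, -u)

def truncated_logistic_loss(u, s=-1.0):
    # Truncated logistic loss: cap the loss at its value at margin s,
    # so points with u < s (severely misclassified, e.g. outliers)
    # all incur the same bounded loss logistic_loss(s).
    # The choice s = -1.0 is illustrative only.
    return np.minimum(logistic_loss(u), logistic_loss(s))
```

Because the truncated loss is constant for margins below `s`, an outlier far on the wrong side of the boundary cannot dominate the fitted classifier, which is the source of the robustness the abstract describes.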