Wang Hong, Xu Qingsong, Zhou Lifeng
School of Mathematics & Statistics, Central South University, Changsha, Hunan, China.
PLoS One. 2015 Feb 23;10(2):e0117844. doi: 10.1371/journal.pone.0117844. eCollection 2015.
Various ensemble learning methods with different base classifiers have recently been proposed for credit scoring problems. However, for various reasons, little research has used logistic regression as the base classifier. In this paper, we consider the feasibility of ensemble learning with regularized logistic regression as the base classifier for credit scoring on large unbalanced data. The data are first balanced and diversified by clustering and bagging algorithms. We then apply a Lasso-logistic regression learning ensemble to evaluate credit risk. We show that the proposed algorithm outperforms popular credit scoring models such as decision trees, Lasso-logistic regression, and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
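The abstract gives only the outline of the pipeline (balance and diversify the data by clustering and bagging, then combine Lasso-logistic regression models), so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes binary labels with 1 as the minority (default) class, uses k-means from scikit-learn to cluster the majority class, draws bootstrap samples of equal size from each cluster and from the minority class, and averages the predicted probabilities of L1-penalized logistic models. The function names and the parameters `n_clusters`, `n_bags`, and `C` are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def fit_lasso_logistic_ensemble(X, y, n_clusters=3, n_bags=5, C=1.0, seed=0):
    """Fit L1-penalized logistic models on balanced bootstrap subsets.

    Illustrative only: assumes y == 1 marks the minority class.
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)

    # Cluster the majority class so each subset covers a different region of it.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(X[majority_idx])

    models = []
    for k in range(n_clusters):
        cluster_idx = majority_idx[labels == k]
        if cluster_idx.size == 0:
            continue
        for _ in range(n_bags):
            # Bagging step: bootstrap both classes to the minority-class size,
            # which yields a balanced training subset.
            maj = rng.choice(cluster_idx, size=minority_idx.size, replace=True)
            mino = rng.choice(minority_idx, size=minority_idx.size, replace=True)
            idx = np.concatenate([maj, mino])
            # penalty="l1" gives the Lasso penalty; liblinear supports it.
            model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
            model.fit(X[idx], y[idx])
            models.append(model)
    return models


def predict_proba_ensemble(models, X):
    """Average the minority-class probabilities over all base models."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```

Averaging probabilities is only one way to combine the base models; a majority vote or a weighted average would fit the same outline, and the abstract does not say which combination rule the paper uses.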