NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class.

Author information

Han Seongil, Jung Haemin

Affiliations

School of Computing & Mathematical Sciences, University of London, Birkbeck College, London, United Kingdom.

Department of Industrial & Management Engineering, Korea National University of Transportation, Chungju, South Korea.

Publication information

PLoS One. 2024 Dec 31;19(12):e0316454. doi: 10.1371/journal.pone.0316454. eCollection 2024.

Abstract

Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy and robustness, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model's decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when used with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. These findings highlight NATE's capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
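The abstract names class balancing by oversampling and reports MCC and F1 Score, but it does not spell out the algorithms or formulas. The following is a minimal sketch of those two ideas, not the authors' implementation. Plain random oversampling stands in for whichever non-parametric oversamplers NATE actually uses, the metrics follow their standard textbook definitions, and every function name and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class is as large as the largest one (assumed stand-in technique)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy, made-up data: 8 majority-class rows (label 0), 2 minority rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))

# Toy, made-up predictions used only to exercise the metric functions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn), mcc(tp, fp, fn, tn))
```

In practice, oversampling would be applied only to the training split, so that duplicated minority rows do not leak into the data used to compute AUC, MCC, and F1 Score.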

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6dea/11687932/b2eac194d6ce/pone.0316454.g001.jpg
