NOTE: non-parametric oversampling technique for explainable credit scoring.

Authors

Han Seongil, Jung Haemin, Yoo Paul D, Provetti Alessandro, Cali Andrea

Affiliations

School of Computing & Mathematical Sciences, University of London, Birkbeck College, London, UK.

Department of Industrial & Management Engineering, Korea National University of Transportation, Chungju, South Korea.

Publication information

Sci Rep. 2024 Oct 30;14(1):26070. doi: 10.1038/s41598-024-78055-5.

Abstract

Credit scoring models are critical for financial institutions to assess borrower risk and maintain profitability. Although machine learning models have improved credit scoring accuracy, imbalanced class distributions remain a major challenge. The widely used Synthetic Minority Oversampling TEchnique (SMOTE) struggles with high-dimensional, non-linear data and may introduce noise through class overlap. Generative Adversarial Networks (GANs) have emerged as an alternative, offering the ability to model complex data distributions. Conditional Wasserstein GANs (cWGANs) have shown promise in handling both numerical and categorical features in credit scoring datasets. However, research on extracting latent features from non-linear data and improving model explainability remains limited. To address these challenges, this paper introduces the Non-parametric Oversampling Technique for Explainable credit scoring (NOTE). The NOTE offers a unified approach that integrates a Non-parametric Stacked Autoencoder (NSA) for capturing non-linear latent features, cWGAN for oversampling the minority class, and a classification process designed to enhance explainability. The experimental results demonstrate that NOTE surpasses state-of-the-art oversampling techniques by improving classification accuracy and model stability, particularly in non-linear and imbalanced credit scoring datasets, while also enhancing the explainability of the results.
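For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of SMOTE-style interpolation, not of NOTE itself. It assumes purely numerical features and Euclidean distance; the function name `smote_like_oversample` and its parameters are hypothetical and chosen only for illustration. Because each synthetic point lies on a straight segment between two minority samples, points can fall into majority-class regions when the class boundary is non-linear, which is one way the class-overlap noise mentioned above may arise.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority, n_new=6)
```

Every generated point stays inside the convex hull of the minority samples, so this scheme cannot follow a curved minority region. That limitation is the stated motivation for generative approaches such as cWGAN.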

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb8/11525592/fb9fa26f47ff/41598_2024_78055_Fig1_HTML.jpg
