SMART: Structured Missingness Analysis and Reconstruction Technique for credit scoring.

Author information

Han Seongil, Jung Haemin, Yoo Paul D

Affiliations

Department of Computer Science, University of Suwon, Hwaseong, South Korea.

Department of Industrial and Management Engineering, Korea National University of Transportation, Chungju, South Korea.

Publication information

Sci Rep. 2025 Apr 29;15(1):15111. doi: 10.1038/s41598-025-99997-4.

DOI:10.1038/s41598-025-99997-4

PMID:40301510

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12041391/

Abstract

The Basel Accord emphasizes the necessity of employing internal data models to manage key credit risk components, including Probability of Default (PD), Loss Given Default (LGD), and Exposure At Default (EAD). Among these, internal datasets are critical for estimating PD, a fundamental measure of borrower creditworthiness. Nevertheless, practical application often faces challenges due to incomplete datasets, which can skew analyses and undermine the accuracy of credit scoring models. Traditional approaches to addressing missing data, such as sample deletion or mean imputation, are widely used; however, they often prove insufficient for accurate prediction. Consequently, imputation methods are typically favored over deletion, as they allow for the full utilization of available data. Recent advancements have introduced more sophisticated techniques, such as Generative Adversarial Imputation Networks (GAIN), which utilize a generative adversarial network to model data distributions and impute missing values with greater precision than conventional methods. Building on these developments, this study proposes a novel imputation approach, SMART (Structured Missingness Analysis and Reconstruction Technique) for credit scoring datasets. SMART consists of two primary stages: first, it normalizes and denoises the dataset using randomized Singular Value Decomposition (rSVD), followed by the implementation of GAIN to impute missing values. Experimental results demonstrate that SMART significantly outperforms existing state-of-the-art methods, particularly in high missing data contexts (20%, 50%, and 80%), with improvements in imputation accuracy of 7.04%, 6.34%, and 13.38%, respectively. In conclusion, SMART represents a substantial advancement in handling incomplete credit scoring datasets, leading to more precise PD estimation and enhancing the robustness of credit risk management models.
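The first stage described in the abstract (normalize, then denoise with randomized SVD) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the normalization scheme, the target rank, or how missing cells are handled before the decomposition, so the min-max scaling, the temporary column-mean fill, and the function names `minmax_normalize`, `randomized_svd`, and `denoise` are all assumptions made for illustration. The GAIN stage is omitted; the sketch only returns a denoised matrix with missing cells left as NaN for a downstream imputer.

```python
import numpy as np

def minmax_normalize(X):
    """Scale each column to [0, 1], ignoring NaNs (assumed normalization)."""
    lo = np.nanmin(X, axis=0)
    hi = np.nanmax(X, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span

def randomized_svd(A, rank, oversample=5, n_iter=2, seed=0):
    """Approximate truncated SVD via a random range finder with power iterations."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(A.shape))
    # Sample the column space of A with a Gaussian test matrix.
    Q = np.linalg.qr(A @ rng.standard_normal((A.shape[1], k)))[0]
    for _ in range(n_iter):
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    # Exact SVD of the small projected matrix.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]

def denoise(X, rank):
    """Normalize, low-rank reconstruct observed cells, keep missing cells as NaN."""
    Xn = minmax_normalize(X)
    mask = ~np.isnan(Xn)                      # True where a value was observed
    col_means = np.nanmean(Xn, axis=0)
    filled = np.where(mask, Xn, col_means)    # assumed temporary fill for the SVD
    U, s, Vt = randomized_svd(filled, rank)
    low_rank = (U * s) @ Vt
    return np.where(mask, low_rank, np.nan), mask
```

In this sketch the mask returned by `denoise` plays the role of the observed/missing indicator that a GAIN-style imputer takes alongside the data; the rank is a tuning choice that trades noise removal against loss of signal.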
