A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards.

Affiliations

Graduate School of Business, University of Cape Town, Cape Town, South Africa.

Electrical and Electronic Engineering, University of Johannesburg, Johannesburg, South Africa.

Publication information

PLoS One. 2024 Aug 12;19(8):e0308718. doi: 10.1371/journal.pone.0308718. eCollection 2024.

DOI: 10.1371/journal.pone.0308718

PMID: 39133710

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11318906/

Abstract

Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
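The abstract describes the pipeline only at a high level: discretize numeric variables, one-hot encode them, fit a tree ensemble, then aggregate Shapley values per variable group into scorecard points. The Python sketch below illustrates that idea under assumptions that are not taken from the paper. The variables (`age`, `housing`), their bins, and `toy_log_odds` are hypothetical; `toy_log_odds` merely stands in for the raw log-odds output of a model such as XGBoost or LightGBM. Shapley values are computed exactly by brute-force enumeration against a single reference applicant (a "baseline Shapley" game), which is not necessarily the authors' choice and is not TreeSHAP. The points scaling (`PDO`, `BASE_SCORE`) is a common scorecard convention, assumed here for illustration.

```python
import itertools
import math

# Hypothetical bins and categories for illustration only; the paper's actual
# variables and cut points are not given in the abstract.
AGE_BINS = [(18, 30), (30, 50), (50, 120)]
HOUSING = ["own", "rent"]

# Assumed scorecard scaling: every PDO points doubles the good:bad odds, and
# the reference applicant scores exactly BASE_SCORE.
PDO = 20.0
BASE_SCORE = 600.0
FACTOR = PDO / math.log(2)


def one_hot(applicant):
    """Discretize age into bins, then one-hot encode the age bin and housing."""
    feats = {}
    for lo, hi in AGE_BINS:
        feats[f"age_{lo}_{hi}"] = 1.0 if lo <= applicant["age"] < hi else 0.0
    for cat in HOUSING:
        feats[f"housing_{cat}"] = 1.0 if applicant["housing"] == cat else 0.0
    return feats


def toy_log_odds(x):
    """Hypothetical stand-in for a tree ensemble's raw output (log-odds of default)."""
    z = -1.0
    z += 0.8 * x["age_18_30"] - 0.5 * x["age_50_120"]
    z += 0.6 * x["housing_rent"]
    z += 0.4 * x["age_18_30"] * x["housing_rent"]  # interaction between groups
    return z


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating coalitions; absent features take baseline values."""
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [j for j in names if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                present = set(subset)
                without_i = {j: x[j] if j in present else baseline[j] for j in names}
                with_i = {**without_i, i: x[i]}
                total += weight * (f(with_i) - f(without_i))
        phi[i] = total
    return phi


def scorecard_points(applicant, reference):
    """Sum one-hot Shapley values per variable group and convert them to points."""
    phi = shapley_values(toy_log_odds, one_hot(applicant), one_hot(reference))
    points = {}
    for name, value in phi.items():
        group = name.split("_")[0]  # "age_18_30" -> "age"
        # Higher log-odds of default means fewer points.
        points[group] = points.get(group, 0.0) - FACTOR * value
    return points


reference = {"age": 40, "housing": "own"}
applicant = {"age": 25, "housing": "rent"}
pts = scorecard_points(applicant, reference)
print({g: round(p, 1) for g, p in pts.items()})  # {'age': -28.9, 'housing': -23.1}
print(round(BASE_SCORE + sum(pts.values()), 1))  # 548.1
```

Because Shapley values satisfy efficiency, the group points in this sketch add up exactly to the applicant's score minus `BASE_SCORE`, so each variable group contributes an additive, scorecard-style number even though the underlying model contains an interaction. Brute-force enumeration is exponential in the number of one-hot columns; for real tree ensembles a tree-specific algorithm such as TreeSHAP would normally be used instead.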

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43e0/11318906/63bcecd26a53/pone.0308718.g001.jpg

Similar articles

A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards.

PLoS One. 2024 Aug 12;19(8):e0308718. doi: 10.1371/journal.pone.0308718. eCollection 2024.

A Responsible Framework for Assessing, Selecting, and Explaining Machine Learning Models in Cardiovascular Disease Outcomes Among People With Type 2 Diabetes: Methodology and Validation Study.

JMIR Med Inform. 2025 Jun 27;13:e66200. doi: 10.2196/66200.

Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.

Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.

Hybrid feature selection framework for enhanced credit card fraud detection using machine learning models.

PLoS One. 2025 Jul 16;20(7):e0326975. doi: 10.1371/journal.pone.0326975. eCollection 2025.

Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study.

J Med Internet Res. 2025 May 26;27:e66733. doi: 10.2196/66733.

A novel double machine learning approach for detecting early breast cancer using advanced feature selection and dimensionality reduction techniques.

Sci Rep. 2025 Jul 2;15(1):22971. doi: 10.1038/s41598-025-06426-7.

Predictive modeling of complications arising from early-onset preeclampsia in pregnant women.

Womens Health (Lond). 2025 Jan-Dec;21:17455057251348978. doi: 10.1177/17455057251348978. Epub 2025 Jul 21.

Optimized feature selection and advanced machine learning for stroke risk prediction in revascularized coronary artery disease patients.

BMC Med Inform Decis Mak. 2025 Jul 24;25(1):276. doi: 10.1186/s12911-025-03116-2.

Clinical prediction of intravenous immunoglobulin-resistant Kawasaki disease based on interpretable Transformer model.

PLoS One. 2025 Jul 9;20(7):e0327564. doi: 10.1371/journal.pone.0327564. eCollection 2025.

Differential Predictability of Preterm Birth Types: Strong Signals for Indicated Cases versus Limited Success in Spontaneous Preterm Birth.

medRxiv. 2025 Jul 10:2025.07.09.25329712. doi: 10.1101/2025.07.09.25329712.

Cited by

Correction: A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards.

PLoS One. 2025 Aug 5;20(8):e0329901. doi: 10.1371/journal.pone.0329901. eCollection 2025.
