Graduate School of Business, University of Cape Town, Cape Town, South Africa.
Electrical and Electronic Engineering, University of Johannesburg, Johannesburg, South Africa.
PLoS One. 2024 Aug 12;19(8):e0308718. doi: 10.1371/journal.pone.0308718. eCollection 2024.
Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
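The abstract does not give the exact procedure, so the following is only a minimal sketch of the general idea: compute one Shapley value per binned predictor variable, then rescale it to scorecard points. Everything in it is an assumption made for illustration, not the authors' implementation: the toy `model_log_odds` function stands in for a trained tree ensemble, the three variables and their bins are hypothetical, Shapley values are computed exactly by brute force against a small background sample, and the points-to-double-odds scaling (`pdo`, `offset`) is the conventional scorecard scaling rather than one confirmed by the paper.

```python
from itertools import combinations
from math import comb, log

# Hypothetical applicants whose variables are already discretized into bins.
BACKGROUND = [
    {"age": "<30", "income": "low", "history": "bad"},
    {"age": "<30", "income": "high", "history": "good"},
    {"age": "30+", "income": "low", "history": "good"},
    {"age": "30+", "income": "high", "history": "good"},
]


def model_log_odds(x):
    """Toy stand-in for a tree ensemble: log-odds of default, with one interaction."""
    z = -2.0
    z += 0.8 if x["age"] == "<30" else 0.0
    z += 0.6 if x["income"] == "low" else 0.0
    z += 1.5 if x["history"] == "bad" else 0.0
    z += 0.4 if (x["age"] == "<30" and x["income"] == "low") else 0.0
    return z


def coalition_value(x, subset, background):
    """Mean model output when the variables in `subset` are fixed to x's bins
    and the remaining variables are taken from each background record."""
    total = 0.0
    for record in background:
        mixed = dict(record)
        for name in subset:
            mixed[name] = x[name]
        total += model_log_odds(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley value of each variable (one value per variable group)."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        contribution = 0.0
        for k in range(n):
            # Shapley weight |S|!(n-|S|-1)!/n! written as 1 / (n * C(n-1, |S|)).
            weight = 1.0 / (n * comb(n - 1, k))
            for subset in combinations(others, k):
                with_var = coalition_value(x, subset + (name,), background)
                without_var = coalition_value(x, subset, background)
                contribution += weight * (with_var - without_var)
        phi[name] = contribution
    return phi


def to_points(phi, base_value, pdo=20.0, offset=600.0):
    """Rescale log-odds contributions to points; lower default risk earns more points."""
    factor = pdo / log(2)
    points = {name: -factor * value for name, value in phi.items()}
    base_points = offset - factor * base_value
    return base_points, points


applicant = {"age": "<30", "income": "low", "history": "good"}
base_value = coalition_value(applicant, (), BACKGROUND)  # average model output
phi = shapley_values(applicant, BACKGROUND)
base_points, points = to_points(phi, base_value)
total_score = base_points + sum(points.values())

print("base points:", round(base_points, 1))
for name, value in points.items():
    print(f"{name}={applicant[name]}: {value:+.1f} points")
print("total score:", round(total_score, 1))
```

Because Shapley values sum to the difference between the applicant's prediction and the average prediction, the per-variable points add up to the score implied by the model's output, which is what makes the result read like a conventional additive scorecard. A full scorecard would presumably aggregate these contributions per bin across many applicants; with a real model, a library such as SHAP would replace the brute-force enumeration, which grows exponentially in the number of variables.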