Graduate School of Business, University of Cape Town, Cape Town, South Africa.
Electrical and Electronic Engineering, University of Johannesburg, Johannesburg, South Africa.
PLoS One. 2024 May 21;19(5):e0303566. doi: 10.1371/journal.pone.0303566. eCollection 2024.
This study explores whether alternative data sources can improve the accuracy of credit scoring models relative to models that rely solely on traditional data sources such as credit bureau data. A comprehensive dataset from the Home Credit Group's home loan portfolio is analysed. The research examines the impact of incorporating alternative predictors that are typically overlooked, such as the default status of an applicant's social network, regional economic ratings, and local population characteristics. The modelling approach applies the model-X knockoffs framework for systematic variable selection. With these alternative data sources included, the credit scoring models show improved predictive performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.79360 on the Kaggle Home Credit Default Risk competition dataset and outperforming models that relied solely on traditional data. The findings highlight the value of leveraging diverse, non-traditional data sources to strengthen credit risk assessment and improve overall model accuracy.
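The abstract does not describe how the knockoff step was implemented. As a rough illustration of the model-X knockoff filter in general, the sketch below makes several assumptions that are not from the study: features are Gaussian with a known correlation matrix, knockoffs use the equicorrelated construction, the feature statistic is a simple marginal-correlation difference (rather than, for example, a lasso-based statistic), selection uses the knockoff+ threshold at a target false discovery rate `q`, and the data are synthetic rather than the Home Credit dataset. The function names `gaussian_knockoffs`, `knockoff_statistics` and `knockoff_plus_select` are hypothetical.

```python
import numpy as np


def gaussian_knockoffs(X, mu, Sigma, rng):
    """Sample equicorrelated model-X knockoffs for rows of X ~ N(mu, Sigma).

    Assumption: Sigma is a known correlation matrix (standardized features).
    """
    p = Sigma.shape[0]
    # Equicorrelated choice: s_j = min(2 * lambda_min(Sigma), 1) for every j.
    s = min(2.0 * np.linalg.eigvalsh(Sigma).min(), 1.0)
    D = s * np.eye(p)
    Sigma_inv = np.linalg.inv(Sigma)
    # Conditional law of the knockoffs given X:
    #   mean = X - (X - mu) Sigma^{-1} D,   cov = 2D - D Sigma^{-1} D
    cond_mean = X - (X - mu) @ Sigma_inv @ D
    cond_cov = 2.0 * D - D @ Sigma_inv @ D
    # Square root via eigendecomposition (tolerates a singular cond_cov).
    vals, vecs = np.linalg.eigh(cond_cov)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return cond_mean + rng.standard_normal(X.shape) @ root.T


def knockoff_statistics(X, X_knock, y):
    # W_j = |X_j^T y| - |X~_j^T y|; swapping X_j with its knockoff flips the sign.
    return np.abs(X.T @ y) - np.abs(X_knock.T @ y)


def knockoff_plus_select(W, q=0.2):
    # Knockoff+ threshold: smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)


# Synthetic demo (hypothetical data): 30 features, the first 8 carry signal.
rng = np.random.default_rng(0)
n, p, k = 2000, 30, 8
idx = np.arange(p)
Sigma = 0.3 ** np.abs(idx[:, None] - idx[None, :])  # AR(1) correlation
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p)
beta[:k] = 1.0
y = X @ beta + rng.standard_normal(n)

X_knock = gaussian_knockoffs(X, np.zeros(p), Sigma, rng)
W = knockoff_statistics(X, X_knock, y)
selected = knockoff_plus_select(W, q=0.2)
print(sorted(selected.tolist()))
```

Each knockoff mimics the correlation structure of its original feature but is generated without looking at `y`. A feature is therefore kept only when it explains the outcome clearly better than its knockoff. Note that knockoff+ at `q = 0.2` needs at least five selections before any can be reported.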