Enhancing credit scoring accuracy with a comprehensive evaluation of alternative data.

Affiliations

Graduate School of Business, University of Cape Town, Cape Town, South Africa.

Electrical and Electronic Engineering, University of Johannesburg, Johannesburg, South Africa.

Publication information

PLoS One. 2024 May 21;19(5):e0303566. doi: 10.1371/journal.pone.0303566. eCollection 2024.

Abstract

This study explores the potential of utilizing alternative data sources to enhance the accuracy of credit scoring models, compared to relying solely on traditional data sources, such as credit bureau data. A comprehensive dataset from the Home Credit Group's home loan portfolio is analysed. The research examines the impact of incorporating alternative predictors that are typically overlooked, such as an applicant's social network default status, regional economic ratings, and local population characteristics. The modelling approach applies the model-X knockoffs framework for systematic variable selection. By including these alternative data sources, the credit scoring models demonstrate improved predictive performance, achieving an area under the curve metric of 0.79360 on the Kaggle Home Credit default risk competition dataset, outperforming models that relied solely on traditional data sources, such as credit bureau data. The findings highlight the significance of leveraging diverse, non-traditional data sources to augment credit risk assessment capabilities and overall model accuracy.
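
The abstract names the model-X knockoffs framework for variable selection but does not describe the selection step. As a rough illustration only, the sketch below shows the standard knockoff+ thresholding rule, which selects features from a vector of knockoff statistics W at a target false discovery rate q. It is not the authors' code: the function names, the example statistics, and the choice q = 0.25 are all hypothetical, and the step that builds the knockoff copies and computes W is assumed to have been done elsewhere.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for statistics w at target FDR q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns float('inf') when no candidate meets the target.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_features(w: Sequence[float], q: float) -> List[int]:
    """Return indices of features whose statistic reaches the threshold."""
    tau = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= tau]


if __name__ == "__main__":
    # Hypothetical statistics: large positive values suggest a real signal,
    # values near zero or negative suggest the knockoff did just as well.
    w_example = [3.1, 2.4, -0.3, 1.8, 0.2, -0.1, 2.9, 0.05, 1.2, 2.0]
    print(select_features(w_example, q=0.25))
```

With these made-up statistics the threshold falls at 1.2, so six of the ten features are kept. In practice W would come from comparing each original predictor with its knockoff copy, for example through a difference in fitted coefficient magnitudes.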
