Large unbalanced credit scoring using Lasso-logistic regression ensemble.

Authors

Wang Hong, Xu Qingsong, Zhou Lifeng

Affiliation

School of Mathematics & Statistics, Central South University, Changsha, Hunan, China.

Publication information

PLoS One. 2015 Feb 23;10(2):e0117844. doi: 10.1371/journal.pone.0117844. eCollection 2015.

DOI:10.1371/journal.pone.0117844

PMID:25706988

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4338292/

Abstract

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
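The pipeline described in the abstract (balance the data, diversify it by resampling, fit Lasso-logistic base learners, combine them, and score by AUC and F-measure) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the abstract does not specify the clustering step, so plain random undersampling of the majority class stands in for it; the L1-penalized fit uses proximal gradient descent, which is only one of several ways to solve the Lasso-logistic problem; the data are synthetic; and all function names and parameter values (`lam`, `lr`, `n_iter`, `n_models`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.01, lr=0.1, n_iter=300):
    # L1-penalized logistic regression via proximal gradient descent.
    # The intercept b is left unpenalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def fit_balanced_ensemble(X, y, n_models=5, seed=0, **fit_kwargs):
    # Assumes label 1 is the minority class (e.g. defaulted loans).
    rng = random.Random(seed)
    minority = [i for i, yi in enumerate(y) if yi == 1]
    majority = [i for i, yi in enumerate(y) if yi == 0]
    k = min(len(minority), len(majority))
    models = []
    for _ in range(n_models):
        pos = [rng.choice(minority) for _ in range(k)]  # bootstrap (bagging)
        neg = rng.sample(majority, k)  # undersampling stands in for clustering
        idx = pos + neg
        models.append(fit_lasso_logistic([X[i] for i in idx],
                                         [y[i] for i in idx], **fit_kwargs))
    return models


def ensemble_proba(models, x):
    # Combine base learners by averaging their predicted probabilities.
    probs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
             for w, b in models]
    return sum(probs) / len(probs)


def auc(scores, labels):
    # Probability that a random positive outranks a random negative
    # (ties count one half).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f_measure(scores, labels, threshold=0.5):
    # F1 score of the positive class at a fixed probability threshold.
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic unbalanced data: 30 positives, 300 negatives.
# Feature 0 is informative; feature 1 is pure noise.
data_rng = random.Random(42)
X = ([[data_rng.gauss(2, 1), data_rng.gauss(0, 1)] for _ in range(30)]
     + [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(300)])
y = [1] * 30 + [0] * 300

models = fit_balanced_ensemble(X, y, n_models=5)
scores = [ensemble_proba(models, x) for x in X]
auc_value = auc(scores, y)
f_value = f_measure(scores, y)
print("AUC:", round(auc_value, 3), "F-measure:", round(f_value, 3))
```

The scores above are computed on the training data only to keep the sketch short; a real evaluation would use held-out data or cross-validation. Because each base learner sees a balanced sample, its predicted probabilities are calibrated for a 50/50 class mix, so the 0.5 threshold used for the F-measure would typically be tuned for the true class ratio.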


Similar articles

Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

J Biomed Inform. 2015 Feb;53:277-90. doi: 10.1016/j.jbi.2014.11.013. Epub 2014 Dec 9.

Automatic Estimation of Osteoporotic Fracture Cases by Using Ensemble Learning Approaches.

J Med Syst. 2016 Mar;40(3):61. doi: 10.1007/s10916-015-0413-1. Epub 2015 Dec 12.

A novel ensemble machine learning for robust microarray data classification.

Comput Biol Med. 2006 Jun;36(6):553-73. doi: 10.1016/j.compbiomed.2005.04.001. Epub 2005 Jun 23.

Teaching a Machine to Feel Postoperative Pain: Combining High-Dimensional Clinical Data with Machine Learning Algorithms to Forecast Acute Postoperative Pain.

Pain Med. 2015 Jul;16(7):1386-401. doi: 10.1111/pme.12713. Epub 2015 May 29.

Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

Comput Biol Med. 2011 May;41(5):265-71. doi: 10.1016/j.compbiomed.2011.03.001. Epub 2011 Mar 17.

An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Jul-Aug;11(4):657-66. doi: 10.1109/TCBB.2014.2306838.

AUC-Maximizing Ensembles through Metalearning.

Int J Biostat. 2016 May 1;12(1):203-18. doi: 10.1515/ijb-2015-0035.

Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

PLoS One. 2017 Jul 24;12(7):e0179805. doi: 10.1371/journal.pone.0179805. eCollection 2017.

Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms.

Comput Methods Programs Biomed. 2011 Dec;104(3):443-51. doi: 10.1016/j.cmpb.2011.03.018. Epub 2011 Apr 30.

Cited by

A hybrid unsupervised machine learning model with spectral clustering and semi-supervised support vector machine for credit risk assessment.

PLoS One. 2025 Jan 10;20(1):e0316557. doi: 10.1371/journal.pone.0316557. eCollection 2025.

Exploring the impact of sequence context on errors in SNP genotype calling with whole genome sequencing data using AI-based autoencoder approach.

NAR Genom Bioinform. 2024 Sep 24;6(3):lqae131. doi: 10.1093/nargab/lqae131. eCollection 2024 Sep.

Identification and experimental validation of key genes in osteoarthritis based on machine learning algorithms and single-cell sequencing analysis.

Heliyon. 2024 Aug 28;10(17):e37047. doi: 10.1016/j.heliyon.2024.e37047. eCollection 2024 Sep 15.

Utilizing logistic regression to compare risk factors in disease modeling with imbalanced data: a case study in vitamin D and cancer incidence.

Front Oncol. 2023 Sep 28;13:1227842. doi: 10.3389/fonc.2023.1227842. eCollection 2023.

Using radiomics based on multicenter magnetic resonance images to predict isocitrate dehydrogenase mutation status of gliomas.

Quant Imaging Med Surg. 2023 Apr 1;13(4):2143-2155. doi: 10.21037/qims-22-836. Epub 2023 Mar 2.

Application of artificial intelligence models for detecting the pterygium that requires surgical treatment based on anterior segment images.

Front Neurosci. 2022 Dec 20;16:1084118. doi: 10.3389/fnins.2022.1084118. eCollection 2022.

A descriptive study of variable discretization and cost-sensitive logistic regression on imbalanced credit data.

J Appl Stat. 2019 Jul 23;47(3):568-581. doi: 10.1080/02664763.2019.1643829. eCollection 2020.

A novel adaptive ensemble classification framework for ADME prediction.

RSC Adv. 2018 Mar 26;8(21):11661-11683. doi: 10.1039/c8ra01206g. eCollection 2018 Mar 21.

Credit card fraud detection using a hierarchical behavior-knowledge space model.

PLoS One. 2022 Jan 20;17(1):e0260579. doi: 10.1371/journal.pone.0260579. eCollection 2022.

Determining occupation for National Violent Death Reporting System records: An evaluation of autocoding programs.

Am J Ind Med. 2021 Dec;64(12):1018-1027. doi: 10.1002/ajim.23292. Epub 2021 Sep 7.

References

An AUC-based permutation variable importance measure for random forests.

BMC Bioinformatics. 2013 Apr 5;14:119. doi: 10.1186/1471-2105-14-119.

Regularization Paths for Generalized Linear Models via Coordinate Descent.

J Stat Softw. 2010;33(1):1-22.

A theoretical and experimental analysis of linear combiners for multiple classifier systems.

IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):942-56. doi: 10.1109/TPAMI.2005.109.

Ensemble learning via negative correlation.

Neural Netw. 1999 Dec;12(10):1399-1404. doi: 10.1016/s0893-6080(99)00073-8.

The log transformation is special.

Stat Med. 1995 Apr 30;14(8):811-9. doi: 10.1002/sim.4780140810.
