Hybrid feature selection framework for enhanced credit card fraud detection using machine learning models.

Author information

Siam Al Mahmud, Bhowmik Pankaj, Uddin Md Palash

Affiliation

Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh.

Publication information

PLoS One. 2025 Jul 16;20(7):e0326975. doi: 10.1371/journal.pone.0326975. eCollection 2025.

Abstract

Electronic payment methods are increasingly prevalent worldwide, facilitating both in-person and online transactions. As credit card usage for online payments grows, fraud and payment defaults have also risen, resulting in significant financial losses. Detecting fraudulent transactions is challenging because transaction datasets are highly imbalanced: fraudulent activities constitute only a small fraction of the data. To address this, we propose a novel hybrid feature selection framework designed to enhance the performance of machine learning models in credit card fraud detection. Our framework integrates three complementary feature selection techniques: Pearson correlation, information gain (IG), and random forest importance (RFI), each optimized for the dataset's characteristics. Pearson correlation eliminates redundancy by removing highly correlated features, while IG and RFI evaluate the relevance of the remaining features. A union operation combines the most informative features from these methods, ensuring comprehensive and efficient feature selection. To validate the proposed approach, we test it on five diverse datasets with varying characteristics and imbalance levels, using five state-of-the-art machine learning algorithms: Random Forest (RF), Extra Trees (ET), XGBoost (XGBC), AdaBoost, and CatBoost. The framework is designed primarily for PCA-transformed datasets, but to further validate it we also apply it to a real-world dataset. The results show that our methodology outperforms existing baseline approaches, achieving superior fraud detection performance across all datasets. Our findings highlight the robustness and adaptability of the proposed framework, offering a practical solution for real-world fraud detection systems.

Additionally, we believe that the proposed framework can serve as a decision support system for detecting fraudulent credit card transactions in real time, with the potential to make a substantial contribution to the business sector.
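The three-stage pipeline summarized in the abstract (a Pearson redundancy filter, relevance ranking by IG and RFI, then a union of the top-ranked features) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: information gain is estimated with scikit-learn's `mutual_info_classif`, and the function name `hybrid_feature_selection`, the correlation threshold of 0.9, the `top_k` cutoff, `class_weight="balanced"`, and the synthetic imbalanced data are all hypothetical choices rather than the per-dataset settings tuned in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif


def hybrid_feature_selection(X, y, corr_threshold=0.9, top_k=10, random_state=0):
    """Hypothetical sketch: Pearson redundancy filter, then union of top IG and RFI features.

    Returns the selected column indices of X in ascending order.
    """
    # Stage 1: Pearson correlation filter on absolute feature-feature correlations.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Greedily keep feature j only if it is not highly correlated with an already kept feature.
        if all(corr[j, k] < corr_threshold for k in keep):
            keep.append(j)
    X_kept = X[:, keep]

    # Stage 2a: information gain, estimated here as mutual information with the label (assumption).
    ig = mutual_info_classif(X_kept, y, random_state=random_state)

    # Stage 2b: impurity-based random forest importance (class_weight="balanced" is an assumption).
    rf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=random_state)
    rf.fit(X_kept, y)
    rfi = rf.feature_importances_

    # Stage 3: union of the top-k features from each ranking, mapped back to original column indices.
    k = min(top_k, len(keep))
    top_ig = set(np.argsort(ig)[::-1][:k].tolist())
    top_rfi = set(np.argsort(rfi)[::-1][:k].tolist())
    return sorted(keep[i] for i in top_ig | top_rfi)


# Synthetic, highly imbalanced data (about 3% positives) standing in for a fraud dataset.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, n_redundant=5,
    weights=[0.97, 0.03], random_state=42,
)
# Append an exact copy of column 0 so the Pearson filter has a redundant feature to drop.
X = np.hstack([X, X[:, [0]]])

selected = hybrid_feature_selection(X, y, corr_threshold=0.9, top_k=8)
print("selected columns:", selected)
```

Because the correlation matrix is taken in absolute value, strongly negatively correlated pairs are also treated as redundant. The greedy pass keeps the first feature of each correlated group, so column order decides which member survives; here the appended copy (column 20) is always dropped in favour of column 0.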

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87f0/12266407/060b8dcdcd8e/pone.0326975.g001.jpg

Similar articles

1
Hybrid feature selection framework for enhanced credit card fraud detection using machine learning models.
PLoS One. 2025 Jul 16;20(7):e0326975. doi: 10.1371/journal.pone.0326975. eCollection 2025.
2
Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
5
Perceptions and experiences of the prevention, detection, and management of postpartum haemorrhage: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2023 Nov 27;11(11):CD013795. doi: 10.1002/14651858.CD013795.pub2.
6
Anemia diagnosis from clinical records leveraging tree-driven and XAI-enhanced machine learning.
Comput Biol Chem. 2025 Jun 30;119:108582. doi: 10.1016/j.compbiolchem.2025.108582.
7
Interventions to reduce corruption in the health sector.
Cochrane Database Syst Rev. 2016 Aug 16;2016(8):CD008856. doi: 10.1002/14651858.CD008856.pub2.
10
The Lived Experience of Autistic Adults in Employment: A Systematic Search and Synthesis.
Autism Adulthood. 2024 Dec 2;6(4):495-509. doi: 10.1089/aut.2022.0114. eCollection 2024 Dec.
