Smart adaptive ensemble model for multiclass imbalanced nonstationary data streams.

Author information

Palli Abdul Sattar, Jaafar Jafreezal, Md Saad Mohamad Hanif, Mokhtar Ainul Akmar, Gomes Heitor Murilo, Soomro Afzal Ahmed, Gilal Abdul Rehman

Affiliations

Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, 32610, Seri Iskandar, Perak Darul Ridzuan, Malaysia.

Anti-Narcotics Force, Ministry of Interior and Narcotics Control, Islamabad, 46000, Pakistan.

Publication information

Sci Rep. 2025 Jul 1;15(1):21140. doi: 10.1038/s41598-025-05122-w.

DOI:10.1038/s41598-025-05122-w

PMID:40595856

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12215259/

Abstract

In real-time streaming data, concept drift and class imbalance can occur simultaneously, degrading the performance of online machine learning models. Most existing work addresses these issues only for binary-class data streams; multi-class data streams have received very little attention. The most common remedy is ensemble learning, which combines multiple classifiers trained on different subsets of the data to improve overall accuracy. However, an ensemble's performance suffers when a new classifier is not trained on appropriate data, that is, data about the new concept. To address this gap, this study proposes a Smart Adaptive Ensemble Model (SAEM) that handles concept drift and class imbalance in multi-class data streams. SAEM monitors feature-level changes in the data distribution and creates a background ensemble to train the new classifier on the features in which change is observed. To address class imbalance, SAEM applies higher weights to minority-class instances using a dynamic class imbalance ratio. The proposed model outperformed existing state-of-the-art approaches on eight different data streams, with average improvements of 15.857% in accuracy, 20.35% in Kappa, 16.12% in F1-score, 15.58% in precision, and 16.42% in recall. The Friedman test confirmed statistically significant performance differences among all models across the five key metrics. These results support SAEM as a more effective and efficient solution for online learning applications.
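The abstract does not give the formula behind the "dynamic class imbalance ratio", so the following is only a minimal sketch of the general idea, not SAEM's actual method. It assumes a time-decayed estimate of each class's share of the stream and weights an instance by the ratio of the largest share to its own class's share. The class name `DecayedClassWeights`, the `decay` parameter, and the toy stream are all hypothetical.

```python
from collections import defaultdict


class DecayedClassWeights:
    """Tracks time-decayed class shares and derives per-instance weights.

    Illustrative assumption only: the weighting rule below is not taken
    from the SAEM paper.
    """

    def __init__(self, decay=0.9):
        self.decay = decay                      # closer to 1 = longer memory
        self.proportion = defaultdict(float)    # class label -> decayed share

    def update(self, label):
        """Fold one observed label into the decayed class shares."""
        self.proportion[label]  # register the class if it is new
        for c in self.proportion:
            target = 1.0 if c == label else 0.0
            self.proportion[c] = (
                self.decay * self.proportion[c] + (1.0 - self.decay) * target
            )

    def weight(self, label):
        """Return a training weight: 1.0 for the majority class, larger for rarer ones."""
        share = self.proportion.get(label, 0.0)
        if share <= 0.0:
            return 1.0  # unseen class: fall back to a neutral weight
        return max(self.proportion.values()) / share


# Toy multi-class stream: 'a' is the majority, 'b' and 'c' are minorities.
tracker = DecayedClassWeights(decay=0.9)
for label in ["a", "a", "a", "a", "b", "a", "a", "a", "a", "c"] * 20:
    tracker.update(label)

weights = {c: tracker.weight(c) for c in ("a", "b", "c")}
print(weights)
```

Because the shares decay over time, the weights follow the stream: if a former minority class becomes frequent after a drift, its weight shrinks toward 1.0 without any explicit reset. In an online ensemble, such a weight would typically be passed to each base learner's incremental update for the current instance.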

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bfc/12215259/d87823e64aa8/41598_2025_5122_Fig1_HTML.jpg
