Smart adaptive ensemble model for multiclass imbalanced nonstationary data streams.

Authors

Palli Abdul Sattar, Jaafar Jafreezal, Md Saad Mohamad Hanif, Mokhtar Ainul Akmar, Gomes Heitor Murilo, Soomro Afzal Ahmed, Gilal Abdul Rehman

Affiliations

Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, 32610, Seri Iskandar, Perak Darul Ridzuan, Malaysia.

Anti-Narcotics Force, Ministry of Interior and Narcotics Control, Islamabad, 46000, Pakistan.

Publication information

Sci Rep. 2025 Jul 1;15(1):21140. doi: 10.1038/s41598-025-05122-w.

Abstract

In real-time streaming data, concept drift and class imbalance may occur simultaneously, degrading the performance of online machine learning models. Most existing work addresses these issues only for binary-class data streams; multi-class data streams have received very little attention. The most common approach to these issues is ensemble learning, which combines multiple classifiers trained on different subsets of the data to improve overall accuracy. An ensemble's performance suffers, however, when a new classifier is not trained on appropriate data, that is, data about the new concept. To address this gap, this study proposes a Smart Adaptive Ensemble Model (SAEM) that handles concept drift and class imbalance in multi-class data streams. SAEM monitors feature-level changes in the data distribution and creates a background ensemble to train the new classifier on the features in which change is observed. To address class imbalance, SAEM assigns higher weights to minority-class instances using a dynamic class imbalance ratio. The proposed model outperformed existing state-of-the-art approaches on eight different data streams. The results showed an average improvement of 15.857% in accuracy, 20.35% in Kappa, 16.12% in F1-score, 15.58% in precision, and 16.42% in recall. The Friedman test confirmed statistically significant performance differences among the models across the five key metrics. These results support SAEM as an effective and efficient solution for online learning applications.
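The abstract does not give the formula behind the dynamic class imbalance ratio, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes exponentially decayed class counts and a weight equal to the largest class count divided by the instance's own class count; the class name `DynamicClassWeigher`, the `decay` parameter, and the weight formula are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class DynamicClassWeigher:
    """Tracks decayed class frequencies over a stream and derives instance weights.

    Hypothetical helper: the decay factor and the weight formula are
    assumptions for illustration, not taken from the paper.
    """

    def __init__(self, decay: float = 0.99):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.counts = defaultdict(float)  # class label -> decayed count

    def update(self, label) -> None:
        # Fade every existing count so recent instances dominate the ratio,
        # then count the newly arrived label.
        for seen in self.counts:
            self.counts[seen] *= self.decay
        self.counts[label] += 1.0

    def weight(self, label) -> float:
        # Assumed rule: weight = largest class count / this class's count.
        # The current majority class gets 1.0; rarer classes get more.
        current = self.counts.get(label, 0.0)
        if current == 0.0:
            return 1.0  # no evidence about this class yet: neutral weight
        return max(self.counts.values()) / current


if __name__ == "__main__":
    weigher = DynamicClassWeigher(decay=1.0)  # decay=1.0 means plain counting
    for y in ["a"] * 6 + ["b"] * 3 + ["c"]:
        weigher.update(y)
    for y in ("a", "b", "c"):
        print(y, weigher.weight(y))
```

In an online learner, the returned value would typically be passed as the per-instance training weight. Because the counts decay, a class that was once the majority loses that status if it stops appearing, so the weights follow the stream as the class distribution drifts.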

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bfc/12215259/d87823e64aa8/41598_2025_5122_Fig1_HTML.jpg
