Online Active Learning Ensemble Framework for Drifted Data Streams.

Author information

Shan Jicheng, Zhang Hang, Liu Weike, Liu Qingbao

Publication information

IEEE Trans Neural Netw Learn Syst. 2019 Feb;30(2):486-498. doi: 10.1109/TNNLS.2018.2844332. Epub 2018 Jul 2.

Abstract

In practical applications, data stream classification faces significant challenges, such as the high cost of labeling instances and potential concept drift. We present a new online active learning ensemble framework for drifting data streams based on a hybrid labeling strategy. The framework includes the following: 1) an ensemble classifier, which consists of a long-term stable classifier and multiple dynamic classifiers (a multilevel sliding window model is used to create and update the dynamic classifiers so that both gradual and sudden drift can be handled effectively) and 2) active learning, which uses a nonfixed labeling budget, supports on-demand label requests, and combines an uncertainty strategy with a random strategy to select instances for labeling. The decision threshold of the uncertainty strategy is adjusted dynamically: when concept drift occurs, the threshold is gradually reduced so that the most uncertain instances are queried first, keeping the labeling request cost as low as possible. Experiments on synthetic and real data sets show that the proposed method achieves high prediction accuracy without increasing the total labeling cost, and that the labeling cost can be allocated dynamically according to the concept drift.
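The hybrid labeling strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `HybridLabelingStrategy`, every parameter value, the use of the maximum posterior probability as the confidence measure, and the reset of the threshold once drift is no longer signaled are all assumptions made for illustration. The abstract states only that an uncertainty strategy and a random strategy are combined and that the threshold is gradually reduced when drift occurs.

```python
import random


class HybridLabelingStrategy:
    """Hypothetical sketch: uncertainty + random querying with a dynamic threshold."""

    def __init__(self, threshold=0.9, min_threshold=0.5, step=0.1,
                 random_rate=0.05, seed=0):
        # All default values are illustrative assumptions, not from the paper.
        self.initial_threshold = threshold
        self.threshold = threshold
        self.min_threshold = min_threshold
        self.step = step
        self.random_rate = random_rate
        self.rng = random.Random(seed)

    def should_query(self, max_posterior):
        """Return True if a label should be requested for this instance."""
        # Random strategy: occasionally query regardless of confidence, so
        # changes far from the decision boundary can still be observed.
        if self.rng.random() < self.random_rate:
            return True
        # Uncertainty strategy: query when the classifier's confidence in its
        # predicted class falls below the current threshold.
        return max_posterior < self.threshold

    def update(self, drift_detected):
        """Adjust the threshold after each instance."""
        if drift_detected:
            # Lower the threshold gradually so that only the most uncertain
            # instances keep triggering label requests.
            self.threshold = max(self.min_threshold, self.threshold - self.step)
        else:
            # Assumption: restore the initial threshold once no drift is signaled.
            self.threshold = self.initial_threshold


if __name__ == "__main__":
    strategy = HybridLabelingStrategy(random_rate=0.0)
    print(strategy.should_query(0.6))   # confidence below the threshold
    strategy.update(drift_detected=True)
    print(strategy.threshold)
```

In a stream loop, `should_query` would be called with the ensemble's confidence for each arriving instance, and `update` with the output of whatever drift detector is in use; both of those components are outside the scope of this sketch.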
