Shan Jicheng, Zhang Hang, Liu Weike, Liu Qingbao
IEEE Trans Neural Netw Learn Syst. 2019 Feb;30(2):486-498. doi: 10.1109/TNNLS.2018.2844332. Epub 2018 Jul 2.
In practical applications, data stream classification faces significant challenges, such as the high cost of labeling instances and potential concept drift. We present a new online active learning ensemble framework for drifting data streams based on a hybrid labeling strategy. The framework includes the following: 1) an ensemble classifier, which consists of a long-term stable classifier and multiple dynamic classifiers (a multilevel sliding window model is used to create and update the dynamic classifiers so that both gradual and sudden drift can be handled effectively) and 2) active learning, which uses a nonfixed labeling budget, supports on-demand label requests, and combines an uncertainty strategy with a random strategy to select instances for labeling. The decision threshold of the uncertainty strategy is adjusted dynamically: when concept drift occurs, the threshold is gradually reduced so that the most uncertain instances are queried first, keeping the labeling expense as low as possible. Experiments on synthetic and real data sets show that the proposed method achieves high prediction accuracy without increasing the total labeling cost, and that the labeling cost can be allocated dynamically according to the concept drift.
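The hybrid labeling strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the uncertainty measure, the threshold schedule, or the random sampling rate, so everything named here is an assumption made for illustration. In particular, the sketch assumes that uncertainty is measured by the ensemble's maximum posterior probability, that the threshold drops by a fixed `step` each time drift is signaled, and that the class `HybridLabeler` and its parameters (`min_threshold`, `random_rate`) are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class HybridLabeler:
    """Hypothetical hybrid labeling rule: uncertainty strategy plus random strategy."""

    threshold: float = 0.9      # assumed: query when max posterior < threshold
    min_threshold: float = 0.6  # assumed floor for the threshold
    step: float = 0.05          # assumed reduction applied when drift is signaled
    random_rate: float = 0.05   # assumed probability of a random label request
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    queries: int = 0            # running count of label requests (nonfixed budget)

    def should_query(self, max_posterior: float, drift: bool = False) -> bool:
        """Decide whether to request the true label of the current instance."""
        if drift:
            # Lower the threshold gradually so that only the most
            # uncertain instances trigger a request during drift.
            self.threshold = max(self.min_threshold, self.threshold - self.step)

        # Uncertainty strategy: the prediction is not confident enough.
        uncertain = max_posterior < self.threshold
        # Random strategy: occasionally sample regardless of confidence,
        # so that changes in confidently predicted regions can be noticed.
        sampled = self.rng.random() < self.random_rate

        ask = uncertain or sampled
        if ask:
            self.queries += 1
        return ask
```

In use, a stream loop would call `should_query` once per incoming instance with the ensemble's confidence and a drift flag from whatever drift detector is in place, and would update the classifiers only when it returns `True`. Because the threshold moves with the drift signal, the number of requests is not fixed in advance, which mirrors the on-demand budget the abstract describes.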