Liu Anjin, Lu Jie, Zhang Guangquan
IEEE Trans Neural Netw Learn Syst. 2021 Jan;32(1):293-307. doi: 10.1109/TNNLS.2020.2978523. Epub 2021 Jan 4.
Concept drift refers to changes in the distribution of underlying data and is an inherent property of evolving data streams. Ensemble learning, with dynamic classifiers, has proved to be an efficient method of handling concept drift. However, the best way to create and maintain ensemble diversity with evolving streams is still a challenging problem. In contrast to estimating diversity via inputs, outputs, or classifier parameters, we propose a diversity measurement based on whether the ensemble members agree on the probability of a regional distribution change. In our method, estimations over regional distribution changes are used as instance weights. Constructing different region sets through different schemes will lead to different drift estimation results, thereby creating diversity. The classifiers that disagree the most are selected to maximize diversity. Accordingly, an instance-based ensemble learning algorithm, called the diverse instance-weighting ensemble (DiwE), is developed to address concept drift for data stream classification problems. Evaluations of various synthetic and real-world data stream benchmarks show the effectiveness and advantages of the proposed algorithm.
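The core idea of the abstract — that ensemble members estimating regional distribution change under different region schemes will disagree, and the most-disagreeing members maximize diversity — can be illustrated with a minimal sketch. Everything below is hypothetical: the drift estimates are simulated random values, and the greedy selection routine (`select_most_diverse`) is an illustrative stand-in, not the DiwE algorithm itself.

```python
import numpy as np

# Hypothetical setup: each candidate ensemble member uses its own region
# scheme and produces a per-instance estimate of the probability that the
# regional distribution has changed. Here the estimates are simulated.
rng = np.random.default_rng(0)
n_candidates, n_instances = 6, 100
drift_estimates = rng.random((n_candidates, n_instances))

def pairwise_disagreement(p, q):
    """Mean absolute difference between two members' drift estimates."""
    return np.mean(np.abs(p - q))

def select_most_diverse(estimates, k):
    """Greedily pick k members that maximize total pairwise disagreement
    on their drift estimates (an illustrative selection heuristic)."""
    chosen = [0]  # seed with the first candidate
    while len(chosen) < k:
        best, best_gain = None, -1.0
        for i in range(len(estimates)):
            if i in chosen:
                continue
            gain = sum(pairwise_disagreement(estimates[i], estimates[j])
                       for j in chosen)
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return chosen

selected = select_most_diverse(drift_estimates, k=3)
```

In this sketch the drift estimates double as instance weights in the abstract's framing; the selection step only demonstrates the disagreement-maximization principle, not the paper's exact procedure.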