Elwell Ryan, Polikar Robi
Signal Processing & Pattern Recognition Laboratory, Electrical & Computer Engineering Department, Rowan University, Glassboro, NJ 08028, USA.
IEEE Trans Neural Netw. 2011 Oct;22(10):1517-31. doi: 10.1109/TNN.2011.2160459. Epub 2011 Aug 4.
We introduce an ensemble-of-classifiers approach for incremental learning of concept drift in nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn++.NSE, learns from consecutive batches of data without making any assumptions about the nature or rate of drift; it can learn from environments that experience a constant or variable rate of drift, addition or deletion of concept classes, and cyclical drift. Like other members of the Learn++ family of algorithms, it learns incrementally, that is, without requiring access to previously seen data. Learn++.NSE trains one new classifier for each batch of data it receives and combines these classifiers using dynamically weighted majority voting. The novelty of the approach lies in how the voting weights are determined: they are based on each classifier's time-adjusted accuracy on current and past environments. This allows the algorithm to recognize and respond to changes in the underlying data distributions, as well as to a possible recurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as on a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn++.NSE can track changing environments very closely, regardless of the type of concept drift. To allow future use, comparison, and benchmarking by interested researchers, we also release the data used in this paper.
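The batch-wise scheme described in the abstract (one new classifier per batch, combined by dynamically weighted majority voting) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the weighting formula, so the exponentially decayed error average, the log-odds voting weight, and the error clipping below are stand-in assumptions, and the names `CentroidClassifier`, `TimeWeightedEnsemble`, and `decay` are hypothetical.

```python
import math
from collections import defaultdict


class CentroidClassifier:
    """Toy base learner (an assumption): predicts the class with the nearest mean."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


class TimeWeightedEnsemble:
    """One classifier per batch; votes weighted by a time-adjusted error (assumed form)."""

    def __init__(self, decay=0.5):
        self.decay = decay   # hypothetical knob: smaller values forget old errors faster
        self.members = []    # one classifier per batch seen so far
        self.history = []    # per-classifier error rates, oldest first
        self.weights = []    # current voting weights, recomputed on every batch

    def partial_fit(self, X, y):
        # Train a new classifier on the current batch only; old data is never revisited.
        self.members.append(CentroidClassifier().fit(X, y))
        self.history.append([])
        self.weights = []
        for clf, errs in zip(self.members, self.history):
            wrong = sum(clf.predict(x) != label for x, label in zip(X, y))
            # Clip so the log-odds stays finite and never goes negative.
            errs.append(min(max(wrong / len(y), 1e-6), 0.5))
            n = len(errs)
            recency = [self.decay ** (n - 1 - j) for j in range(n)]
            avg = sum(r * e for r, e in zip(recency, errs)) / sum(recency)
            self.weights.append(math.log((1 - avg) / avg))

    def predict(self, x):
        votes = defaultdict(float)
        for clf, weight in zip(self.members, self.weights):
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)
```

Calling `partial_fit` once per incoming batch and `predict` in between mirrors the incremental setting: a classifier trained on an earlier concept loses voting weight while that concept is absent and regains it if the concept recurs, because every member is re-scored on each new batch.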