IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):3067-3078. doi: 10.1109/TPAMI.2021.3062900. Epub 2021 Aug 4.
Automated Machine Learning (AutoML) systems have been shown to efficiently build good models for new datasets. However, it is often unclear how well they adapt when the data evolves over time. The main goal of this study is to understand how concept drift affects the performance of AutoML methods, and which adaptation strategies can make them more robust to changes in the underlying data. To that end, we propose six concept drift adaptation strategies and evaluate their effectiveness on a variety of AutoML approaches for building machine learning pipelines, including Bayesian optimization, genetic programming, and random search with automated stacking. The evaluation is empirical and uses real-world and synthetic data streams with different types of concept drift. Based on this analysis, we propose ways to develop more sophisticated and robust AutoML techniques.