IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):3055-3066. doi: 10.1109/TPAMI.2021.3056950. Epub 2021 Aug 4.
Automated machine learning (AutoML) seeks to automatically find machine learning pipelines that maximize predictive performance when used to train a model on a given dataset. One of the main open challenges in AutoML is the effective use of computational resources: an AutoML process evaluates many candidate pipelines, and these evaluations are costly yet often wasted because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases the number of successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.
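The idea of combining per-algorithm runtime models into a pipeline-level timeout check can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's actual method: the runtime models are hand-written formulas standing in for models that would be trained offline, the names `RUNTIME_MODELS`, `predict_pipeline_runtime`, and `will_time_out` are hypothetical, and the overall prediction is assumed to be a plain sum of the pre-processor and learner predictions (the abstract only says that it is "derived from" the individual models).

```python
from typing import Callable, Dict, Optional

# A runtime model maps a dataset shape (instances, features) to predicted seconds.
RuntimeModel = Callable[[int, int], float]

# Hypothetical stand-ins for per-algorithm models that would be trained offline.
# The formulas and coefficients are invented for illustration only.
RUNTIME_MODELS: Dict[str, RuntimeModel] = {
    "pca": lambda n, d: 1e-6 * n * d * d,
    "random_forest": lambda n, d: 5e-5 * n * d,
    "knn": lambda n, d: 1e-7 * n * n * d,
}


def predict_pipeline_runtime(
    learner: str, n: int, d: int, preprocessor: Optional[str] = None
) -> float:
    """Predict the runtime of a pipeline with at most one pre-processor.

    Assumed aggregation: the sum of the individual predictions. For
    simplicity, the learner is evaluated on the original dataset shape,
    ignoring any change in dimensionality caused by the pre-processor.
    """
    total = 0.0
    if preprocessor is not None:
        total += RUNTIME_MODELS[preprocessor](n, d)
    total += RUNTIME_MODELS[learner](n, d)
    return total


def will_time_out(
    learner: str,
    n: int,
    d: int,
    timeout_s: float,
    preprocessor: Optional[str] = None,
) -> bool:
    """Return True if the predicted runtime exceeds the evaluation timeout."""
    return predict_pipeline_runtime(learner, n, d, preprocessor) > timeout_s


if __name__ == "__main__":
    # Candidates predicted to time out could be skipped before evaluation.
    for pre, learner in [(None, "knn"), ("pca", "random_forest")]:
        skip = will_time_out(learner, 10_000, 50, timeout_s=60.0, preprocessor=pre)
        print(pre, learner, "skip" if skip else "evaluate")
```

In an AutoML loop, a check of this kind would run before each candidate evaluation, so that the time budget is spent only on pipelines expected to finish within the timeout.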