Wever Marcel, Tornede Alexander, Mohr Felix, Hüllermeier Eyke
IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):3037-3054. doi: 10.1109/TPAMI.2021.3051276. Epub 2021 Aug 4.
Automated machine learning (AutoML) supports the algorithmic construction and data-specific customization of machine learning pipelines, including the selection, combination, and parametrization of machine learning algorithms as main constituents. Generally speaking, AutoML approaches comprise two major components: a search space model and an optimizer for traversing the space. Recent approaches have shown impressive results in the realm of supervised learning, most notably (single-label) classification (SLC). Moreover, first attempts at extending these approaches towards multi-label classification (MLC) have been made. While the space of candidate pipelines is already huge in SLC, the complexity of the search space is raised to an even higher power in MLC. One may wonder, therefore, whether and to what extent optimizers established for SLC can scale to this increased complexity, and how they compare to each other. This paper makes the following contributions: First, we survey existing approaches to AutoML for MLC. Second, we augment these approaches with optimizers not previously tried for MLC. Third, we propose a benchmarking framework that supports a fair and systematic comparison. Fourth, we conduct an extensive experimental study, evaluating the methods on a suite of MLC problems. We find a grammar-based best-first search to compare favorably to other optimizers.
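The abstract splits AutoML into a search space model and an optimizer, and names grammar-based best-first search as the strongest optimizer. The toy sketch below shows that split. It is not the implementation evaluated in the paper. Everything in it is a hypothetical illustration: the grammar `GRAMMAR`, the wrapper names `BR(`/`CC(`/`LP(`, the base learners, the `TOY_BONUS` scores that stand in for measured pipeline quality, and the choice to score partial derivations by the best of a few random completions.

```python
import heapq
import itertools
import random

# Hypothetical toy search space model: a context-free grammar whose
# nonterminals (in angle brackets) map to alternative expansions.
# BR/CC/LP wrap a single-label base learner; names are illustrative only.
GRAMMAR = {
    "<MLC>": [["BR(", "<SLC>", ")"], ["CC(", "<SLC>", ")"], ["LP(", "<SLC>", ")"]],
    "<SLC>": [["J48"], ["SVM"], ["RandomForest"]],
}

# Hypothetical stand-in for measured pipeline quality (higher is better).
TOY_BONUS = {"BR(": 0.1, "CC(": 0.2, "LP(": 0.15,
             "J48": 0.3, "SVM": 0.4, "RandomForest": 0.5}


def is_complete(form):
    """A sentential form is a complete pipeline once it has no nonterminals."""
    return all(sym not in GRAMMAR for sym in form)


def expand_leftmost(form):
    """Return all successor forms obtained by rewriting the leftmost nonterminal."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            return [form[:i] + alt + form[i + 1:] for alt in GRAMMAR[sym]]
    return []


def evaluate(form):
    """Toy evaluation of a complete pipeline (stands in for cross-validation)."""
    return sum(TOY_BONUS.get(sym, 0.0) for sym in form)


def random_completion(form, rng):
    """Derive a random complete pipeline from a partial one."""
    while not is_complete(form):
        form = rng.choice(expand_leftmost(form))
    return form


def best_first_search(start, evaluate, n_completions=3, max_expansions=50, seed=0):
    """Best-first search over partial derivations. Each node is scored by the
    best of n_completions random completions (an assumed node evaluation)."""
    rng = random.Random(seed)
    tie = itertools.count()  # tie-breaker so the heap never compares forms
    best = {"form": None, "score": float("-inf")}

    def estimate(form):
        candidates = [form] if is_complete(form) else [
            random_completion(form, rng) for _ in range(n_completions)]
        scores = []
        for full in candidates:
            s = evaluate(full)
            if s > best["score"]:  # remember the best complete pipeline seen
                best["form"], best["score"] = full, s
            scores.append(s)
        return max(scores)

    frontier = [(-estimate(start), next(tie), start)]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, form = heapq.heappop(frontier)  # most promising node first
        if is_complete(form):
            continue  # leaves were already evaluated when generated
        expansions += 1
        for child in expand_leftmost(form):
            heapq.heappush(frontier, (-estimate(child), next(tie), child))
    return "".join(best["form"]), best["score"]


pipeline, score = best_first_search(["<MLC>"], evaluate)
print(pipeline, round(score, 2))  # expected: CC(RandomForest) 0.7
```

This toy space has only nine complete pipelines, so the search here is effectively exhaustive. In a realistic MLC space, the hypothetical `max_expansions` budget is what keeps run time bounded. The random completions are one way to give a partial pipeline a score before all of its components are fixed.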