CIICESI, Escola Superior de Tecnologia e Gestão, Politécnico do Porto, Portugal.
ALGORITMI Research Centre/LASI, University of Minho, Braga, Portugal.
Int J Neural Syst. 2023 Mar;33(3):2350011. doi: 10.1142/S0129065723500119. Epub 2023 Feb 1.
In recent years, the number of machine learning algorithms and their parameters has increased significantly. On the one hand, this improves the chances of finding better models. On the other hand, it makes training a model more complex, because the search space expands significantly. As datasets also grow, traditional approaches based on extensive search become prohibitively expensive in computational resources and time, especially in data streaming scenarios. This paper describes a meta-learning approach that tackles two main challenges. The first is to predict key performance indicators of machine learning models. The second is to recommend the best algorithm/configuration for training a model for a given machine learning problem. Compared to a state-of-the-art method (AutoML), the proposed approach is up to 130x faster and only 4% worse in terms of average model quality. Hence, it is especially suited to scenarios in which models need to be updated regularly, such as streaming scenarios with big data, where some accuracy can be traded for a much shorter model training time.
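The two tasks named in the abstract, predicting a performance indicator and recommending a configuration, can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the meta-features or the meta-model, so the three meta-features, the nearest-neighbour predictor, and every name below (`Experiment`, `extract_meta_features`, `predict_score`, `recommend`) are assumptions chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Experiment:
    """One past training run (hypothetical record layout)."""
    meta_features: Tuple[float, ...]  # descriptors of the dataset used
    config: str                       # algorithm/configuration label
    score: float                      # observed performance indicator


def extract_meta_features(rows: List[List[float]], labels: List[int]) -> Tuple[float, ...]:
    """Cheap dataset descriptors (assumed): log10 size, width, majority-class share."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    majority_share = max(labels.count(c) for c in set(labels)) / n_rows
    return (math.log10(n_rows), float(n_cols), majority_share)


def predict_score(history: List[Experiment], meta: Tuple[float, ...],
                  config: str, k: int = 1) -> float:
    """Task 1: predict the indicator for `config` on a new dataset as the
    mean score of the k past runs whose datasets are most similar."""
    runs = [e for e in history if e.config == config]
    if not runs:
        raise ValueError(f"no past experiments for configuration {config!r}")
    runs.sort(key=lambda e: math.dist(e.meta_features, meta))
    nearest = runs[:k]
    return sum(e.score for e in nearest) / len(nearest)


def recommend(history: List[Experiment], meta: Tuple[float, ...],
              k: int = 1) -> Tuple[str, Dict[str, float]]:
    """Task 2: recommend the configuration with the highest predicted score.
    No candidate model is trained at this point, only predictions are compared."""
    configs = sorted({e.config for e in history})
    predictions = {c: predict_score(history, meta, c, k) for c in configs}
    best = max(predictions, key=lambda c: predictions[c])
    return best, predictions


# Toy history with made-up scores, for illustration only.
history = [
    Experiment((2.0, 5.0, 0.5), "tree", 0.80),
    Experiment((2.0, 5.0, 0.5), "linear", 0.90),
    Experiment((5.0, 50.0, 0.9), "tree", 0.95),
    Experiment((5.0, 50.0, 0.9), "linear", 0.70),
]

new_rows = [[0.0] * 5 for _ in range(100)]
new_labels = [0] * 50 + [1] * 50
best_config, predicted = recommend(history, extract_meta_features(new_rows, new_labels))
print(best_config, predicted)
```

The sketch also shows where a speed-up of the kind reported could come from: a recommendation costs one lookup over past experiments, whereas an extensive search trains and evaluates every candidate configuration on the new data.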