Deng Liping, Chen Wen-Sheng, Xiao Mingqing
IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12540-12552. doi: 10.1109/TNNLS.2023.3263506. Epub 2024 Sep 3.
The performance of classification algorithms is mainly governed by the hyperparameter settings deployed in applications, and the search for desirable hyperparameter configurations is usually quite challenging due to the complexity of datasets. Metafeatures are a group of measures that characterize the underlying dataset from various aspects, and the corresponding recommendation algorithm fully relies on the appropriate selection of metafeatures. Metalearning (MtL), which aims to improve the learning algorithm itself, requires the integration of features, models, and algorithm learning to accomplish its goal. In this article, we develop a multivariate sparse-group Lasso (SGLasso) model with embedded MtL capability for recommending suitable configurations via learning. The main idea is to select the principal metafeatures by removing redundant or irregular ones, improving both the efficiency and the performance of hyperparameter configuration recommendation. To be specific, we first extract the metafeatures and the classification performance of a set of configurations from a collection of historical datasets; a metaregression task is then established through SGLasso to capture the main characteristics of the underlying relationship between metafeatures and historical performance. For a new dataset, the classification performance of each configuration can be estimated through the selected metafeatures, so that the configuration with the highest predicted performance on the new dataset can be generated. Furthermore, a general MtL architecture combined with our model is developed. Extensive experiments are conducted on 136 UCI datasets, demonstrating the effectiveness of the proposed approach. The empirical results on the well-known SVM show that our model can effectively recommend suitable configurations and outperform existing MtL-based methods as well as well-known search-based algorithms such as random search, Bayesian optimization, and Hyperband.
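The metaregression pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's plain `Lasso` as a stand-in for the multivariate sparse-group Lasso, and all data (metafeature matrix, configuration performances, dimensions) are synthetic assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy meta-dataset: 50 historical datasets, 8 metafeatures, 5 candidate configurations.
n_datasets, n_metafeatures, n_configs = 50, 8, 5
X = rng.normal(size=(n_datasets, n_metafeatures))        # metafeatures of historical datasets
W = rng.normal(size=(n_metafeatures, n_configs))
W[4:, :] = 0.0                                           # only the first 4 metafeatures are informative
Y = X @ W + 0.01 * rng.normal(size=(n_datasets, n_configs))  # per-configuration performance

# Metaregression: one sparse model per configuration; the sparsity penalty
# plays the metafeature-selection role that SGLasso plays in the paper.
models = [Lasso(alpha=0.05).fit(X, Y[:, j]) for j in range(n_configs)]

# For a new dataset, predict each configuration's performance from its
# metafeatures and recommend the configuration with the highest estimate.
x_new = rng.normal(size=(1, n_metafeatures))
preds = np.array([m.predict(x_new)[0] for m in models])
best_config = int(np.argmax(preds))
```

In the paper's setting, the group structure of SGLasso additionally lets related metafeatures be kept or discarded together, which a per-output `Lasso` cannot express.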