Fan Yajing, Yu Shuyang, Gu Bin, Xiong Ziran, Zhai Zhou, Huang Heng, Chang Yi
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):2154-2168. doi: 10.1109/TNNLS.2024.3354978. Epub 2025 Feb 6.
The semi-supervised support vector machine (S3VM) is important because it can exploit abundant unlabeled data to improve the generalization accuracy of traditional SVMs. To perform well, S3VM needs an effective way to select its hyperparameters. However, model selection for semi-supervised models remains a key open problem: existing methods for searching for optimal parameter values are usually computationally demanding, especially those based on grid search. To address this challenge, in this article we first propose solution paths of S3VM (SPS3VM), which track the solutions of the nonconvex S3VM with respect to the hyperparameters. Specifically, we apply incremental and decremental learning methods to update the solution so that it satisfies the Karush-Kuhn-Tucker (KKT) conditions. Based on SPS3VM and the piecewise linearity of the model function, we can find the model with the minimum cross-validation (CV) error over the entire range of candidate hyperparameters by computing the error path of S3VM. Our SPS3VM is the first solution path algorithm for the nonconvex optimization problem of semi-supervised learning models. We also provide a finite convergence analysis and the computational complexity of SPS3VM. Experimental results on a variety of benchmark datasets not only verify that SPS3VM can globally search the hyperparameters (the regularization and ramp loss parameters) but also show a substantial reduction in computational time while retaining similar or slightly better generalization performance compared with the grid search approach.
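The abstract's model-selection step rests on one observation: if the decision values of held-out points are piecewise linear in a hyperparameter, the validation error can be computed exactly along the whole path instead of at grid points. The Python/NumPy sketch below illustrates only that general idea under stated assumptions; it is not the authors' SPS3VM algorithm and does not compute the path itself (no incremental/decremental updates or KKT tracking). It assumes the path is already available as hypothetical inputs: increasing breakpoints `lams` and the decision values of validation points at each breakpoint, varying linearly in between. The function name `error_path_minimum` is hypothetical.

```python
import numpy as np


def error_path_minimum(lams, decision_values, y_val):
    """Exact minimum of the validation 0-1 error along a piecewise-linear path.

    Hypothetical illustration, not the SPS3VM procedure.
    lams: shape (K,), K >= 2, increasing breakpoints of an assumed solution path.
    decision_values: shape (K, m), f(x_j; lams[k]) for m validation points,
        ASSUMED to be linear in the hyperparameter between breakpoints.
    y_val: shape (m,), labels in {-1, +1}.
    Returns (best_lambda, best_error_count).
    """
    lams = np.asarray(lams, dtype=float)
    F = np.asarray(decision_values, dtype=float)
    y = np.asarray(y_val, dtype=float)

    # Critical points: all breakpoints plus every zero crossing of f_j inside a segment.
    critical = list(lams)
    for k in range(len(lams) - 1):
        f0, f1 = F[k], F[k + 1]
        diff = f1 - f0
        with np.errstate(divide="ignore", invalid="ignore"):
            t = -f0 / diff  # fraction of the segment at which f_j reaches zero
            mask = (diff != 0) & (t > 0) & (t < 1)
        critical.extend(lams[k] + t[mask] * (lams[k + 1] - lams[k]))
    critical = np.unique(critical)

    # Between consecutive critical points no prediction changes sign, so the
    # error count is constant there; checking critical points and midpoints
    # therefore covers the entire hyperparameter range.
    mids = (critical[:-1] + critical[1:]) / 2
    candidates = np.concatenate([critical, mids])

    def decision_at(lam):
        # Find the segment containing lam and interpolate linearly.
        k = min(np.searchsorted(lams, lam, side="right") - 1, len(lams) - 2)
        w = (lam - lams[k]) / (lams[k + 1] - lams[k])
        return (1 - w) * F[k] + w * F[k + 1]

    # A point counts as misclassified when y_j * f_j(lam) <= 0.
    errors = np.array([np.sum(y * decision_at(lam) <= 0) for lam in candidates])
    best = int(np.argmin(errors))
    return float(candidates[best]), int(errors[best])


best_lam, best_err = error_path_minimum(
    [0.0, 1.0, 2.0], [[-1.0, 1.0], [1.0, 1.0], [1.0, -1.0]], [1, 1]
)
print(best_lam, best_err)
```

In this toy path, the first validation point becomes correctly classified at λ = 0.5 and the second becomes misclassified at λ = 1.5, so every λ strictly between them gives zero errors; a coarse grid could miss such an interval entirely. For K-fold CV under the same assumption, one would sum the per-fold error counts on the union of all folds' critical points.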