Quadir A, Sajid M, Tanveer M
IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12444-12453. doi: 10.1109/TNNLS.2024.3476391.
Twin support vector machine (TSVM) is an emerging machine learning model with versatile applicability in classification and regression tasks. Nevertheless, TSVM faces notable challenges: 1) its reliance on matrix inversions severely limits its efficiency and applicability on large-scale datasets; 2) its primal formulation omits the structural risk minimization (SRM) principle, which increases the risk of overfitting; and 3) TSVM is highly sensitive to noise and outliers and is unstable under resampling. To address these challenges, we propose the granular ball TSVM (GBTSVM). GBTSVM takes granular balls (GBs), rather than individual data points, as inputs to construct a classifier. Because of their coarser granularity, these GBs are robust to resampling and less susceptible to noise and outliers. We further propose a novel large-scale GBTSVM (LS-GBTSVM). The optimization formulation of LS-GBTSVM ensures two critical properties: 1) it eliminates the need for matrix inversions, improving computational efficiency; and 2) it incorporates the SRM principle through regularization terms, effectively addressing overfitting. The proposed LS-GBTSVM is efficient, scales to large datasets, and is robust against noise and outliers. We comprehensively evaluate the GBTSVM and LS-GBTSVM models on benchmark datasets from UCI and KEEL, both with and without added label noise, and compare them with existing baseline models. Furthermore, we extend our assessment to the large-scale NDC datasets to establish the practicality of the proposed models in such settings. Our experimental findings and rigorous statistical analyses affirm the superior generalization performance of the proposed GBTSVM and LS-GBTSVM models compared with the baseline models.
The source code of the proposed GBTSVM and LS-GBTSVM models is available at https://github.com/mtanveer1/GBTSVM.
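The abstract does not say how the granular balls are generated. A common scheme in the granular-ball computing literature splits the data recursively with 2-means until each ball is "pure" enough, then represents each ball by its center, radius, and majority label. The sketch below assumes that scheme. The function names (`summarize`, `two_means`, `granular_balls`), the 0.9 purity threshold, the minimum ball size, and the mean-distance radius are all illustrative assumptions, not necessarily the choices made in GBTSVM; the authors' repository is the authority on their actual procedure.

```python
import numpy as np

# Minimal granular-ball (GB) generation sketch. ASSUMPTION: purity-driven
# recursive 2-means splitting, a common scheme in granular-ball computing;
# GBTSVM's actual generation procedure may differ.

def summarize(X, y):
    """Describe points as one ball: center, radius, majority label, purity."""
    center = X.mean(axis=0)
    # Hypothetical radius choice: mean distance of members to the center.
    radius = float(np.linalg.norm(X - center, axis=1).mean())
    labels, counts = np.unique(y, return_counts=True)
    return center, radius, labels[np.argmax(counts)], counts.max() / len(y)

def two_means(X, n_iter=10, seed=0):
    """Plain Lloyd's 2-means; returns a 0/1 cluster assignment per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]  # copy, not a view
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return assign

def granular_balls(X, y, purity_threshold=0.9, min_size=2):
    """Split until each ball is pure enough or too small to split.

    Returns a list of (center, radius, label, n_points) tuples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    balls, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        center, radius, label, purity = summarize(X[idx], y[idx])
        if purity >= purity_threshold or len(idx) <= min_size:
            balls.append((center, radius, label, len(idx)))
            continue
        assign = two_means(X[idx])
        if assign.min() == assign.max():  # no split possible (duplicate points)
            balls.append((center, radius, label, len(idx)))
            continue
        stack.extend([idx[assign == 0], idx[assign == 1]])
    return balls

# Usage: two Gaussian blobs with 5% of the labels flipped.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)
y[rng.choice(200, size=10, replace=False)] *= -1
balls = granular_balls(X, y)
print(f"{len(balls)} balls summarize {len(X)} points")
```

In a GB-based classifier, these balls, not the raw points, become the training inputs. The optimization problem therefore scales with the number of balls, and an isolated mislabeled point tends to be absorbed into a ball whose majority label is correct. This is one plausible reading of the robustness claim above, under the stated assumptions.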