Moosaei Hossein, Ganaie M A, Hladík Milan, Tanveer M
Department of Informatics, Faculty of Science, Jan Evangelista Purkyně University, Ústí nad Labem, Czech Republic; Department of Applied Mathematics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic.
Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, India; Department of Robotics, University of Michigan, Ann Arbor, MI, 48109, USA.
Neural Netw. 2023 Jan;157:125-135. doi: 10.1016/j.neunet.2022.10.003. Epub 2022 Oct 15.
Imbalanced datasets are prominent in real-world problems. In such problems, the number of data samples in one class is significantly higher than in the other classes, even though the other classes might be more important. Standard classification algorithms may assign all data to the majority class, which is a significant drawback of most standard learning algorithms, so imbalanced datasets need to be handled carefully. Twin support vector machines (TSVM), one of the traditional algorithms, perform well on balanced data but poorly on imbalanced datasets. To improve the classification ability of TSVM on imbalanced datasets, a reduced universum twin support vector machine for class imbalance learning (RUTSVM), motivated by the universum twin support vector machine (UTSVM), was recently proposed. One of RUTSVM's key drawbacks is that its dual problems and the computation of its classifiers involve matrix inversion. In this paper, we improve RUTSVM and propose an improved reduced universum twin support vector machine for class imbalance learning (IRUTSVM). In the proposed IRUTSVM approach, we derive alternative Lagrangian functions for the primal problems of RUTSVM by moving one of the terms of the objective function into the constraints. As a result, we obtain a new dual formulation for each optimization problem, so that no matrix inversion is needed either during training or when computing the classifiers. Moreover, smaller rectangular kernel matrices are used to reduce the computational time. Extensive experiments on a variety of synthetic and real-world imbalanced datasets show that the IRUTSVM algorithm outperforms the TSVM, UTSVM, and RUTSVM algorithms in terms of generalization performance.
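For context, a minimal sketch (not the authors' exact formulation) of why moving a term from the objective into the constraints avoids matrix inversion: in the standard TSVM primal for the first hyperplane, with class matrices A and B, the quadratic term ||A w_1 + e_1 b_1||^2 in the objective leads to a dual containing (H^T H)^{-1} with H = [A  e_1]; introducing an auxiliary variable shifts that term into the constraints and removes the inverse. The regularization parameter c_3 below is an assumed addition for illustration; the actual IRUTSVM problems also carry universum constraints and reduced rectangular kernel matrices.

% Standard TSVM primal for the first hyperplane (its dual contains (H^T H)^{-1}):
\begin{align*}
\min_{w_1, b_1, \xi}\quad & \tfrac{1}{2}\,\|A w_1 + e_1 b_1\|^2 + c_1\, e_2^\top \xi \\
\text{s.t.}\quad & -(B w_1 + e_2 b_1) + \xi \ge e_2, \qquad \xi \ge 0 .
\end{align*}

% Sketch of the inverse-free idea: move the quadratic term into the constraints
% through an auxiliary variable \eta (c_3 is an assumed regularization parameter):
\begin{align*}
\min_{w_1, b_1, \eta, \xi}\quad & \tfrac{1}{2}\,\|\eta\|^2 + \tfrac{c_3}{2}\,\big(\|w_1\|^2 + b_1^2\big) + c_1\, e_2^\top \xi \\
\text{s.t.}\quad & A w_1 + e_1 b_1 = \eta, \\
& -(B w_1 + e_2 b_1) + \xi \ge e_2, \qquad \xi \ge 0 .
\end{align*}
% The stationarity conditions give w_1 = (B^\top\alpha - A^\top\lambda)/c_3 (and analogously for b_1),
% so the resulting dual involves no inverse of data-dependent matrices.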