Amaya-Tejera Nazhir, Gamarra Margarita, Vélez Jorge I, Zurek Eduardo
Department of Computer Science, Universidad del Norte, Barranquilla, Colombia.
Department of Industrial Engineering, Universidad del Norte, Barranquilla, Colombia.
Front Artif Intell. 2024 Feb 26;7:1287875. doi: 10.3389/frai.2024.1287875. eCollection 2024.
Support Vector Machines (SVMs) are supervised machine learning algorithms widely used for classification tasks. Instead of the conventional single split of the data into separate training and testing sets, we propose an approach in which subsets of the original data are randomly selected to train the model multiple times. This iterative training process aims to identify a representative data subset, leading to improved inferences about the population. We also introduce a novel distance-based kernel, built from a similarity matrix and designed specifically for binary-type features, that handles both binary and multi-class classification problems. Computational experiments on publicly available datasets of varying sizes show that the proposed method significantly outperforms existing approaches in classification accuracy. The distance-based kernel also outperforms other well-known kernels from the literature and those used in previous studies on the same datasets. These findings support the effectiveness of the proposed classification method and distance-based kernel for SVMs, and they have implications for diverse classification problems in machine learning and data analysis.
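The two ideas in the abstract, a distance-based kernel for binary features and repeated training on random subsets, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the kernel formula, the subset-scoring rule, or the solver, so everything below is an assumption. The kernel is taken to be exp(-gamma * Hamming distance), which on 0/1 vectors coincides with a Gaussian RBF kernel. Each subset is scored by accuracy on the points left out of it. A dual kernel perceptron stands in for the SVM solver so that the example needs only the standard library. All function names and parameters (`hamming_kernel`, `best_random_subset`, `gamma`, `frac`, `n_rounds`) are hypothetical.

```python
import itertools
import math
import random


def hamming_kernel(x, z, gamma=0.5):
    """Assumed kernel: exp(-gamma * Hamming distance) between 0/1 vectors."""
    dist = sum(a != b for a, b in zip(x, z))
    return math.exp(-gamma * dist)


def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Stand-in for an SVM solver (NOT an SVM): dual perceptron, labels in {-1, +1}."""
    n = len(X)
    gram = [[kernel(a, b) for b in X] for a in X]
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * gram[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified or undecided: strengthen point i
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(X_train, y_train, alpha, kernel, x):
    """Sign of the kernel expansion; ties are assigned to class -1."""
    score = sum(a * yt * kernel(xt, x)
                for a, yt, xt in zip(alpha, y_train, X_train) if a)
    return 1 if score > 0 else -1


def best_random_subset(X, y, kernel, n_rounds=30, frac=0.5, seed=0):
    """Train on random subsets; return (accuracy, indices) of the best one.

    Assumption: a subset is scored on the points it leaves out.
    """
    rng = random.Random(seed)
    n = len(X)
    k = min(n - 1, max(2, int(frac * n)))  # always leave at least one point out
    best_acc, best_idx = -1.0, None
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)
        chosen = set(idx)
        rest = [i for i in range(n) if i not in chosen]
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        alpha = train_kernel_perceptron(Xs, ys, kernel)
        hits = sum(predict(Xs, ys, alpha, kernel, X[i]) == y[i] for i in rest)
        acc = hits / len(rest)
        if acc > best_acc:
            best_acc, best_idx = acc, sorted(idx)
    return best_acc, best_idx


# Toy data (made up for illustration): all 4-bit vectors, label set by the first bit.
X = list(itertools.product([0, 1], repeat=4))
y = [1 if x[0] == 1 else -1 for x in X]
acc, subset = best_random_subset(X, y, hamming_kernel)
print(f"best held-out accuracy: {acc:.2f} using {len(subset)} training points")
```

To use an actual SVM, the same Gram matrix can be passed to a solver that accepts precomputed kernels. The loop over random subsets stays the same, and only the train and predict steps change.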