A simple and reliable instance selection for fast training support vector machine: Valid Border Recognition.

Affiliations

School of Artificial Intelligence, Nanjing University of Information Science & Technology, Nanjing, 210044, China; Research Institute of Talent Big Data, Nanjing University of Information Science & Technology, Nanjing, 210044, China.

Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China.

Publication details

Neural Netw. 2023 Sep;166:379-395. doi: 10.1016/j.neunet.2023.07.018. Epub 2023 Jul 17.

Abstract

Support vector machines (SVMs) are powerful statistical learning tools, but training them on large datasets is time-consuming. To address this issue, various instance selection (IS) approaches have been proposed; they choose a small fraction of critical instances and screen out the others before training. However, existing methods do not balance accuracy and efficiency well. Some methods miss critical instances, while others use selection schemes so complicated that they take longer to run than training on all of the original instances, which defeats the purpose of IS. In this work, we present a new IS method called Valid Border Recognition (VBR). VBR selects the closest heterogeneous neighbors (neighbors from the opposite class) as valid border instances and incorporates this process into the construction of a reduced Gaussian kernel matrix, thus minimizing execution time. To improve reliability, we also propose a strengthened version of VBR (SVBR). Starting from VBR, SVBR gradually adds farther heterogeneous neighbors as complements until the Lagrange multipliers of the already selected instances become stable. In numerical experiments on benchmark and synthetic datasets, we verify the effectiveness of the proposed methods in terms of accuracy, execution time, and inference time.
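The selection rule described above (keep each instance's closest heterogeneous neighbors, reading closeness off a Gaussian kernel matrix) can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: labels are binary (+1/-1), "closest" means the largest Gaussian kernel value (equivalently the smallest Euclidean distance), and the "reduced" kernel matrix is taken to be only the positive-versus-negative block. The function names `gaussian_kernel` and `valid_border_indices` and the parameter `k` are hypothetical. Setting `k > 1` imitates SVBR's addition of farther heterogeneous neighbors, but SVBR's stopping rule (stable Lagrange multipliers) requires an SVM solver and is not shown.

```python
import numpy as np


def gaussian_kernel(A, B, gamma):
    """Gaussian (RBF) kernel block: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def valid_border_indices(X, y, gamma=1.0, k=1):
    """Hypothetical VBR-style selection for binary labels y in {+1, -1}.

    Every instance marks its k closest heterogeneous (opposite-class)
    neighbors; the union of all marked instances is returned as sorted indices.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y != 1)
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes must be present")
    # Assumed "reduced" kernel matrix: only the cross-class block, shape (n_pos, n_neg).
    K = gaussian_kernel(X[pos], X[neg], gamma)
    # A larger kernel value means a smaller distance, so sort in descending order.
    k_for_pos = min(k, len(neg))
    k_for_neg = min(k, len(pos))
    nearest_neg = np.argsort(-K, axis=1)[:, :k_for_pos]  # column indices into `neg`
    nearest_pos = np.argsort(-K, axis=0)[:k_for_neg, :]  # row indices into `pos`
    return np.union1d(pos[np.unique(nearest_pos)], neg[np.unique(nearest_neg)])


# Usage on a toy 1-D dataset: positives at 0, 1, 2 and negatives at 3, 4, 10.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]])
y = np.array([1, 1, 1, -1, -1, -1])
print(valid_border_indices(X, y, gamma=0.1).tolist())       # [2, 3]
print(valid_border_indices(X, y, gamma=0.1, k=2).tolist())  # [1, 2, 3, 4]
```

With `k=1` only the two instances facing each other across the class gap survive; raising `k` widens the border band. Forming only the n_pos × n_neg block avoids the full n × n kernel matrix; whether this matches the paper's reduced matrix exactly is an assumption.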
