Data reduction for SVM training using density-based border identification.

Affiliations

Department of Computer Science, College of Computing and Information Technology, Arab Academy of Science, Technology & Maritime Transport, Alexandria, Egypt.

Mechatronics Engineering Department, Faculty of Engineering, Horus University Egypt, New Damietta, Egypt.

Publication information

PLoS One. 2024 Apr 3;19(4):e0300641. doi: 10.1371/journal.pone.0300641. eCollection 2024.

DOI:10.1371/journal.pone.0300641
PMID:38568906
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10990207/
Abstract

Support Vector Machines (SVMs) have been used extensively for classification and regression problems. However, the SVM approach is less practical for large datasets because of its computational cost, which stems primarily from the need to solve a quadratic programming problem to determine the decision boundary during training. As a result, methods have been developed that reduce the training data by selecting the instances most likely to be chosen as support vectors by the SVM algorithm. This paper presents a density-based method, called Density-based Border Identification (DBI), along with four variations of the method, for reducing SVM training data by extracting a layer of border instances. For higher-dimensional datasets, the extraction is performed on lower-dimensional embeddings obtained by Uniform Manifold Approximation and Projection (UMAP), and the resulting subset can be reused for SVM training in the higher-dimensional space. Experimental findings on datasets such as Banana, USPS, and Adult9a show that the best-performing variations of the proposed method effectively reduced the size of the training data and achieved acceptable training and prediction speedups while maintaining adequate classification accuracy compared to training on the original dataset. These results, together with comparisons to related state-of-the-art methods from the literature, such as Border Point extraction based on Locality-Sensitive Hashing (BPLSH), Clustering-Based Convex Hull (CBCH), and Shell Extraction (SE), suggest that the proposed methods are effective and potentially useful.
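
The general idea of keeping only instances near the class border can be illustrated with a minimal sketch. The abstract does not spell out the DBI procedure, so the code below is not the authors' method: it swaps in a simple stand-in heuristic that keeps a point when at least one of its k nearest neighbours belongs to another class. The function names, the toy two-blob data, and the value of `k` are all assumptions made for illustration, and the UMAP embedding step used for higher-dimensional data is omitted.

```python
import math
import random


def make_two_blobs(n_per_class, seed=0):
    """Toy 2-D data (hypothetical): Gaussian blobs centred at (-2, 0) and (+2, 0)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, cx in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(label)
    return points, labels


def select_border_candidates(points, labels, k=10):
    """Return indices of points that have at least one other-class point
    among their k nearest neighbours.

    Stand-in heuristic, NOT the paper's DBI procedure. Brute force:
    every point is compared against every other point.
    """
    selected = []
    for i, p in enumerate(points):
        # Distances from p to all other points, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbour_labels = {labels[j] for _, j in dists[:k]}
        # Keep the point if its neighbourhood is not purely its own class.
        if neighbour_labels != {labels[i]}:
            selected.append(i)
    return selected


points, labels = make_two_blobs(n_per_class=200)
border_idx = select_border_candidates(points, labels, k=10)
reduced_points = [points[i] for i in border_idx]
reduced_labels = [labels[i] for i in border_idx]
print(f"kept {len(border_idx)} of {len(points)} instances")
```

In a full pipeline, `reduced_points` and `reduced_labels` would be passed to an SVM trainer in place of the original data (for example, the `fit` method of scikit-learn's `SVC`), which is where the training and prediction speedups reported in the abstract would be expected to come from.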

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/807db34fb947/pone.0300641.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/1374dae06dbb/pone.0300641.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/fbe61d6ddf6d/pone.0300641.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/8f9df1fd2640/pone.0300641.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/ea566c0ac845/pone.0300641.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/9b2fa3c3f87b/pone.0300641.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/37b074aac9eb/pone.0300641.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/ac1aabf2cabd/pone.0300641.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/e9f191e1357f/pone.0300641.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/9d4740f735be/pone.0300641.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/26f9f25102f6/pone.0300641.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/ec15d3244ccb/pone.0300641.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/c77826c06bbd/pone.0300641.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/602831b86d73/pone.0300641.g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/94cf372f0a35/pone.0300641.g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/3f83a7af5f11/pone.0300641.g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/d1c0d1df2f16/pone.0300641.g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/04139a61725c/pone.0300641.g018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/114d6e506758/pone.0300641.g019.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f78/10990207/93c9f87157eb/pone.0300641.g020.jpg

Cited by

1
A severity classification model of cervical spondylotic radiculopathy symptoms based on MRI radiomics: A retrospective study.
PLoS One. 2025 Jul 9;20(7):e0327756. doi: 10.1371/journal.pone.0327756. eCollection 2025.
