
Improving learning from the complex multi-class imbalanced and overlapped data by mapping into higher dimension using SVM++.

Authors

Mahmood Zafar, Jamel Leila, Salem Dina Ahmed, Ashraf Imran

Affiliations

Department of Computer Science, University of Gujrat, Gujrat, Pakistan.

Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia.

Publication information

Sci Rep. 2025 Aug 25;15(1):31245. doi: 10.1038/s41598-025-13929-w.

DOI:10.1038/s41598-025-13929-w
PMID:40854927
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12378458/
Abstract

Several issues prevent traditional classifiers from reaching an acceptable level of performance on multi-class problems. One of the main ones is the unequal distribution of samples across classes, which significantly reduces the efficiency of the underlying classifier when combined with incompatible optimization benchmarks and data overlap. The combined effect of imbalanced distribution and sample overlap around the class boundaries degrades classifier performance beyond the expected level, and the problem worsens as the number of classes grows. Although their joint effect on classifier performance is the more significant one, imbalance and overlap have rarely been studied together. To improve learning from imbalanced multi-class data whose classes overlap in shared attributes, this work introduces SVM++, a modified version of the support vector machine (SVM). The method comprises three steps. Algorithm-1 finds the overlapping samples and splits the training set into overlapping and non-overlapping samples. Algorithm-2 then separates the overlapped data into the Critical-1 and Critical-2 regions. The Critical-1 region consists of overlapped samples that share similar characteristics and is the main cause of degraded classification performance. In the third step, an algorithm based on the mean of the maximum and minimum distances of the Critical-1 region samples improves the traditional SVM kernel function that maps the data into a higher dimension. Thirty real datasets with varying degrees of imbalance and overlap are used to compare the proposed algorithms with state-of-the-art classifiers.
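The abstract does not give the rules used by Algorithm-1, Algorithm-2, or the modified kernel, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' SVM++. It assumes that a sample is "overlapping" when at least one of its k nearest neighbours carries a different label, and that the mean of the maximum and minimum pairwise distances can serve as the width of an RBF kernel. The function names (`split_overlap`, `mean_of_extreme_distances`, `rbf_kernel`), the toy data, and both assumptions are hypothetical. The Critical-1/Critical-2 split is left out because its criterion is not stated; the sketch treats every overlapping sample as part of one region.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_overlap(X, y, k=2):
    """Split sample indices into (overlapping, non_overlapping).

    Assumed rule (not taken from the paper): a sample overlaps when any of
    its k nearest neighbours belongs to a different class.
    """
    overlapping, non_overlapping = [], []
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: euclidean(xi, X[j]))[:k]
        if any(y[j] != y[i] for j in nearest):
            overlapping.append(i)
        else:
            non_overlapping.append(i)
    return overlapping, non_overlapping


def mean_of_extreme_distances(points):
    """Mean of the largest and smallest pairwise distance among points."""
    dists = [
        euclidean(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    if not dists:
        raise ValueError("need at least two points")
    return (max(dists) + min(dists)) / 2.0


def rbf_kernel(a, b, sigma):
    """Standard RBF kernel; using sigma from the overlap region is an assumption."""
    return math.exp(-euclidean(a, b) ** 2 / (2.0 * sigma ** 2))


# Toy three-class data: class 1 has three points sitting next to a class 0 point.
X = [
    (0, 0), (0, 1), (1, 0), (1, 1), (3, 3),          # class 0
    (10, 10), (10, 11), (11, 10), (3.5, 3), (4, 3),  # class 1
    (20, 0), (21, 0), (20, 1),                       # class 2
]
y = [0] * 5 + [1] * 5 + [2] * 3

overlapping, non_overlapping = split_overlap(X, y, k=2)
sigma = mean_of_extreme_distances([X[i] for i in overlapping])
similarity = rbf_kernel(X[4], X[8], sigma)

print("overlapping indices:", overlapping)
print("kernel width from overlap region:", sigma)
print("kernel value between two overlapping samples:", similarity)
```

On this toy set the three samples near (3, 3) are flagged as overlapping, and the kernel width is derived from them alone, so the similarity measure is tuned to the region where the classes mix rather than to the whole training set.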


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/41b466491478/41598_2025_13929_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/f35fb0f4da6b/41598_2025_13929_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/e40c9c0121de/41598_2025_13929_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/697629fae0a6/41598_2025_13929_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/6a170850843f/41598_2025_13929_Figb_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/81e592807953/41598_2025_13929_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/4ec86b625b30/41598_2025_13929_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/a4fd99045f43/41598_2025_13929_Figc_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/1d7fb7e66705/41598_2025_13929_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/07f8100a4528/41598_2025_13929_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/d9f8d00f6fd8/41598_2025_13929_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/12378458/ccf884214c59/41598_2025_13929_Fig9_HTML.jpg
