Oversampling the Minority Class in the Feature Space.

Publication information

IEEE Trans Neural Netw Learn Syst. 2016 Sep;27(9):1947-61. doi: 10.1109/TNNLS.2015.2461436. Epub 2015 Aug 25.

Abstract

The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach oversamples the minority class through convex combination of its patterns. We explore the general idea of synthetic oversampling in the feature space induced by a kernel function (as opposed to input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie on the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (EFS) (a Euclidean space isomorphic to the feature space) for oversampling purposes. The proposed method is framed in the context of support vector machines, where the imbalanced data sets can pose a serious hindrance. The idea is investigated in three scenarios: 1) oversampling in the full and reduced-rank EFSs; 2) a kernel learning technique maximizing the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential oversampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced data sets.
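The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes an RBF kernel, builds the full or reduced-rank EFS from the eigendecomposition of the kernel matrix, and generates synthetic minority patterns as convex combinations of a pattern and one of its nearest minority neighbours (a SMOTE-style choice that the abstract does not specify). The names `rbf_kernel`, `empirical_feature_space`, and `oversample_minority`, along with all parameter values, are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Assumed kernel choice: k(x, y) = exp(-gamma * ||x - y||^2).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def empirical_feature_space(K, rank=None, tol=1e-10):
    # Eigendecompose K = P diag(lam) P^T and keep the positive eigenvalues.
    # Rows of the result are the training patterns in the EFS, so Z @ Z.T ~= K.
    eigvals, eigvecs = np.linalg.eigh(K)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol * eigvals[0]
    if rank is not None:                          # optional reduced-rank EFS
        keep[rank:] = False
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

def oversample_minority(Z_min, n_new, k=5, rng=None):
    # Convex combinations z_i + delta * (z_j - z_i), delta in [0, 1),
    # where z_j is one of the k nearest minority neighbours of z_i (assumption).
    rng = np.random.default_rng(rng)
    n = Z_min.shape[0]
    k = min(k, n - 1)
    dist = np.linalg.norm(Z_min[:, None, :] - Z_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                # a pattern is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    mate = neighbours[base, rng.integers(0, k, size=n_new)]
    delta = rng.random((n_new, 1))
    return Z_min[base] + delta * (Z_min[mate] - Z_min[base])

# Toy data (made up for illustration): 40 majority and 8 minority patterns.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(40, 2)),
               data_rng.normal(2.5, 0.5, size=(8, 2))])
y = np.array([0] * 40 + [1] * 8)

K = rbf_kernel(X, X, gamma=0.5)
Z = empirical_feature_space(K)                    # full-rank EFS
Z_new = oversample_minority(Z[y == 1], n_new=32, rng=0)
Z_bal = np.vstack([Z, Z_new])
y_bal = np.concatenate([y, np.ones(32, dtype=int)])
```

Because inner products in the EFS reproduce the kernel matrix, a linear classifier trained on `Z_bal` and `y_bal` plays the role of a kernel classifier trained on the balanced set. Passing `rank=r` to `empirical_feature_space` gives the reduced-rank variant.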

Similar articles

Classification of Imbalanced Data by Oversampling in Kernel Space of Support Vector Machines.

IEEE Trans Neural Netw Learn Syst. 2018 Sep;29(9):4065-4076. doi: 10.1109/TNNLS.2017.2751612. Epub 2017 Oct 10.

Immune centroids oversampling method for binary classification.

Comput Intell Neurosci. 2015;2015:109806. doi: 10.1155/2015/109806. Epub 2015 Mar 5.

Deep Learning-Based Imbalanced Classification With Fuzzy Support Vector Machine.

Front Bioeng Biotechnol. 2022 Jan 21;9:802712. doi: 10.3389/fbioe.2021.802712. eCollection 2021.

Selective oversampling approach for strongly imbalanced data.

PeerJ Comput Sci. 2021 Jun 18;7:e604. doi: 10.7717/peerj-cs.604. eCollection 2021.

Affinity and class probability-based fuzzy support vector machine for imbalanced data sets.

Neural Netw. 2020 Feb;122:289-307. doi: 10.1016/j.neunet.2019.10.016. Epub 2019 Nov 2.

Optimizing the kernel in the empirical feature space.

IEEE Trans Neural Netw. 2005 Mar;16(2):460-74. doi: 10.1109/TNN.2004.841784.

RACOG and wRACOG: Two Probabilistic Oversampling Techniques.

IEEE Trans Knowl Data Eng. 2015 Jan 1;27(1):222-234. doi: 10.1109/TKDE.2014.2324567. Epub 2014 May 16.

A parsimonious mixture of Gaussian trees model for oversampling in imbalanced and multimodal time-series classification.

IEEE Trans Neural Netw Learn Syst. 2014 Dec;25(12):2226-39. doi: 10.1109/TNNLS.2014.2308321.

Reduced multiple empirical kernel learning machine.

Cogn Neurodyn. 2015 Feb;9(1):63-73. doi: 10.1007/s11571-014-9304-2. Epub 2014 Jul 29.

Cited by

Assessment of non-fatal injuries among university students in Hainan: a machine learning approach to exploring key factors.

Front Public Health. 2024 Nov 21;12:1453650. doi: 10.3389/fpubh.2024.1453650. eCollection 2024.

Deep learning and radiomic feature-based blending ensemble classifier for malignancy risk prediction in cystic renal lesions.

Insights Imaging. 2023 Jan 11;14(1):6. doi: 10.1186/s13244-022-01349-7.

Stratification of malignant renal neoplasms from cystic renal lesions using deep learning and radiomics features based on a stacking ensemble CT machine learning algorithm.

Front Oncol. 2022 Oct 25;12:1028577. doi: 10.3389/fonc.2022.1028577. eCollection 2022.

Classifying Cognitive Profiles Using Machine Learning with Privileged Information in Mild Cognitive Impairment.

Front Comput Neurosci. 2016 Nov 17;10:117. doi: 10.3389/fncom.2016.00117. eCollection 2016.
