

Improved Overlap-based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson's Disease.

Authors

Pattaramon Vuttipittayamongkol, Eyad Elyan

Affiliation

School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, AB10 7GJ, UK.

Publication

Int J Neural Syst. 2020 Aug;30(8):2050043. doi: 10.1142/S0129065720500434. Epub 2020 Jul 17.

PMID:32674629
Abstract

Classification of imbalanced datasets has attracted substantial research interest over the past decades. Imbalanced datasets are common in several domains, such as health, finance and security. Many solutions for handling imbalanced datasets focus mainly on the class distribution problem and aim to provide more balanced datasets by means of resampling. However, the existing literature shows that class overlap has a greater negative impact on the learning process than class distribution. In this paper, we propose overlap-based undersampling methods that maximize the visibility of minority class instances in the overlapping region. This is achieved by using soft clustering together with an elimination threshold that adapts to the degree of overlap, in order to identify and eliminate negative instances in the overlapping region. For more accurate clustering and detection of overlapped negative instances, the presence of the minority class in the borderline areas is emphasized by means of oversampling. Extensive experiments were carried out on simulated and real-world datasets covering a wide range of imbalance and overlap scenarios, including extreme cases. The results show a significant improvement in sensitivity and competitive performance against well-established and state-of-the-art methods.
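The abstract describes the method only at a high level. As an illustration of the general idea, here is a minimal Python sketch built on several assumptions that are not taken from the paper: fuzzy c-means with two clusters stands in for the soft clustering step; positives whose nearest neighbours are mostly (but not all) negatives are simply duplicated before clustering as a stand-in for the borderline oversampling step; and the elimination threshold is a hypothetical rule, namely the mean membership of the positive instances in the minority cluster. The function names `fuzzy_c_means`, `borderline_positives` and `overlap_undersample` are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def fuzzy_c_means(points: Sequence[Point], c: int = 2, m: float = 2.0,
                  iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Plain fuzzy c-means: returns u[i][k], the membership of point i in cluster k."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, each row normalised to sum to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(iters):
        # Cluster centres are means weighted by membership ** m.
        centres = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centres.append(tuple(sum(w[i] * points[i][j] for i in range(n)) / wsum
                                 for j in range(dim)))
        # Memberships are recomputed from relative distances to the centres.
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centres]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dq) ** (2.0 / (m - 1.0)) for dq in dist)
    return u


def borderline_positives(X: Sequence[Point], y: Sequence[int], pos_label: int,
                         n_neighbors: int = 3) -> List[int]:
    """Indices of minority points whose nearest neighbours are mostly, but not all, negatives."""
    found = []
    for i, label in enumerate(y):
        if label != pos_label:
            continue
        nearest = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
        n_neg = sum(1 for _, j in nearest[:n_neighbors] if y[j] != pos_label)
        if n_neighbors / 2 <= n_neg < n_neighbors:
            found.append(i)
    return found


def overlap_undersample(X: Sequence[Point], y: Sequence[int], pos_label: int = 1,
                        n_neighbors: int = 3, seed: int = 0):
    """Remove negatives that look more 'minority-like' than an average positive."""
    # Duplicate borderline positives so they pull the minority cluster towards the
    # overlap; the duplicates are used for clustering only and are never returned.
    extra = [X[i] for i in borderline_positives(X, y, pos_label, n_neighbors)]
    u = fuzzy_c_means(list(X) + extra, c=2, seed=seed)[:len(X)]
    pos = [i for i, label in enumerate(y) if label == pos_label]
    # The minority cluster is the one in which positives have the higher mean membership.
    mean_pos = [sum(u[i][cl] for i in pos) / len(pos) for cl in range(2)]
    minority = 0 if mean_pos[0] >= mean_pos[1] else 1
    # HYPOTHETICAL adaptive threshold: the intent is that heavier overlap lowers the
    # positives' mean membership, which lowers the threshold and removes more negatives.
    threshold = mean_pos[minority]
    keep = [i for i, label in enumerate(y)
            if label == pos_label or u[i][minority] < threshold]
    return [X[i] for i in keep], [y[i] for i in keep], threshold


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (0.5, 0), (1, 0.5),
         (5.0, 5.0), (5.05, 5.0),  # negatives inside the minority region
         (4.5, 5.0), (5.5, 5.0), (5.0, 4.5), (5.0, 5.5)]  # positives
    y = [0] * 10 + [1] * 4
    X_res, y_res, t = overlap_undersample(X, y)
    print(f"{len(X)} -> {len(X_res)} instances, threshold = {t:.4f}")
```

Positives are always kept, so only the negative class is undersampled, in line with the abstract's focus on eliminating negative instances from the overlapping region. In the toy data, the two negatives placed inside the minority region should exceed the threshold and be removed, while the distant negatives are kept.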


Similar articles

1
Improved Overlap-based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson's Disease.
Int J Neural Syst. 2020 Aug;30(8):2050043. doi: 10.1142/S0129065720500434. Epub 2020 Jul 17.
2
Response to Discussion on "Improved Overlap-Based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson's Disease".
Int J Neural Syst. 2020 Sep;30(9):2075002. doi: 10.1142/S0129065720750027. Epub 2020 Aug 12.
3
Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy.
Front Neuroinform. 2021 Nov 19;15:715421. doi: 10.3389/fninf.2021.715421. eCollection 2021.
4
Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets.
J Cheminform. 2020 Oct 27;12(1):66. doi: 10.1186/s13321-020-00468-x.
5
Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy.
Evol Comput. 2009 Fall;17(3):275-306. doi: 10.1162/evco.2009.17.3.275.
6
RSMOTE: improving classification performance over imbalanced medical datasets.
Health Inf Sci Syst. 2020 Jun 12;8(1):22. doi: 10.1007/s13755-020-00112-w. eCollection 2020 Dec.
7
Discussion on Vuttipittayamongkol, P. and Elyan, E., Improved Overlap-Based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson's Disease.
Int J Neural Syst. 2020 Sep;30(9):2075001. doi: 10.1142/S0129065720750015. Epub 2020 Jul 23.
8
The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets.
Appl Bionics Biomech. 2020 Nov 4;2020:8824625. doi: 10.1155/2020/8824625. eCollection 2020.
9
Optimal selection of resampling methods for imbalanced data with high complexity.
PLoS One. 2023 Jul 27;18(7):e0288540. doi: 10.1371/journal.pone.0288540. eCollection 2023.
10
Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis.
Comput Assist Surg (Abingdon). 2019 Oct;24(sup2):62-72. doi: 10.1080/24699322.2019.1649074. Epub 2019 Aug 12.

Cited by

1
Pruning-based oversampling technique with smoothed bootstrap resampling for imbalanced clinical dataset of Covid-19.
J King Saud Univ Comput Inf Sci. 2022 Oct;34(9):7830-7839. doi: 10.1016/j.jksuci.2021.09.021. Epub 2021 Sep 30.
2
Computer aided progression detection model based on optimized deep LSTM ensemble model and the fusion of multivariate time series data.
Sci Rep. 2023 Sep 28;13(1):16336. doi: 10.1038/s41598-023-42796-6.
3
A novel early diagnostic framework for chronic diseases with class imbalance.
Sci Rep. 2022 May 21;12(1):8614. doi: 10.1038/s41598-022-12574-x.
4
An Improved Hybrid Approach for Handling Class Imbalance Problem.
Arab J Sci Eng. 2021;46(4):3853-3864. doi: 10.1007/s13369-021-05347-7. Epub 2021 Jan 28.