Challenges and limitations of synthetic minority oversampling techniques in machine learning.

Author information

Alkhawaldeh Ibraheem M, Albalkhi Ibrahem, Naswhan Abdulqadir Jeprel

Affiliations

Faculty of Medicine, Mutah University, Karak 61710, Jordan.

Department of Neuroradiology, Alfaisal University, Great Ormond Street Hospital NHS Foundation Trust, London WC1N 3JH, United Kingdom.

Publication information

World J Methodol. 2023 Dec 20;13(5):373-378. doi: 10.5662/wjm.v13.i5.373.

Abstract

Oversampling is the most widely used approach for dealing with class-imbalanced datasets, as evidenced by the plethora of oversampling methods developed in the last two decades. In this editorial, we discuss the problems with oversampling, which stem from the risk of overfitting and from the generation of synthetic cases that might not accurately represent the minority class. These limitations should be considered when using oversampling techniques. We also propose several alternative strategies for dealing with imbalanced data, as well as a perspective on future work.
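As an illustration of how synthetic minority cases can end up unrepresentative, the following is a minimal sketch of SMOTE-style interpolation. It is not code from the editorial: the function name `smote_like_sample`, its parameters, and the toy data are all hypothetical, and the sketch uses only the Python standard library rather than any particular oversampling package.

```python
import math
import random


def smote_like_sample(minority, k=2, n_new=5, seed=0):
    """Hypothetical helper: create n_new synthetic points, each placed on the
    line segment between a minority point and one of its k nearest
    minority-class neighbours (the core idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed data): a tight cluster plus one outlier.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
new_points = smote_like_sample(minority, k=2, n_new=5, seed=0)
print(new_points)
```

Whenever the outlier at (5.0, 5.0) is drawn as the base point, the synthetic case is placed somewhere on the segment between it and the cluster, a region where no real minority case was observed. Every synthetic point is also derived entirely from existing minority cases, which is one way a model can come to overfit them.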

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/116c/10789107/ec4d64e85b49/WJM-13-373-g001.jpg
