Academic Unit of Radiology, Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom.
Diabetes Research Department, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom.
PLoS One. 2020 Dec 15;15(12):e0243907. doi: 10.1371/journal.pone.0243907. eCollection 2020.
One of the fundamental challenges when dealing with medical imaging datasets is class imbalance. Class imbalance occurs when the number of instances in the class of interest is low relative to the rest of the data. This study applies oversampling strategies to balance the classes and improve classification performance. We evaluated four classifiers, k-nearest neighbors (k-NN), support vector machine (SVM), multilayer perceptron (MLP) and decision trees (DT), each combined with 73 oversampling strategies. We used imbalanced-learning oversampling techniques to improve classification in datasets that are distinctively sparse and clustered. This work reports the best oversampling and classifier combinations and concludes that using oversampling always outperforms using no oversampling, thereby improving the classification results.
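To make the idea of balancing classes concrete, here is a minimal sketch of plain random oversampling, the simplest member of this family of techniques. It is an illustration only and is not claimed to be one of the 73 strategies evaluated in the study. The function name `random_oversample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Toy, made-up data: 8 majority points (label 0), 2 minority points (label 1).
X = [[0.1 * i, 1.0] for i in range(8)] + [[5.0, 5.0], [5.2, 4.9]]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

In practice, oversampling would normally be applied only to the training split, so that duplicated or synthetic minority samples do not leak into the data used for evaluation.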