1 Department of Computer Science and Artificial Intelligence, University of Granada, Granada 18071, Spain.
2 Department of Civil Engineering, University of Burgos, Burgos 09006, Spain.
Int J Neural Syst. 2017 Sep;27(6):1750028. doi: 10.1142/S0129065717500289. Epub 2017 Apr 11.
Imbalanced classification concerns problems in which the classes are unevenly distributed. When instances also fall in regions where classes overlap, modeling the problem correctly becomes harder still. Current solutions to both issues often focus on the binary case, since multi-class datasets require additional effort. In this research, we address these problems by combining feature selection and instance selection. Feature selection simplifies the overlapping areas, easing the generation of rules that distinguish among the classes. Selecting instances from all classes addresses the imbalance itself by finding the most appropriate class distribution for the learning task, and may also remove noisy and difficult borderline examples. To obtain an optimal joint set of features and instances, we embed the search for both in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as the baseline classifier in this wrapper approach. The multi-objective scheme offers a double advantage: the search space becomes broader, and the set of different solutions it provides can be used to build an ensemble of classifiers. The proposal has been compared against several state-of-the-art solutions for imbalanced classification, showing excellent results on both binary and multi-class problems.
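The joint search described in the abstract can be illustrated with a minimal sketch. It encodes one candidate as a bit string (one bit per feature, then one bit per training instance), scores it with a wrapper evaluation, and keeps the non-dominated candidates. Several parts are assumptions made only for illustration and are not taken from the paper: the two objectives (mean per-class recall and fraction of instances kept), the nearest-centroid classifier standing in for C4.5, the toy dataset, and random sampling in place of the evolutionary operators. All function names are hypothetical.

```python
import random
from collections import defaultdict

def decode(chromosome, n_features):
    """Split a flat bit list into (feature_mask, instance_mask)."""
    return chromosome[:n_features], chromosome[n_features:]

def fit_centroids(train_X, train_y):
    """Stand-in classifier (NOT C4.5): compute one mean vector per class."""
    groups = defaultdict(list)
    for row, label in zip(train_X, train_y):
        groups[label].append(row)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids,
               key=lambda c: sum((m - v) ** 2 for m, v in zip(centroids[c], x)))

def evaluate(chromosome, X, y):
    """Assumed objectives: (mean per-class recall, fraction of instances kept)."""
    feat_mask, inst_mask = decode(chromosome, len(X[0]))
    cols = [j for j, bit in enumerate(feat_mask) if bit]
    rows = [i for i, bit in enumerate(inst_mask) if bit]
    if not cols or not rows:
        return 0.0, 1.0  # degenerate candidate: worst value on both objectives
    project = lambda row: [row[j] for j in cols]
    centroids = fit_centroids([project(X[i]) for i in rows], [y[i] for i in rows])
    hits, totals = defaultdict(int), defaultdict(int)
    for row, label in zip(X, y):  # wrapper evaluation on the full training set
        totals[label] += 1
        hits[label] += predict(centroids, project(row)) == label
    recall = sum(hits[c] / totals[c] for c in totals) / len(totals)
    return recall, len(rows) / len(X)

def dominates(a, b):
    """a dominates b: recall no lower, kept fraction no higher, one strictly better."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(scored):
    """Keep the (chromosome, score) pairs that no other candidate dominates."""
    return [(c, s) for c, s in scored
            if not any(dominates(t, s) for _, t in scored)]

# Toy imbalanced three-class data (assumed): feature 0 is informative, feature 1 is noise.
random.seed(0)
X, y = [], []
for label, (center, size) in enumerate([(0.0, 30), (3.0, 8), (6.0, 4)]):
    for _ in range(size):
        X.append([random.gauss(center, 1.0), random.gauss(0.0, 1.0)])
        y.append(label)

# Random candidates stand in for the population an evolutionary search would evolve.
n_bits = len(X[0]) + len(X)
candidates = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(50)]
scored = [(c, evaluate(c, X, y)) for c in candidates]
front = pareto_front(scored)

for _, (recall, kept) in sorted(front, key=lambda item: item[1][1]):
    print(f"mean recall={recall:.3f}  instances kept={kept:.2f}")
```

Each member of `front` corresponds to one feature subset and one instance subset. Under the scheme the abstract describes, a classifier trained on each such pair could serve as one member of the ensemble.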