Jiménez Fernando, Martínez Carlos, Miralles-Pechuán Luis, Sánchez Gracia, Sciavicco Guido
Department of Information and Communication Engineering, University of Murcia, 30071 Murcia, Spain.
Centre for Applied Data Analytics Research (CeADAR), University College Dublin, D04 Dublin 4, Ireland.
Entropy (Basel). 2018 Sep 7;20(9):684. doi: 10.3390/e20090684.
The ease of interpretation of a classification model is essential for validating it. In some settings, the process by which a model arrives at its predictions must be explained clearly. Models that are inherently easier to interpret can be related more readily to the context of the problem, and their predictions can, if necessary, be evaluated ethically and legally. In this paper, we propose a novel method for generating readily interpretable rule-based classifiers from categorical data. Classifiers are generated using a multi-objective optimization approach with two main objectives: maximizing the performance of the learned classifier and minimizing its number of rules. Multi-objective evolutionary algorithms have been adapted to optimize the performance of the classifier according to three different machine learning metrics: accuracy, area under the curve, and root mean square error. We have extensively compared the classifiers generated by our proposed method with classifiers generated by classical methods. The experiments have been conducted in full training mode, in 10-fold cross-validation mode, and in train/test splitting mode. To make the results reproducible, we have used well-known, publicly available datasets. After performing an exhaustive statistical test on our results, we conclude that the proposed method is able to generate highly accurate and easy-to-interpret classification models.
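The trade-off between the two objectives named in the abstract, classifier performance and number of rules, can be illustrated with a minimal Pareto-dominance sketch. This is not the authors' algorithm: the `Candidate` type, the `dominates` and `pareto_front` helpers, and all accuracy and rule-count values are hypothetical and chosen only for illustration, with accuracy standing in for the performance metric.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical rule-based classifier scored on two objectives."""
    name: str
    accuracy: float  # performance objective, to be maximized
    n_rules: int     # complexity objective, to be minimized


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.n_rules <= b.n_rules
    strictly_better = a.accuracy > b.accuracy or a.n_rules < b.n_rules
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative values only; they are not results from the paper.
candidates = [
    Candidate("A", 0.90, 12),
    Candidate("B", 0.88, 5),
    Candidate("C", 0.85, 7),   # dominated by B: less accurate, more rules
    Candidate("D", 0.90, 15),  # dominated by A: same accuracy, more rules
    Candidate("E", 0.70, 2),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Under these assumed values, the non-dominated set contains the classifiers that trade accuracy against rule count in different proportions; a multi-objective evolutionary algorithm searches for such a set rather than for a single best model.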