Hasanpour Hesam, Ghavamizadeh Meibodi Ramak, Navi Keivan
Department of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran.
PeerJ Comput Sci. 2019 Nov 18;5:e188. doi: 10.7717/peerj-cs.188. eCollection 2019.
Classification and association rule mining are two important areas of data mining. Some researchers have attempted to integrate these two fields, producing what are called rule-based classifiers. Rule-based classifiers can play an important role in applications such as fraud detection and medical diagnosis. Numerous previous studies have shown that this type of classifier achieves higher classification accuracy than traditional classification algorithms. However, these classifiers still suffer from a fundamental limitation: many of them use greedy techniques to prune redundant rules, which can cause important rules to be missed. Another challenge is the enormous set of mined rules, which results in high processing overhead. As a consequence, the final selected rules may not be the globally best rules, because these algorithms do not explore the search space effectively when selecting the best subset of candidate rules. We combined the Apriori algorithm, Harmony Search, and the Classification Based on Associations (CBA) algorithm to build a rule-based classifier. We applied a modified version of the Apriori algorithm with multiple minimum supports to extract useful rules for each class in the dataset. Instead of using a large number of candidate rules, we used binary Harmony Search to select the best subset of rules for building a classification model. We applied the proposed method to seventeen benchmark datasets and compared its results with those of traditional associative classification algorithms. The statistical results show that our proposed method outperformed other rule-based approaches.
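The per-class rule extraction step can be illustrated with a minimal Python sketch of Apriori-style class association rule mining with a separate minimum support for each class. The threshold scheme is an assumption, not the paper's exact modification: each class c receives a minimum support count of `sup_ratio` × (number of records in c), so minority classes are not held to the same absolute bar as majority classes. The function name `mine_class_rules`, its parameters, and the CBA-style ranking (higher confidence, then higher support, then shorter antecedent) are illustrative choices.

```python
from collections import Counter
from itertools import combinations

def mine_class_rules(records, sup_ratio=0.5, min_conf=0.6, max_len=3):
    """Hypothetical sketch. records: list of (frozenset_of_items, class_label).
    Returns (antecedent, class, support, confidence) tuples in CBA-style order."""
    n = len(records)
    class_counts = Counter(label for _, label in records)
    rules = []
    for cls, cls_count in class_counts.items():
        # Assumed per-class threshold: a fraction of the class's own frequency.
        min_count = sup_ratio * cls_count
        in_class = [items for items, label in records if label == cls]
        level = {frozenset([i]) for items in in_class for i in items}
        k = 1
        while level and k <= max_len:
            frequent = []
            for cand in level:
                # Rule support count: records containing cand AND labelled cls.
                rule_count = sum(1 for items in in_class if cand <= items)
                if rule_count >= min_count:
                    frequent.append(cand)
                    body_count = sum(1 for items, _ in records if cand <= items)
                    conf = rule_count / body_count
                    if conf >= min_conf:
                        rules.append((cand, cls, rule_count / n, conf))
            # Apriori join and prune: keep (k+1)-itemsets whose k-subsets are all frequent.
            freq_set = set(frequent)
            level = set()
            for a, b in combinations(frequent, 2):
                union = a | b
                if len(union) == k + 1 and all(union - {x} in freq_set for x in union):
                    level.add(union)
            k += 1
    # Rank: confidence desc, support desc, shorter antecedent, then a stable tie-break.
    return sorted(rules, key=lambda r: (-r[3], -r[2], len(r[0]), sorted(r[0]), str(r[1])))
```

On a toy dataset of five records where class "no" has only two records, a rule such as {b, c} → no has support 1/5 = 0.2; a single global minimum support of 0.3 would discard it, whereas the per-class threshold above keeps it. This is the kind of minority-class rule that multiple minimum supports are meant to preserve.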
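Selecting a rule subset with binary Harmony Search can be sketched as follows, under assumptions that are not taken from the paper. Each harmony is a bit vector over candidate rules that are already sorted by precedence (bit j = 1 keeps rule j). Fitness is the training accuracy of a simplified first-match classifier in the spirit of CBA, with the majority class as the default, minus a hypothetical per-rule penalty that favours smaller rule sets. In binary space, pitch adjustment is modelled as a bit flip. The function names and the values of `hms`, `hmcr`, `par`, `iters`, and `size_penalty` are illustrative only.

```python
import random
from collections import Counter

def classify(items, rules, default):
    """First-match prediction; rules are assumed pre-sorted by precedence."""
    for body, cls in rules:
        if body <= items:
            return cls
    return default

def fitness(mask, rules, records, default, size_penalty=0.01):
    """Training accuracy of the kept rules minus a hypothetical size penalty."""
    chosen = [rule for rule, keep in zip(rules, mask) if keep]
    correct = sum(classify(items, chosen, default) == label for items, label in records)
    return correct / len(records) - size_penalty * len(chosen)

def binary_harmony_search(rules, records, hms=10, hmcr=0.9, par=0.3, iters=2000, seed=0):
    rng = random.Random(seed)
    # Simplification: the default class is the training majority class.
    default = Counter(label for _, label in records).most_common(1)[0][0]
    n = len(rules)
    # Harmony memory: hms random bit vectors.
    memory = [[rng.randint(0, 1) for _ in range(n)] for _ in range(hms)]
    scores = [fitness(h, rules, records, default) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(n):
            if rng.random() < hmcr:
                bit = memory[rng.randrange(hms)][j]  # memory consideration
                if rng.random() < par:
                    bit = 1 - bit  # binary pitch adjustment: flip the bit
            else:
                bit = rng.randint(0, 1)  # random consideration
            new.append(bit)
        score = fitness(new, rules, records, default)
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:  # replace the worst harmony only if improved
            memory[worst], scores[worst] = new, score
    best = max(range(hms), key=scores.__getitem__)
    return [rule for rule, keep in zip(rules, memory[best]) if keep], scores[best]
```

Because the search scores whole subsets instead of pruning rules one at a time, it can keep a combination of rules that a greedy, rule-by-rule pruning pass might break up. The size penalty is one simple way to trade a little accuracy for a more compact model.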