Ramanujan Computing Centre, Anna University, Chennai 600025, Tamil Nadu, India.
Comput Methods Programs Biomed. 2015 Oct;121(3):137-48. doi: 10.1016/j.cmpb.2015.05.007. Epub 2015 Jun 6.
Rule-based classification is a typical data mining task used in several medical diagnosis and decision support systems. The rules stored in the rule base affect classification efficiency. Rule sets extracted with data mining tools and techniques are optimized using heuristic or meta-heuristic approaches to improve the quality of the rule base. In this work, a meta-heuristic approach called Wind-driven Swarm Optimization (WSO) is used. The uniqueness of this work lies in the biological inspiration that underlies the algorithm.
WSO uses Jval, a new metric, to evaluate the efficiency of a rule-based classifier. Rules are extracted from decision trees. WSO is used to obtain different permutations and combinations of rules, and the optimal ruleset that satisfies the developer's requirements is used to predict the test data. The performance of various extensions of decision trees, namely RIPPER, PART, FURIA and Decision Tables, is analyzed. The efficiency of WSO is also compared with that of traditional Particle Swarm Optimization (PSO).
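The idea of searching over rule subsets can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy rules and data are invented, the `score` function is a hypothetical accuracy-minus-size objective and not the Jval metric (which is not defined here), and an exhaustive search over subsets stands in for the swarm search that WSO would perform.

```python
from itertools import combinations

# A rule is (feature_index, threshold, predicted_label); it fires when
# x[feature_index] > threshold. Toy rules invented for illustration only.
RULES = [
    (0, 5.0, 1),
    (1, 2.0, 0),
    (0, 3.0, 1),
    (1, 4.0, 1),
]
DEFAULT_LABEL = 0  # label used when no rule fires

# Toy labelled samples: ((feature_0, feature_1), label). Invented data.
DATA = [
    ((6.0, 1.0), 1),
    ((4.0, 1.0), 1),
    ((2.0, 3.0), 0),
    ((1.0, 1.0), 0),
    ((2.0, 5.0), 0),
]


def predict(rule_subset, x):
    """Return the label of the first rule that fires, else the default label."""
    for feature, threshold, label in rule_subset:
        if x[feature] > threshold:
            return label
    return DEFAULT_LABEL


def accuracy(rule_subset, data):
    """Fraction of samples that the ordered rule subset classifies correctly."""
    correct = sum(predict(rule_subset, x) == y for x, y in data)
    return correct / len(data)


def score(rule_subset, data, size_penalty=0.01):
    """Hypothetical stand-in objective (NOT Jval): accuracy minus a per-rule penalty."""
    return accuracy(rule_subset, data) - size_penalty * len(rule_subset)


def best_subset(rules, data):
    """Score every order-preserving subset of rules and return the best one.

    Exhaustive search is only feasible for tiny rule bases; a meta-heuristic
    would replace this enumeration for realistic sizes.
    """
    candidates = (
        subset
        for k in range(len(rules) + 1)
        for subset in combinations(rules, k)
    )
    return max(candidates, key=lambda subset: score(subset, data))


if __name__ == "__main__":
    best = best_subset(RULES, DATA)
    print(best, accuracy(best, DATA))
```

On this toy data a single rule already classifies every sample correctly, so the size penalty steers the search toward the one-rule subset rather than any larger subset with the same accuracy.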
Experiments were carried out with six benchmark medical datasets. The traditional C4.5 algorithm yields 62.89% accuracy with 43 rules for the liver disorders dataset, whereas WSO yields 64.60% with 19 rules. For the heart disease dataset, C4.5 is 68.64% accurate with 98 rules, whereas WSO is 77.8% accurate with 34 rules. The normalized standard deviations for accuracy of PSO and WSO are 0.5921 and 0.5846, respectively.
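To make the reported trade-off concrete, the short calculation below derives the relative reduction in rule count and the accuracy gain in percentage points from the figures quoted above. The helper names and the dictionary layout are illustrative choices, not part of the original work.

```python
# Reported (accuracy in %, number of rules) for C4.5 and WSO on two datasets.
RESULTS = {
    "liver disorders": {"c45": (62.89, 43), "wso": (64.60, 19)},
    "heart disease": {"c45": (68.64, 98), "wso": (77.8, 34)},
}


def rule_reduction(rules_before, rules_after):
    """Relative reduction in rule count, as a percentage."""
    return 100.0 * (rules_before - rules_after) / rules_before


def accuracy_gain(acc_before, acc_after):
    """Absolute accuracy gain in percentage points."""
    return acc_after - acc_before


if __name__ == "__main__":
    for dataset, result in RESULTS.items():
        c45_acc, c45_rules = result["c45"]
        wso_acc, wso_rules = result["wso"]
        print(
            f"{dataset}: "
            f"{rule_reduction(c45_rules, wso_rules):.1f}% fewer rules, "
            f"{accuracy_gain(c45_acc, wso_acc):+.2f} points accuracy"
        )
```

With these figures, the WSO rule base is about 55.8% smaller for liver disorders (a gain of 1.71 points) and about 65.3% smaller for heart disease (a gain of 9.16 points).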
WSO provides accurate and concise rulesets. PSO yields results similar to those of WSO, but the novelty of WSO lies in its biological motivation and its customization for rule base optimization. The trade-off between prediction accuracy and rule base size is optimized during the design and development of a rule-based clinical decision support system. The efficiency of a decision support system relies on the content of the rule base and on classification accuracy.