Sun Huaining, Hu Xuegang, Zhang Yuhong
School of Computer Science, Huainan Normal University, Huainan 232038, China.
School of Computer and Information, Hefei University of Technology, Hefei 230009, China.
Entropy (Basel). 2019 Feb 19;21(2):198. doi: 10.3390/e21020198.
Uncertainty evaluation based on statistical probabilistic information entropy is a commonly used mechanism for constructing heuristics in decision tree learning, and the deviation of the entropy kernel is potentially linked to decision tree classification performance. This paper presents a decision tree learning algorithm based on constrained gain and depth-induction optimization. First, the uncertainty distributions of information entropy for single-valued and multi-valued events are calculated and analyzed, yielding an enhanced property of the single-valued event entropy kernel, the peaks of multi-valued event entropy, and a reciprocal relationship between peak location and the number of possible events. Second, an estimation method for information entropy is proposed in which the entropy kernel is replaced by a peak-shifted sine function, and on this basis a constrained-gain decision tree (CGDT) learning algorithm is established. Finally, by combining branch-convergence and fan-out indices under the induction depth of a decision tree, a constrained-gain and depth-induction improved decision tree (CGDIDT) learning algorithm is built. Experimental results demonstrate the benefits of the CGDT and CGDIDT algorithms.
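For context, the Shannon entropy of a label distribution is H = -Σ_i p_i log2 p_i, whose per-class kernel f(p) = -p ln p peaks at p = 1/e (setting f'(p) = -ln p - 1 = 0), and the information gain of a split is the parent entropy minus the size-weighted entropy of the children. The minimal Python sketch below shows standard gain computed with a pluggable impurity kernel, alongside a hypothetical peak-shifted sine kernel whose maximum sits at p = 1/n for n possible events, mirroring the reciprocal relation stated in the abstract. The abstract does not give the authors' exact sine function, so shifted_sine_kernel is an illustrative assumption, not the published formula.

import math
from collections import Counter

def shannon_kernel(p):
    # Standard Shannon entropy kernel -p*log2(p); zero by convention at p = 0.
    return 0.0 if p <= 0.0 else -p * math.log2(p)

def shifted_sine_kernel(p, n_events):
    # HYPOTHETICAL stand-in for the paper's peak-shift sine kernel: a
    # half-sine warped so its maximum sits at p = 1/n_events, echoing the
    # reciprocal peak-location/event-count relation in the abstract.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    peak = 1.0 / n_events
    x = 0.5 * p / peak if p <= peak else 0.5 + 0.5 * (p - peak) / (1.0 - peak)
    return math.sin(math.pi * x)

def entropy(labels, kernel):
    # Sum the chosen kernel over the empirical class probabilities.
    total = len(labels)
    return sum(kernel(c / total) for c in Counter(labels).values())

def gain(parent, children, kernel):
    # Classic gain: parent impurity minus size-weighted child impurity.
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch, kernel) for ch in children)
    return entropy(parent, kernel) - weighted

# Toy comparison of the two kernels on one candidate split.
parent = ["a"] * 6 + ["b"] * 4
children = [["a"] * 5 + ["b"], ["a"] + ["b"] * 3]
n = len(set(parent))
print("Shannon gain:", gain(parent, children, shannon_kernel))
print("Sine-kernel gain:", gain(parent, children, lambda p: shifted_sine_kernel(p, n)))

Swapping the kernel changes only how impurity is scored; the gain computation and tree-induction machinery are unchanged, which is presumably what allows a constrained-gain criterion to slot into a standard ID3/C4.5-style learner.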