IEEE Trans Cybern. 2013 Dec;43(6):1734-46. doi: 10.1109/TSMCB.2012.2229269.
Structured continuous-label classification is a type of classification in which the label is recorded as a continuous value in the data, but the goal is to assign each instance to one of a set of predefined ranges that can be organized in a hierarchy. In this hierarchy, ranges at lower levels are more specific and inherently harder to predict, whereas ranges at upper levels are less specific and easier to predict. Therefore, both prediction specificity and prediction accuracy must be considered when building a decision tree (DT) from such data. This paper proposes a novel algorithm for learning DT classifiers from data with structured continuous labels. The approach takes the distribution of labels throughout the hierarchical structure into account during tree construction, without requiring discretization in a preprocessing stage. We compared the proposed method with the C4.5 algorithm on eight real data sets. The empirical results indicate that the proposed method outperforms C4.5 in prediction accuracy, prediction specificity, and computational complexity.
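The abstract does not give the paper's split criterion or prediction rule, so the following is only a minimal sketch of the underlying data structure, written under stated assumptions. It models a hypothetical hierarchy of half-open label ranges `[low, high)`, counts how many raw continuous labels fall into each range at every level (no up-front discretization of the label column), and applies an illustrative specificity-versus-accuracy rule: descend to a child range only while it still covers at least a hypothetical `min_share` fraction of the labels. The names `RangeNode`, `path_for`, `label_distribution`, `most_specific_range`, and `min_share` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RangeNode:
    """Hypothetical predefined label range [low, high) with finer child ranges."""
    low: float
    high: float
    children: List["RangeNode"] = field(default_factory=list)

    def contains(self, y: float) -> bool:
        # Half-open interval: low is included, high is excluded.
        return self.low <= y < self.high


def path_for(root: RangeNode, y: float) -> List[RangeNode]:
    """Return the root-to-leaf chain of ranges that contain continuous label y."""
    path: List[RangeNode] = []
    node = root
    while node is not None and node.contains(y):
        path.append(node)
        # Move to the first child range that contains y, or stop if none does.
        node = next((c for c in node.children if c.contains(y)), None)
    return path


def label_distribution(root: RangeNode, labels: List[float]) -> Dict[Tuple[float, float], int]:
    """Count raw continuous labels per range at every level of the hierarchy."""
    counts: Dict[Tuple[float, float], int] = {}
    for y in labels:
        for node in path_for(root, y):
            key = (node.low, node.high)
            counts[key] = counts.get(key, 0) + 1
    return counts


def most_specific_range(root: RangeNode, labels: List[float], min_share: float = 0.8) -> RangeNode:
    """Illustrative rule (not the paper's): descend while the best-covered child
    still holds at least min_share of the labels, trading specificity for accuracy."""
    if not labels:
        return root
    node = root
    while node.children:
        def coverage(child: RangeNode) -> int:
            return sum(child.contains(y) for y in labels)

        best_child = max(node.children, key=coverage)
        if coverage(best_child) / len(labels) < min_share:
            break  # a more specific range would be too unreliable
        node = best_child
    return node
```

For example, with a two-level hierarchy splitting `[0, 100)` into `[0, 50)` and `[50, 100)`, each halved again, the labels `[10, 20, 30, 40, 60]` yield the `[0, 50)` range: four of five labels fall there, but neither of its children covers 80% of them. Raising `min_share` favors accuracy and lowering it favors specificity.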