Singer Gonen, Marudi Matan
Faculty of Engineering, Bar-Ilan University, Ramat-Gan 52900, Israel.
Department of Industrial Engineering, Tel-Aviv University, Tel Aviv-Yafo 39040, Israel.
Entropy (Basel). 2020 Aug 7;22(8):871. doi: 10.3390/e22080871.
In this research, we develop ordinal decision-tree-based ensemble approaches in which an objective-based information gain measure is used to select the classifying attributes. We demonstrate the applicability of the approaches using AdaBoost and random forest algorithms for the task of classifying the regional daily growth factor of the spread of an epidemic based on a variety of explanatory factors. In such an application, some of the potential classification errors could have critical consequences. The classification tool will enable the spread of the epidemic to be tracked and controlled by yielding insights regarding the relationship between local containment measures and the daily growth factor. In order to benefit maximally from a variety of ordinal and non-ordinal algorithms, we also propose an ensemble majority voting approach to combine different algorithms into one model, thereby leveraging the strengths of each algorithm. We perform experiments in which the task is to classify the daily COVID-19 growth rate factor based on environmental factors and containment measures for 19 regions of Italy. We demonstrate that the ordinal algorithms outperform their non-ordinal counterparts with improvements in the range of 6-25% for a variety of common performance indices. The majority voting approach that combines ordinal and non-ordinal models yields a further improvement of between 3% and 10%.
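The majority voting step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example class labels, and the tie-breaking rule (the smallest label wins) are all assumptions made for illustration, and the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent class label among one sample's votes.

    Assumption of this sketch: ties go to the smallest label.
    """
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def ensemble_predict(model_outputs):
    """Combine predictions from several models, sample by sample.

    model_outputs holds one list of predicted labels per model;
    all lists must cover the same samples in the same order.
    """
    return [majority_vote(votes) for votes in zip(*model_outputs)]


# Hypothetical predictions from three models for three samples,
# with classes encoded as ordered integers (0 < 1 < 2).
ordinal_forest = [0, 1, 2]
ordinal_boost = [0, 2, 2]
plain_forest = [1, 2, 0]

combined = ensemble_predict([ordinal_forest, ordinal_boost, plain_forest])
```

Each sample receives the label that most of the models agree on, so a single model's error is outvoted whenever the other models are correct.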