Sun Jianyuan, Yu Hui, Zhong Guoqiang, Dong Junyu, Zhang Shu, Yu Hongchuan
IEEE Trans Cybern. 2022 Jan;52(1):205-214. doi: 10.1109/TCYB.2020.2972956. Epub 2022 Jan 11.
The original random forests (RFs) algorithm has been widely used and has achieved excellent performance on classification and regression tasks. However, research on the theory of RFs lags far behind its applications. In this article, to narrow the gap between the applications and the theory of RFs, we propose a new RFs algorithm, called random Shapley forests (RSFs), based on the Shapley value. The Shapley value is a well-known solution concept in cooperative game theory that fairly assesses the power of each player in a game. During construction, RSFs use the Shapley value to evaluate the importance of each feature at each tree node by computing the dependency among the possible feature coalitions. In particular, inspired by existing consistency theory, we prove the consistency of the proposed RFs algorithm. Moreover, to verify the effectiveness of the proposed algorithm, we conduct experiments on eight UCI benchmark datasets and four real-world datasets. The results show that RSFs perform better than, or at least comparably to, the existing consistent RFs, the original RFs, and a classic classifier, support vector machines.
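To make the notion of Shapley-based feature importance concrete, the following is a minimal sketch of the exact Shapley value for a small cooperative game in which the players are features. It is an illustration only, not the RSF algorithm: the abstract does not specify how the dependency among feature coalitions is measured, so the function `shapley_values`, the feature names, and the table `TOY_VALUE` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a characteristic function.

    `value` maps a frozenset of players to a number, with value(empty) == 0.
    The cost grows exponentially with the number of players.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k that exclude p: k!(n-k-1)!/n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical coalition scores for three features (assumed values, standing in
# for whatever dependency measure is used at a tree node).
TOY_VALUE = {
    frozenset(): 0.0,
    frozenset("a"): 0.5,
    frozenset("b"): 0.3,
    frozenset("c"): 0.0,
    frozenset("ab"): 0.9,
    frozenset("ac"): 0.5,
    frozenset("bc"): 0.3,
    frozenset("abc"): 0.9,
}

features = ["a", "b", "c"]
phi = shapley_values(features, lambda s: TOY_VALUE[frozenset(s)])

# A node could then, for example, prefer the feature with the largest value.
best_feature = max(phi, key=phi.get)
print(phi, best_feature)
```

In this toy game feature `c` never changes the score of any coalition, so its Shapley value is zero, and the values of all features sum to the score of the full coalition. These two properties (dummy player and efficiency) are part of what makes the allocation "fair" in the sense the abstract refers to.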