Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests.

Author information

Hornung Roman

Affiliation

Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377 Munich, Germany.

Publication information

SN Comput Sci. 2022;3(1):1. doi: 10.1007/s42979-021-00920-1. Epub 2021 Oct 21.

Abstract

The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackle practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for l = 1, ..., nsplits: (1) sample one split problem; (2) sample a single split or a few splits from the split problem sampled in (1) and add them to the candidate split set. The split problems are specifically structured collections of splits that depend on the split procedure considered. This sampling scheme makes innovative complex split procedures computationally feasible while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and with random forests using extremely randomized trees. The results show that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook.
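
The two-step sampling scheme described in the abstract can be illustrated with a minimal Python sketch for the univariable, binary case. This is not the author's implementation. It assumes that each feature is one split problem, that a single split is a midpoint between two adjacent observed values of that feature, and that candidates are scored by weighted Gini impurity. The function names (gini, sample_candidate_splits, best_split) and the parameter name nsplits for the number of sampling rounds are chosen here for illustration only.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def sample_candidate_splits(X, nsplits, rng):
    """Sample at most nsplits distinct (feature, threshold) candidates."""
    n_features = len(X[0])
    candidates = set()
    for _ in range(nsplits):
        # (1) Sample one split problem. Assumption: one feature = one problem.
        j = rng.randrange(n_features)
        values = sorted({row[j] for row in X})
        if len(values) < 2:
            continue  # constant feature in this node: nothing to split on
        # (2) Sample a single split from that problem: a midpoint between
        # two adjacent observed values of the sampled feature.
        i = rng.randrange(len(values) - 1)
        candidates.add((j, (values[i] + values[i + 1]) / 2.0))
    return candidates


def best_split(X, y, nsplits, rng):
    """Return the sampled candidate with the lowest weighted Gini impurity."""
    n = len(y)
    best, best_score = None, float("inf")
    for j, t in sorted(sample_candidate_splits(X, nsplits, rng)):
        left = [y[k] for k in range(n) if X[k][j] <= t]
        right = [y[k] for k in range(n) if X[k][j] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best, best_score = (j, t), score
    return best, best_score


if __name__ == "__main__":
    # Toy node: feature 0 separates the classes, feature 1 does not.
    X = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]]
    y = [0, 0, 1, 1]
    print(best_split(X, y, nsplits=200, rng=random.Random(0)))
```

Because only the sampled candidates are evaluated, the cost per node is governed by the number of sampling rounds rather than by the total number of possible splits, which is what would make more complex split procedures affordable under this scheme.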

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s42979-021-00920-1.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74be/8533673/e062bbcaeec2/42979_2021_920_Fig1_HTML.jpg
