Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests.

Author

Hornung Roman

Affiliation

Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377 Munich, Germany.

Publication information

SN Comput Sci. 2022;3(1):1. doi: 10.1007/s42979-021-00920-1. Epub 2021 Oct 21.

DOI:10.1007/s42979-021-00920-1
PMID:34723205
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8533673/
Abstract

The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackle practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for l = 1, ..., nsplits: (1) sample one split problem; (2) sample a single or few splits from the split problem sampled in (1) and add this or these splits to the candidate split set. The split problems are specifically structured collections of splits that depend on the respective split procedure considered. This sampling scheme makes innovative complex split procedures computationally tangible while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. It is seen that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook.
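The two-step sampling loop described in the abstract can be sketched for the simplest case, univariable binary splitting. This is a minimal illustration, not the author's implementation. It assumes that a split problem is a single feature, that one split is drawn per sampled split problem, and that candidate splits are scored by decrease in Gini impurity; `nsplits` here names the number of sampling iterations, and the function names are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def sample_candidate_splits(X, nsplits, rng):
    """Sample up to nsplits candidate (feature index, threshold) pairs."""
    n_features = len(X[0])
    candidates = []
    for _ in range(nsplits):
        # (1) Sample one split problem; assumed here to be one feature.
        j = rng.randrange(n_features)
        values = sorted({row[j] for row in X})
        if len(values) < 2:
            continue  # constant feature in this node: no valid split
        # (2) Sample a single split from that split problem: the midpoint
        # between two adjacent distinct values of the sampled feature.
        k = rng.randrange(len(values) - 1)
        candidates.append((j, (values[k] + values[k + 1]) / 2.0))
    return candidates


def best_split(X, y, nsplits, rng):
    """Return the sampled candidate split with the largest Gini decrease."""
    n = len(y)
    parent = gini(y)
    best, best_gain = None, 0.0
    for j, t in sample_candidate_splits(X, nsplits, rng):
        left = [yi for row, yi in zip(X, y) if row[j] <= t]
        right = [yi for row, yi in zip(X, y) if row[j] > t]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if gain > best_gain:
            best, best_gain = (j, t), gain
    return best, best_gain


if __name__ == "__main__":
    X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    y = [0, 0, 1, 1]
    print(best_split(X, y, nsplits=20, rng=random.Random(1)))
```

Because only the sampled candidates are evaluated, the cost per node scales with `nsplits` rather than with the total number of possible splits, which is what would matter once split problems become large, as with bivariable splits.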

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s42979-021-00920-1.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74be/8533673/2143d0faf20c/42979_2021_920_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74be/8533673/e062bbcaeec2/42979_2021_920_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74be/8533673/cee90e705759/42979_2021_920_Fig2_HTML.jpg
