Alternative stopping rules to limit tree expansion for random forest models.

Affiliations

Radiation Epidemiology Branch, National Cancer Institute, Bethesda, MD, 20892-9778, USA.

Radiation Epidemiology Branch, Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, Bethesda, MD, 20892-9778, USA.

Publication information

Sci Rep. 2022 Sep 6;12(1):15113. doi: 10.1038/s41598-022-19281-7.

DOI: 10.1038/s41598-022-19281-7
PMID: 36068261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9448733/
Abstract

Random forests are a popular type of machine learning model, which are relatively robust to overfitting, unlike some other machine learning models, and adequately capture non-linear relationships between an outcome of interest and multiple independent variables. There are relatively few adjustable hyperparameters in the standard random forest models, among them the minimum size of the terminal nodes on each tree. The usual stopping rule, as proposed by Breiman, stops tree expansion by limiting the size of the parent nodes, so that a node cannot be split if it has fewer than a specified number of observations. Recently an alternative stopping criterion has been proposed, stopping tree expansion so that all terminal nodes have at least a minimum number of observations. The present paper proposes three generalisations of this idea, limiting the growth in regression random forests, based on the variance, range, or inter-centile range. The new approaches are applied to diabetes data obtained from the National Health and Nutrition Examination Survey and four other datasets (Tasmanian Abalone data, Boston Housing crime rate data, Los Angeles ozone concentration data, MIT servo data). Empirical analyses presented herein demonstrate that the new stopping rules yield mean square prediction error competitive with that of standard random forest models. In general, use of the intercentile range statistic to control tree expansion yields much less variation in mean square prediction error, and mean square prediction error is also closer to the optimal. The Fortran code developed is provided in the Supplementary Material.
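The stopping rules named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Fortran code. The function names, the threshold parameter, the 10th/90th percentile defaults, and the choice to test the spread of the node being considered for a split are all assumptions made for illustration; the paper's exact definitions may differ.

```python
import numpy as np


def node_spread(y, statistic="variance", lower_pct=10.0, upper_pct=90.0):
    """Spread of the outcome values y in one node.

    The percentile bounds for the inter-centile range are illustrative
    defaults, not values taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    if statistic == "variance":
        return float(np.var(y))
    if statistic == "range":
        return float(np.max(y) - np.min(y))
    if statistic == "intercentile":
        lo, hi = np.percentile(y, [lower_pct, upper_pct])
        return float(hi - lo)
    raise ValueError(f"unknown statistic: {statistic}")


def may_split_parent_size(y, min_parent_size):
    """Parent-size rule: a node with fewer than min_parent_size
    observations is not split."""
    return len(y) >= min_parent_size


def may_split_leaf_size(y_left, y_right, min_leaf_size):
    """Terminal-node rule: a split is accepted only if both children
    keep at least min_leaf_size observations."""
    return len(y_left) >= min_leaf_size and len(y_right) >= min_leaf_size


def may_split_spread(y, threshold, statistic="variance"):
    """Hypothetical spread-based rule: a node is split only while the
    chosen spread statistic of its outcomes exceeds threshold."""
    return node_spread(y, statistic) > threshold


# Toy node with ten outcome values.
y = np.arange(1.0, 11.0)
print(may_split_parent_size(y, min_parent_size=5))
print(may_split_leaf_size(y[:2], y[2:], min_leaf_size=3))
print(may_split_spread(y, threshold=8.0, statistic="range"))
```

For comparison, scikit-learn's `RandomForestRegressor` exposes the two size-based rules as `min_samples_split` (parent node size) and `min_samples_leaf` (terminal node size); it has no built-in counterpart to the spread-based rules sketched here.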


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c332/9448733/16a6f1219acb/41598_2022_19281_Fig1_HTML.jpg

Similar articles

1
Alternative stopping rules to limit tree expansion for random forest models.
Sci Rep. 2022 Sep 6;12(1):15113. doi: 10.1038/s41598-022-19281-7.
2
Effects of stopping criterion on the growth of trees in regression random forests.
N Engl J Stat Data Sci. 2023 Apr;1(1):46-61. doi: 10.51387/22-nejsds5. Epub 2022 Aug 31.
3
Oblique and rotation double random forest.
Neural Netw. 2022 Sep;153:496-517. doi: 10.1016/j.neunet.2022.06.012. Epub 2022 Jun 18.
4
[Age estimation model for individual tree in natural forest based on random forest model].
Ying Yong Sheng Tai Xue Bao. 2024 Apr 18;35(4):1055-1063. doi: 10.13287/j.1001-9332.202404.023.
5
A data-driven approach to predicting diabetes and cardiovascular disease with machine learning.
BMC Med Inform Decis Mak. 2019 Nov 6;19(1):211. doi: 10.1186/s12911-019-0918-5.
6
Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
7
Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.
BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.
8
Can Predictive Modeling Tools Identify Patients at High Risk of Prolonged Opioid Use After ACL Reconstruction?
Clin Orthop Relat Res. 2020 Jul;478(7):0-1618. doi: 10.1097/CORR.0000000000001251.
9
Classification and Prediction on the Effects of Nutritional Intake on Overweight/Obesity, Dyslipidemia, Hypertension and Type 2 Diabetes Mellitus Using Deep Learning Model: 4-7th Korea National Health and Nutrition Examination Survey.
Int J Environ Res Public Health. 2021 May 24;18(11):5597. doi: 10.3390/ijerph18115597.
10
Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults.
J Gerontol A Biol Sci Med Sci. 2021 Mar 31;76(4):647-654. doi: 10.1093/gerona/glaa138.

Cited by

1
Machine learning techniques for the prediction of indoor gamma-ray dose rates - Strengths, weaknesses and implications for epidemiology.
J Environ Radioact. 2025 Feb;282:107595. doi: 10.1016/j.jenvrad.2024.107595. Epub 2024 Dec 27.
2
Machine learning estimates on the impacts of detection times on wildfire suppression costs.
PLoS One. 2024 Nov 20;19(11):e0313200. doi: 10.1371/journal.pone.0313200. eCollection 2024.
3
Risk Prediction Model for Non-Suicidal Self-Injury in Chinese Adolescents with Major Depressive Disorder Based on Machine Learning.
Neuropsychiatr Dis Treat. 2024 Aug 8;20:1539-1551. doi: 10.2147/NDT.S460021. eCollection 2024.
4
A Historical Survey of Key Epidemiological Studies of Ionizing Radiation Exposure.
Radiat Res. 2024 Aug 1;202(2):432-487. doi: 10.1667/RADE-24-00021.1.
5
Effects of stopping criterion on the growth of trees in regression random forests.
N Engl J Stat Data Sci. 2023 Apr;1(1):46-61. doi: 10.51387/22-nejsds5. Epub 2022 Aug 31.

References

1
Effects of stopping criterion on the growth of trees in regression random forests.
N Engl J Stat Data Sci. 2023 Apr;1(1):46-61. doi: 10.51387/22-nejsds5. Epub 2022 Aug 31.
2
A multicenter random forest model for effective prognosis prediction in collaborative clinical research network.
Artif Intell Med. 2020 Mar;103:101814. doi: 10.1016/j.artmed.2020.101814. Epub 2020 Feb 5.
3
BiMM forest: A random forest method for modeling clustered and longitudinal binary outcomes.
Chemometr Intell Lab Syst. 2019 Feb 15;185:122-134. doi: 10.1016/j.chemolab.2019.01.002. Epub 2019 Jan 11.
4
Model-Based Recursive Partitioning for Subgroup Analyses.
Int J Biostat. 2016 May 1;12(1):45-63. doi: 10.1515/ijb-2015-0032.
5
A Very Simple Safe-Bayesian Random Forest.
IEEE Trans Pattern Anal Mach Intell. 2015 Jun;37(6):1297-303. doi: 10.1109/TPAMI.2014.2362751.
6
Random forest methodology for model-based recursive partitioning: the mobForest package for R.
BMC Bioinformatics. 2013 Apr 11;14:125. doi: 10.1186/1471-2105-14-125.
7
Subgroup identification from randomized clinical trial data.
Stat Med. 2011 Oct 30;30(24):2867-80. doi: 10.1002/sim.4322. Epub 2011 Aug 4.
8
GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest.
BMC Bioinformatics. 2007 Sep 3;8:328. doi: 10.1186/1471-2105-8-328.
9
Gene selection and classification of microarray data using random forest.
BMC Bioinformatics. 2006 Jan 6;7:3. doi: 10.1186/1471-2105-7-3.