Formal Hypothesis Tests for Additive Structure in Random Forests.

Authors

Mentch Lucas, Hooker Giles

Affiliation

Department of Statistical Science, Cornell University.

Publication

J Comput Graph Stat. 2017;26(3):589-597. doi: 10.1080/10618600.2016.1256817. Epub 2017 Apr 17.

DOI: 10.1080/10618600.2016.1256817
PMID: 30906174
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6428414/
Abstract

While statistical learning methods have proved powerful tools for predictive modeling, the black-box nature of the models they produce can severely limit their interpretability and the ability to conduct formal inference. However, the natural structure of ensemble learners like bagged trees and random forests has been shown to admit desirable asymptotic properties when base learners are built with proper subsamples. In this work, we demonstrate that by defining an appropriate grid structure on the covariate space, we may carry out formal hypothesis tests for both variable importance and underlying additive model structure. To our knowledge, these tests represent the first statistical tools for investigating the underlying regression structure in a context such as random forests. We develop notions of total and partial additivity and further demonstrate that testing can be carried out at no additional computational cost by estimating the variance within the process of constructing the ensemble. Furthermore, we propose a novel extension of these testing procedures utilizing random projections in order to allow for computationally efficient testing procedures that retain high power even when the grid size is much larger than that of the training set.
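
To make the grid-based idea concrete, here is a minimal sketch of a Wald-type check for total additivity on a two-covariate grid. It is an illustration under stated assumptions, not the authors' exact procedure: the function names `centering_matrix` and `additivity_statistic` are hypothetical, the grid predictions are assumed to be approximately multivariate normal, and the covariance matrix `cov` is a placeholder that would in practice be estimated while building the subsampled ensemble. The sketch double-centers the predictions, so an additive surface leaves a zero residual, and compares the residual to its covariance.

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    """Return I - (1/n) 11^T, which subtracts the mean from a length-n vector."""
    return np.eye(n) - np.full((n, n), 1.0 / n)


def additivity_statistic(grid_preds: np.ndarray, cov: np.ndarray):
    """Wald-type statistic for additivity of predictions on an a x b grid.

    grid_preds: (a, b) array of ensemble predictions at the grid points.
    cov: (a*b, a*b) covariance of the row-major flattened predictions
         (assumed given here; a placeholder for an ensemble-based estimate).
    Returns (statistic, degrees_of_freedom).
    """
    a, b = grid_preds.shape
    # Kronecker product of centering matrices removes row and column main
    # effects, leaving only the interaction part of the surface.
    D = np.kron(centering_matrix(a), centering_matrix(b))
    resid = D @ grid_preds.ravel()
    # D is rank deficient, so use a pseudo-inverse with an explicit cutoff.
    middle = np.linalg.pinv(D @ cov @ D.T, rcond=1e-10)
    stat = float(resid @ middle @ resid)
    df = (a - 1) * (b - 1)
    return stat, df


# Hypothetical example: a 4 x 5 grid with an assumed diagonal covariance.
x1 = np.linspace(0.0, 1.0, 4)
x2 = np.linspace(0.0, 1.0, 5)
cov = 0.01 * np.eye(x1.size * x2.size)

additive_surface = x1[:, None] ** 2 + np.sin(x2)[None, :]
interaction_surface = x1[:, None] * x2[None, :]

stat_add, df_add = additivity_statistic(additive_surface, cov)
stat_int, df_int = additivity_statistic(interaction_surface, cov)
```

Under the normality assumption and the additive null, the statistic would be compared with a chi-squared distribution on `df` degrees of freedom. The additive surface yields a statistic that is numerically zero, while the product surface does not.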

