An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Affiliation

Department of Statistics, Ludwig-Maximilians-Universität Munich, Munich, Germany.

Publication information

Psychol Methods. 2009 Dec;14(4):323-48. doi: 10.1037/a0016973.

Abstract

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and bioinformatics within the past few years. High-dimensional problems are common not only in genetics but also in some areas of psychological research, where only a few subjects can be measured because of time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications and to provide descriptive variable importance measures that reflect the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated with freely available implementations in the R system for statistical computing.
