Accelerated and Interpretable Oblique Random Survival Forests.

Authors

Jaeger Byron C, Welden Sawyer, Lenoir Kristin, Speiser Jaime L, Segar Matthew W, Pandey Ambarish, Pajewski Nicholas M

Affiliations

Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC.

Department of Cardiology, Texas Heart Institute, Houston, TX.

Publication information

J Comput Graph Stat. 2024;33(1):192-207. doi: 10.1080/10618600.2023.2231048.

Abstract

The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles have high prediction accuracy, but assessing many linear combinations of predictors induces high computational overhead. In addition, few methods have been developed for estimation of variable importance (VI) with oblique RSFs. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate VI with the oblique RSF. Our computational approach uses Newton-Raphson scoring in each non-leaf node. We estimate VI by negating each coefficient used for a given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In benchmarking experiments, we find our implementation of the oblique RSF is hundreds of times faster, with equivalent prediction accuracy, compared to existing software for oblique RSFs. We find in simulation studies that "negation VI" discriminates between relevant and irrelevant numeric predictors more accurately than permutation VI, Shapley VI, and a technique to measure VI using analysis of variance. All oblique RSF methods in the current study are available in the aorsf R package, and additional supplemental materials are available online.
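The negation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the aorsf implementation: it assumes a single fixed linear combination of two predictors stands in for a whole forest, it scores accuracy with a simple concordance statistic, and it evaluates on the full toy sample rather than on out-of-bag observations. The names `concordance`, `risk_scores`, `negation_vi`, and the toy data are all hypothetical.

```python
import math
import random


def concordance(risk, time, event):
    """Share of comparable pairs in which the higher risk score has the shorter time.

    A pair (i, j) is comparable when i has an observed event and time[i] < time[j].
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue  # a censored observation cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def risk_scores(X, coefs):
    """Linear combination of predictors, one score per row (assumed stand-in for a forest)."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in X]


def negation_vi(X, time, event, coefs):
    """For each predictor, flip the sign of its coefficient and record the drop in accuracy."""
    baseline = concordance(risk_scores(X, coefs), time, event)
    vi = []
    for k in range(len(coefs)):
        flipped = list(coefs)
        flipped[k] = -flipped[k]
        vi.append(baseline - concordance(risk_scores(X, flipped), time, event))
    return vi


# Toy data (assumption): x0 drives survival time, x1 is pure noise.
rng = random.Random(0)
X, time, event = [], [], []
for _ in range(200):
    x0, x1 = rng.gauss(0, 1), rng.gauss(0, 1)
    true_time = math.exp(-x0)  # larger x0 means shorter survival
    if rng.random() < 0.8:
        time.append(true_time)  # event observed
        event.append(1)
    else:
        time.append(true_time * rng.random())  # right-censored before the event
        event.append(0)
    X.append([x0, x1])

# Assumed coefficients: large weight on the relevant predictor, small weight on the noise.
coefs = [1.0, 0.05]
vi = negation_vi(X, time, event, coefs)
print(vi)
```

In this toy setting, negating the coefficient of the relevant predictor reverses the risk ordering and produces a large drop in concordance, while negating the small coefficient of the noise predictor changes it very little. A predictor whose coefficient is exactly zero would receive a VI of zero under this scheme, since negating zero leaves predictions unchanged.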
