Jaeger Byron C, Long D Leann, Long Dustin M, Sims Mario, Szychowski Jeff M, Min Yuan-I, McClure Leslie A, Howard George, Simon Noah
University of Alabama at Birmingham.
University of Mississippi Medical Center.
Ann Appl Stat. 2019 Sep;13(3):1847-1883. doi: 10.1214/19-aoas1261. Epub 2019 Oct 17.
We introduce and evaluate the oblique random survival forest (ORSF). The ORSF is an ensemble method for right-censored survival data that uses linear combinations of input variables to recursively partition a set of training data. Regularized Cox proportional hazards models are used to identify linear combinations of input variables in each recursive partitioning step. Benchmark results using simulated and real data indicate that the ORSF's predicted risk function has high prognostic value in comparison to random survival forests, conditional inference forests, regression, and boosting. In an application to data from the Jackson Heart Study, we demonstrate variable and partial dependence using the ORSF and highlight characteristics of its 10-year predicted risk function for atherosclerotic cardiovascular disease events (ASCVD; stroke, coronary heart disease). We present visualizations comparing variable and partial effect estimation according to the ORSF, the conditional inference forest, and the Pooled Cohort Risk equations. The obliqueRSF R package, which provides functions to fit the ORSF and create variable and partial dependence plots, is available on the Comprehensive R Archive Network (CRAN).
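The partitioning step described in the abstract (a regularized Cox model supplies the linear combination used to split a node) can be illustrated with a minimal sketch of a single oblique split. This is not the obliqueRSF implementation or its API. The function names `cox_ridge_gradient` and `oblique_split`, the ridge penalty fitted by plain gradient descent, the median cut-point, and the simulated data are all assumptions made for illustration. The sketch also assumes no tied event times and no constant columns.

```python
import numpy as np

def cox_ridge_gradient(beta, X, time, status, lam):
    """Gradient of the ridge-penalized negative log partial likelihood,
    scaled by 1/n. Assumes no tied event times (hypothetical helper)."""
    order = np.argsort(-time)             # descending time: risk set of row i is rows 0..i
    Xs, ds = X[order], status[order]
    eta = Xs @ beta
    w = np.exp(eta - eta.max())           # shift for stability; it cancels in the ratio below
    cum_w = np.cumsum(w)
    cum_wx = np.cumsum(w[:, None] * Xs, axis=0)
    risk_mean = cum_wx / cum_w[:, None]   # weighted covariate mean over each risk set
    grad = -((Xs - risk_mean) * ds[:, None]).sum(axis=0)
    return grad / len(time) + lam * beta

def oblique_split(X, time, status, mtry, rng, lam=0.1, steps=200, lr=0.1):
    """One oblique split: fit a penalized Cox model on a random subset of
    features, then cut the resulting linear combination at its median.
    The median cut is a simplification chosen for this sketch."""
    features = rng.choice(X.shape[1], size=mtry, replace=False)
    Xsub = X[:, features]
    Xsub = (Xsub - Xsub.mean(axis=0)) / Xsub.std(axis=0)  # standardize columns
    beta = np.zeros(mtry)
    for _ in range(steps):                # plain gradient descent on the penalized loss
        beta -= lr * cox_ridge_gradient(beta, Xsub, time, status, lam)
    score = Xsub @ beta                   # the oblique (linear-combination) split variable
    cut = np.median(score)
    return features, beta, cut, score <= cut

# Simulated right-censored data (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
hazard = np.exp(X[:, 0] - X[:, 1])
event_time = rng.exponential(1.0 / hazard)
censor_time = rng.exponential(2.0, size=n)
status = (event_time <= censor_time).astype(float)  # 1 = event observed, 0 = censored
time = np.minimum(event_time, censor_time)

features, beta, cut, go_left = oblique_split(X, time, status, mtry=3, rng=rng)
print("features:", features, "beta:", beta, "left node size:", int(go_left.sum()))
```

A forest would repeat this split recursively within each child node and across many bootstrap samples. An axis-aligned split, by contrast, would threshold a single column of `X` rather than the combined `score`.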