Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany.
DICE Group, Department of Computer Science, Paderborn University, Paderborn, Germany.
Biom J. 2024 Sep;66(6):e202400014. doi: 10.1002/bimj.202400014.
Random survival forests (RSF) can be applied to many time-to-event research questions and are particularly useful in situations where the relationship between the independent variables and the event of interest is rather complex. However, in many clinical settings the occurrence of the event of interest is affected by competing events, meaning that a patient may experience an outcome other than the event of interest. Neglecting competing events (i.e., treating them as censoring) typically results in biased estimates of the cumulative incidence function (CIF). A popular approach for competing events is Fine and Gray's subdistribution hazard model, which directly estimates the CIF by fitting a single-event model defined on a subdistribution timescale. Here, we integrate concepts from the subdistribution hazard modeling approach into the RSF. We develop several imputation strategies that use weights as in a discrete-time subdistribution hazard model to impute censoring times for individuals in whom a competing event is observed. Our simulations show that the CIF is well estimated if the imputation is carried out outside the forest, on the overall dataset. Especially in settings with a low rate of the event of interest or a high censoring rate, competing events must not be neglected, that is, they must not be treated as censoring. When applied to a real-world epidemiological dataset on chronic kidney disease, the imputation approach resulted in highly plausible predictor-response relationships and CIF estimates of renal events.
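The abstract does not spell out the imputation strategies, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' implementation. We assume discrete times 1..t_max, status coded 0 = censored, 1 = event of interest, 2 = competing event, and a censoring survival function G(t) = P(C > t) estimated by a life table on the whole dataset (i.e., outside any forest). We further assume the discrete-time subdistribution weight for an individual with a competing event at T_i is G(t-1)/G(T_i-1) for t > T_i, and we draw an imputed censoring time C* so that P(C* >= t) equals that weight. All function names are hypothetical, and a plain 1 - Kaplan-Meier estimate stands in for the random survival forest that would be grown on the imputed single-event data.

```python
import random


def censoring_survival(times, status, max_time):
    """Life-table estimate of G(t) = P(C > t) for t = 0..max_time.
    Censoring (status 0) is the 'event' here; subjects with an event
    at t still count as at risk of censoring at t."""
    G = [1.0]
    for t in range(1, max_time + 1):
        at_risk = sum(1 for x in times if x >= t)
        n_cens = sum(1 for x, s in zip(times, status) if x == t and s == 0)
        G.append(G[-1] * (1.0 - n_cens / at_risk) if at_risk else G[-1])
    return G


def impute_censoring_time(t_comp, G, max_time, rng):
    """Draw C* >= t_comp with P(C* >= t) = G(t-1) / G(t_comp-1), i.e. the
    assumed subdistribution weight of remaining in the risk set at time t.
    Mass beyond max_time is placed at max_time (end of follow-up)."""
    u = rng.random()
    c_star = t_comp
    for t in range(t_comp + 1, max_time + 1):
        if G[t - 1] / G[t_comp - 1] > u:
            c_star = t
        else:
            break  # weights are non-increasing in t
    return c_star


def impute_dataset(times, status, max_time, seed=0):
    """Replace each competing event (status 2) by an imputed censoring
    time, giving single-event data (status 0/1) on the subdistribution
    timescale. G is estimated once, on the overall dataset."""
    rng = random.Random(seed)
    G = censoring_survival(times, status, max_time)
    new_times, new_status = [], []
    for x, s in zip(times, status):
        if s == 2:
            new_times.append(impute_censoring_time(x, G, max_time, rng))
            new_status.append(0)
        else:
            new_times.append(x)
            new_status.append(s)
    return new_times, new_status


def one_minus_km(times, status, max_time):
    """1 - Kaplan-Meier for status == 1, evaluated at t = 1..max_time."""
    surv, out = 1.0, []
    for t in range(1, max_time + 1):
        at_risk = sum(1 for x in times if x >= t)
        n_ev = sum(1 for x, s in zip(times, status) if x == t and s == 1)
        if at_risk:
            surv *= 1.0 - n_ev / at_risk
        out.append(1.0 - surv)
    return out


# Toy data (hypothetical): 0 = censored, 1 = event of interest, 2 = competing.
times, status, T_MAX = [1, 2, 2, 3, 4, 4, 5, 5], [1, 2, 0, 1, 2, 1, 0, 1], 5
imp_t, imp_s = impute_dataset(times, status, T_MAX, seed=1)
print("naive 1-KM (competing = censored):", one_minus_km(times, status, T_MAX))
print("1-KM after imputation:", one_minus_km(imp_t, imp_s, T_MAX))
```

As a sanity check, with times [1, 2, 3, 4] and status [1, 2, 1, 2] there is no censoring, so G is identically 1 and every competing event is imputed as censored at t_max. The 1 - Kaplan-Meier estimate on the imputed data then equals the empirical proportion of events of interest (0.5 at t = 4), whereas the naive analysis that censors competing events gives 0.625, illustrating the upward bias described above.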