Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands.
Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands.
BMC Med Res Methodol. 2023 Feb 24;23(1):51. doi: 10.1186/s12874-023-01866-z.
In health research, several chronic diseases are subject to competing risks (CRs). Statistical models (SM) were initially developed to estimate the cumulative incidence of an event in the presence of CRs. With the recent growing interest in applying machine learning (ML) for clinical prediction, these techniques have also been extended to model CRs, but the literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low-dimensional setting).
A dataset of 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate model predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. The ML models include the original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel architectural specifications (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years.
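As a rough illustration of the cause-specific Cox approach (not the authors' actual code), the sketch below fits one Cox model per cause with the lifelines Python package, treating the competing cause as censoring; the file name, column names, and status coding are assumptions.

```python
# Minimal sketch of cause-specific Cox models for competing risks (lifelines).
# Assumed layout: "time" in years; "status" coded 0 = censored, 1 = disease
# progression (event of interest), 2 = death (competing event); the nine
# predictors are assumed to be numeric/encoded columns. All names are illustrative.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("ests_cohort.csv")  # hypothetical file holding the eSTS cohort

def fit_cause_specific_cox(data, cause):
    """Fit a Cox model for one cause, censoring all other causes."""
    d = data.copy()
    d["event"] = (d["status"] == cause).astype(int)  # competing cause -> censored
    cph = CoxPHFitter()
    cph.fit(d.drop(columns=["status"]), duration_col="time", event_col="event")
    return cph

cox_progression = fit_cause_specific_cox(df, cause=1)  # event of interest
cox_death = fit_cause_specific_cox(df, cause=2)        # competing event
cox_progression.print_summary()
```

The two cause-specific hazards can then be combined into cumulative incidence functions at the 2-, 5-, and 10-year horizons; the Fine-Gray model instead regresses on the subdistribution hazard and is typically fitted with dedicated software.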
Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (the left-out samples) using the Brier score and the Area Under the Curve (AUC) adapted for CRs as performance measures. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models reach a performance comparable to the SM at 2, 5, and 10 years for both the Brier score and the AUC (95% confidence intervals overlap). However, the SM are frequently better calibrated.
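The bootstrap validation scheme can be sketched as follows, assuming a DataFrame with "time" and "status" columns and hypothetical fit/predict callables standing in for any of the compared models; for brevity the Brier score below omits the censoring (IPCW) weights used in a proper CR version.

```python
# Sketch of the bootstrap validation loop: resample with replacement for training,
# evaluate on the left-out (out-of-bag) patients at the 2-, 5-, and 10-year horizons.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2023)
HORIZONS = [2, 5, 10]  # years

def naive_brier(t, pred_cif, time, status, cause=1):
    """Squared error between the predicted cumulative incidence at t and the
    observed event indicator; censoring weights (IPCW) are omitted in this sketch."""
    observed = ((time <= t) & (status == cause)).astype(float)
    return float(np.mean((pred_cif - observed) ** 2))

def bootstrap_validate(df, fit_fn, predict_cif_fn, n_boot=100):
    rows = []
    for _ in range(n_boot):
        pos = rng.integers(0, len(df), size=len(df))    # sample with replacement
        train = df.iloc[pos]
        valid = df.loc[~df.index.isin(df.index[pos])]   # left-out samples
        model = fit_fn(train)                           # hypothetical fit callable
        for t in HORIZONS:
            cif = predict_cif_fn(model, valid, t)       # predicted CIF at horizon t
            rows.append({"horizon": t,
                         "brier": naive_brier(t, cif, valid["time"], valid["status"])})
    return pd.DataFrame(rows)
```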
Overall, the ML techniques are less practical, as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas the regression methods perform well without the additional workload of model training. As such, for non-complex real-life survival data, these techniques should be applied only as a complement to SM, as exploratory tools of model performance. More attention to model calibration is urgently needed.
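A crude version of the calibration check can be sketched as below: validation patients are grouped into quantiles of predicted cumulative incidence at a fixed horizon, and the mean predicted risk per group is compared with the observed event proportion. The observed side ignores censoring for brevity; the study estimates the absolute accuracy error against properly estimated cumulative incidence.

```python
# Sketch of a calibration table at a fixed horizon t (names and grouping are illustrative).
import pandas as pd

def calibration_table(pred_cif, time, status, t, cause=1, n_groups=5):
    d = pd.DataFrame({"pred": pred_cif,
                      "observed": ((time <= t) & (status == cause)).astype(float)})
    d["group"] = pd.qcut(d["pred"], q=n_groups, labels=False, duplicates="drop")
    table = d.groupby("group").agg(mean_predicted=("pred", "mean"),
                                   observed_incidence=("observed", "mean"))
    table["absolute_error"] = (table["mean_predicted"] - table["observed_incidence"]).abs()
    return table  # a well calibrated model keeps absolute_error small in every group
```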