School of Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg, Gauteng, South Africa.
School of Natural and Applied Sciences, Sol Plaatje University, Kimberley, Northern Cape, South Africa.
PLoS One. 2022 Dec 28;17(12):e0279435. doi: 10.1371/journal.pone.0279435. eCollection 2022.
Research that compares two predictive models requires a rigorous statistical approach to draw valid inferences about differences in their performance. Researchers present estimates of model performance with little evidence of whether these estimates reflect true differences in performance. In this study, we apply two statistical tests, the 5 × 2-fold cv paired t-test and the combined 5 × 2-fold cv F-test, to provide statistical evidence on differences in predictive performance between the Fine-Gray (FG) and random survival forest (RSF) models for competing risks. The models are trained on different scenarios of low-dimensional simulated survival data to determine whether the observed differences in their predictive performance are significant. Each simulation was repeated one hundred times on ten different seeds. The results indicate that the RSF model has superior predictive performance in the presence of complex relationships (quadratic and interaction effects) between the outcome and its predictors. The two statistical tests show that the differences in performance are significant in the quadratic simulations but not in the interaction simulations. The study also reveals that the FG model has superior predictive performance in the linear simulations and that its differences in predictive performance compared with the RSF model are significant. The combined 5 × 2-fold cv F-test has lower type I error rates than the 5 × 2-fold cv paired t-test.
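As a rough illustration of how the two tests work, the sketch below computes both statistics from the ten fold-wise performance differences produced by five replications of 2-fold cross-validation. It follows the commonly cited formulations (Dietterich's 5 × 2cv paired t-test, referred to a t distribution with 5 degrees of freedom, and Alpaydin's combined 5 × 2cv F-test, referred to an F(10, 5) distribution), not necessarily the authors' exact implementation. The input layout `diffs`, the function names, and the hard-coded 5% critical values are illustrative assumptions; which performance measure is used and which model is subtracted from which are left to the caller.

```python
import math

# Assumed input layout (hypothetical): diffs[i] = (p_i1, p_i2), the difference in a
# performance score between model A and model B on fold 1 and fold 2 of
# replication i, for i = 0..4 (five replications of 2-fold cross-validation).

T_CRIT_5DF = 2.571   # two-sided 5% critical value of Student's t with 5 df (tabulated)
F_CRIT_10_5 = 4.74   # upper 5% critical value of F(10, 5) (tabulated)


def _check_shape(diffs):
    """Require exactly 5 replications with 2 fold differences each."""
    if len(diffs) != 5 or any(len(row) != 2 for row in diffs):
        raise ValueError("expected 5 replications x 2 folds of differences")


def _fold_variances(diffs):
    """Per-replication variance estimate s_i^2 = sum_j (p_ij - mean_i)^2."""
    out = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        out.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    return out


def five_by_two_t(diffs):
    """5x2cv paired t statistic: p_11 / sqrt(mean of s_i^2); ~ t(5) under H0."""
    _check_shape(diffs)
    s2 = _fold_variances(diffs)
    return diffs[0][0] / math.sqrt(sum(s2) / 5.0)


def five_by_two_f(diffs):
    """Combined 5x2cv F statistic: sum of all p_ij^2 / (2 * sum s_i^2); ~ F(10, 5)."""
    _check_shape(diffs)
    s2 = _fold_variances(diffs)
    numerator = sum(p * p for row in diffs for p in row)
    return numerator / (2.0 * sum(s2))


if __name__ == "__main__":
    # Made-up differences for illustration only; not results from the study.
    example = [(0.02, 0.04)] + [(0.01, 0.03)] * 4
    t_stat = five_by_two_t(example)
    f_stat = five_by_two_f(example)
    print(f"t = {t_stat:.3f}, reject H0 at 5%: {abs(t_stat) > T_CRIT_5DF}")
    print(f"F = {f_stat:.3f}, reject H0 at 5%: {f_stat > F_CRIT_10_5}")
```

Note the design difference the sketch makes visible: the t statistic uses only the first fold difference of the first replication in its numerator, so it depends on which replication happens to come first, whereas the F statistic pools all ten squared differences. For the made-up input above, t equals √2 (about 1.41) and F equals 3.0, so neither test rejects at the 5% level. Both statistics are undefined if every replication's two fold differences are identical (zero variance).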