Department of Statistics and Data Science, Korea National Open University, Seoul, Korea.
Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea.
Korean J Radiol. 2021 Oct;22(10):1697-1707. doi: 10.3348/kjr.2021.0223. Epub 2021 Jul 1.
The recent introduction of various high-dimensional modeling methods, such as radiomics and deep learning, has greatly diversified the modeling approaches available for survival prediction (or, more generally, time-to-event prediction). The novelty of these approaches and unfamiliarity with their outputs may leave some researchers and practitioners unsure how to evaluate the performance of such models. Those who want to develop models or apply them in practice need the methodological literacy to critically appraise performance evaluations and, ideally, the ability to conduct such evaluations themselves. This article provides intuitive, conceptual, and practical explanations of the statistical methods for evaluating the performance of survival prediction models, with minimal use of mathematical descriptions. It covers methods ranging from conventional to deep learning approaches, with emphasis on recent modeling approaches. The review includes straightforward explanations of C indices (Harrell's C index, etc.), time-dependent receiver operating characteristic curve analysis, calibration plots, other methods for evaluating calibration performance, and the Brier score.
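As a rough illustration of the first metric named above, the following is a minimal sketch of Harrell's C index, not code from the article. It uses the standard pairwise definition: a pair of subjects is comparable when the subject with the shorter follow-up time experienced the event, and the pair is concordant when that subject also has the higher predicted risk. The function name `harrell_c_index`, its argument names, and the toy data are hypothetical. As a simplifying assumption, pairs with tied follow-up times are skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Sketch of Harrell's C index (hypothetical helper, not from the article).

    times       : observed follow-up times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output where a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Subject i can anchor a comparable pair only if its event was observed.
        if not events[i]:
            continue
        for j in range(n):
            # Assumption: only strictly shorter times count; tied times are skipped.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
c_index = harrell_c_index(times, events, risk)
```

In this toy data the predicted risks decrease as follow-up time increases, so every comparable pair is concordant. A value near 0.5 would indicate discrimination no better than chance.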