Health e-Research Centre, School of Health Sciences, Faculty of Biology, Medicine and Health, the University of Manchester, Manchester, Oxford Road, Manchester, M13 9PL, UK; School of Mathematical Sciences, Xiamen University, Xiamen, 361005, People's Republic of China.
Health e-Research Centre, School of Health Sciences, Faculty of Biology, Medicine and Health, the University of Manchester, Manchester, Oxford Road, Manchester, M13 9PL, UK.
J Clin Epidemiol. 2021 Oct;138:168-177. doi: 10.1016/j.jclinepi.2021.06.026. Epub 2021 Jul 3.
Clinical risk prediction models are generally assessed at the population level, and measures that evaluate their stability in predicting the risks of individual patients are lacking. This study evaluated ranking as a measure of individual-level stability between risk prediction models.
A large patient cohort (3.66 million patients with 0.11 million cardiovascular events) extracted from the Clinical Practice Research Datalink was used as an exemplar for cardiovascular disease risk prediction.
Fifteen models (including machine learning and statistical models) had similar population-level performance (C statistics of about 0.88). For patients with high absolute risks, the models were more consistent in their ranking of risk predictions (interquartile range (IQR) of differences in rank percentiles: -0.6 to 1.0) but inconsistent in absolute risk (IQR of differences in absolute risk: -18.8 to 9.0). At low risk, the reverse was true: ranking was inconsistent, but absolute risk was more consistent.
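The comparison of rank stability with absolute-risk stability can be illustrated with a minimal sketch. It is not the authors' code: the function names, the synthetic data, and the choice to take differences pairwise between two models are all assumptions made for illustration, since the abstract does not state how the differences were defined. Each model's predictions are converted to rank percentiles, and the IQR of the per-patient differences is reported for both ranks and absolute risks.

```python
import math
import random
import statistics


def rank_percentiles(risks):
    """Rank percentile (0-100) of each patient within one model's predictions.

    Tied predictions receive the average of the ranks they span.
    """
    n = len(risks)
    order = sorted(range(n), key=lambda i: risks[i])
    pct = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < n and risks[order[end + 1]] == risks[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the block
        for k in range(start, end + 1):
            pct[order[k]] = 100.0 * avg_rank / n
        start = end + 1
    return pct


def iqr_bounds(values):
    """Lower and upper quartile of a list of differences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q1, q3


def stability_summary(risk_a, risk_b):
    """Compare two models patient by patient (same patient order in both lists)."""
    pct_a = rank_percentiles(risk_a)
    pct_b = rank_percentiles(risk_b)
    rank_diff = [b - a for a, b in zip(pct_a, pct_b)]
    risk_diff = [b - a for a, b in zip(risk_a, risk_b)]
    return {"rank_iqr": iqr_bounds(rank_diff), "risk_iqr": iqr_bounds(risk_diff)}


# Hypothetical data: model B is a monotone transform of model A, so the two
# models rank every patient identically while disagreeing on absolute risk.
random.seed(0)
risk_a = [random.uniform(0.01, 0.5) for _ in range(1000)]
risk_b = [math.sqrt(r) for r in risk_a]
summary = stability_summary(risk_a, risk_b)
print(summary)
```

In this toy case the IQR of rank-percentile differences collapses to zero while the IQR of absolute-risk differences does not, which is the kind of complementary information the two measures provide.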
Consistency in the ranking of individual risk predictions is a useful measure for assessing risk prediction models, providing information complementary to absolute risk stability. Model development guidelines, including "TRIPOD" and "PROBAST", should incorporate ranking to assess individual-level stability between risk prediction models.