van der Steen Alex, van Rosmalen Joost, Kroep Sonja, van Hees Frank, Steyerberg Ewout W, de Koning Harry J, van Ballegooijen Marjolein, Lansdorp-Vogelaar Iris
Department of Public Health, Erasmus MC, Rotterdam, The Netherlands (AvdS, SK, FvH, EWS, HJdK, MvB, IL-V)
Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands (JvR)
Med Decis Making. 2016 Jul;36(5):652-65. doi: 10.1177/0272989X16636851. Epub 2016 Mar 8.
Calibration (estimation of model parameters) compares model outcomes with observed outcomes and explores possible model parameter values to identify the set of values that provides the best fit to the data. The goodness-of-fit (GOF) criterion quantifies the difference between model and observed outcomes. There is no consensus on the most appropriate GOF criterion, because a direct performance comparison of GOF criteria in model calibration is lacking.
We systematically compared the performance of commonly used GOF criteria (sum of squared errors [SSE], Pearson chi-square, and a likelihood-based approach [Poisson and/or binomial deviance functions]) in the calibration of selected parameters of the MISCAN-Colon microsimulation model for colorectal cancer. The performance of each GOF criterion was assessed by comparing 1) the root mean squared prediction error (RMSPE) of the selected parameters, 2) the computation time of the calibration procedure across the calibration scenarios, and 3) the impact on estimated cost-effectiveness ratios.
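For orientation, the three families of GOF criteria named above can be sketched in their standard textbook forms. This is a minimal illustration, not the MISCAN-Colon implementation: the function names, the argument layout, and the reading of RMSPE as the root mean squared difference between calibrated estimates and a known true parameter value are all assumptions made here.

```python
import math


def sse(observed, predicted):
    """Sum of squared errors between observed and model-predicted outcomes."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def pearson_chi_square(observed, predicted):
    """Pearson chi-square: squared errors scaled by the predicted value."""
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


def poisson_deviance(observed, predicted):
    """Poisson deviance for count outcomes (a zero count adds no log term)."""
    total = 0.0
    for o, p in zip(observed, predicted):
        log_term = o * math.log(o / p) if o > 0 else 0.0
        total += 2.0 * (log_term - (o - p))
    return total


def binomial_deviance(successes, trials, predicted_probs):
    """Binomial deviance for y events out of n, given predicted probability p."""
    total = 0.0
    for y, n, p in zip(successes, trials, predicted_probs):
        fitted = n * p
        if y > 0:
            total += 2.0 * y * math.log(y / fitted)
        if y < n:
            total += 2.0 * (n - y) * math.log((n - y) / (n - fitted))
    return total


def rmspe(estimates, true_value):
    """Root mean squared error of calibrated estimates around a known value
    (assumed interpretation of RMSPE for a single parameter)."""
    squared = [(est - true_value) ** 2 for est in estimates]
    return math.sqrt(sum(squared) / len(squared))
```

In a calibration loop, one of these functions would serve as the objective to minimise over candidate parameter values; all of them equal zero when the model reproduces the observed outcomes exactly.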
The likelihood-based deviance resulted in the lowest RMSPE in 4 of 6 calibration scenarios and was close to best in the other 2. The SSE had a 25 times higher RMSPE in a scenario with considerable differences in the values of observed outcomes, whereas the Pearson chi-square had a 60 times higher RMSPE in a scenario with multiple studies measuring the same outcome. In all scenarios, the SSE required the most computation time. The likelihood-based approach estimated the cost-effectiveness ratio most accurately (up to -0.15% relative difference versus 0.44% [SSE] and 13% [Pearson chi-square]).
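One plausible reading of the SSE result, offered here as an illustration rather than as the authors' explanation, is that unscaled squared errors let large-valued outcomes dominate the fit. The toy numbers below are hypothetical and are not taken from the study; they only show how the same 20% relative misfit is weighted under the SSE and under the Pearson chi-square.

```python
# Hypothetical observed outcomes on very different scales (not study data).
observed = {"rare_outcome": 5.0, "common_outcome": 5000.0}

# Hypothetical model predictions that are 20% too high for both outcomes.
predicted = {name: 1.2 * value for name, value in observed.items()}

# Per-outcome contributions to each criterion.
sse_terms = {
    name: (observed[name] - predicted[name]) ** 2 for name in observed
}
pearson_terms = {
    name: (observed[name] - predicted[name]) ** 2 / predicted[name]
    for name in observed
}

# How much more the common outcome weighs than the rare one.
sse_ratio = sse_terms["common_outcome"] / sse_terms["rare_outcome"]
pearson_ratio = pearson_terms["common_outcome"] / pearson_terms["rare_outcome"]

print(f"SSE weight ratio (common / rare):     {sse_ratio:.0f}")
print(f"Pearson weight ratio (common / rare): {pearson_ratio:.0f}")
```

With an identical relative error, the SSE contribution grows with the square of the outcome's magnitude, whereas the Pearson term grows only linearly, so under the SSE the rare outcome has almost no influence on the calibrated parameters in this toy setting.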
The likelihood-based deviance criteria lead to accurate estimation of parameters under various circumstances. These criteria are recommended for calibration of microsimulation disease models in preference to the other commonly used criteria.