Trächsel Bastien, Rousson Valentin, Bulliard Jean-Luc, Locatelli Isabella
Center for Primary Care and Public Health (Unisanté), University of Lausanne, Lausanne, Switzerland.
Biom J. 2023 Oct;65(7):e2200046. doi: 10.1002/bimj.202200046. Epub 2023 Apr 20.
This study compares the performance of statistical methods for predicting age-standardized cancer incidence: Poisson generalized linear models, age-period-cohort (APC) and Bayesian age-period-cohort (BAPC) models, autoregressive integrated moving average (ARIMA) time series models, and simple linear models. The methods were evaluated by leave-future-out cross-validation, and performance was assessed using the normalized root mean square error, the interval score, and the coverage of prediction intervals. The methods were applied to combined cancer incidence data from the three Swiss cancer registries of Geneva, Neuchâtel, and Vaud, considering the five most frequent cancer sites (breast, colorectal, lung, prostate, and skin melanoma) and grouping all other sites into a final category. The best overall performance was achieved by ARIMA models, followed by linear regression models. Prediction methods based on model selection using the Akaike information criterion resulted in overfitting. The widely used APC and BAPC models were found to be suboptimal for prediction, particularly when incidence showed a trend reversal, as was observed for prostate cancer. In general, we do not recommend predicting cancer incidence for periods far into the future but rather updating predictions regularly.
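The evaluation scheme named in the abstract (leave-future-out cross-validation scored by normalized root mean square error, interval score, and interval coverage) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the incidence series is synthetic, the model is a plain least-squares linear trend, the prediction interval is a crude normal approximation that ignores parameter uncertainty, the RMSE is normalized by the mean observed rate, and the interval score uses the standard form for a central (1 - alpha) interval. The function names are hypothetical.

```python
import math


def interval_score(lower, upper, y, alpha=0.05):
    """Interval score for a central (1 - alpha) interval; lower is better."""
    score = upper - lower  # width penalty
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # observation below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # observation above the interval
    return score


def fit_linear_trend(years, rates):
    """Least-squares fit of rate = a + b * year; returns (a, b, residual_sd)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rates))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(years, rates))
    return a, b, math.sqrt(rss / (n - 2))


def leave_future_out(years, rates, min_train=10, horizon=1, z=1.96):
    """Fit on the first `end` years only, then predict `horizon` years ahead."""
    results = []
    for end in range(min_train, len(years) - horizon + 1):
        a, b, sd = fit_linear_trend(years[:end], rates[:end])
        target = end + horizon - 1
        pred = a + b * years[target]
        # Simplified interval (assumption): ignores uncertainty in a and b.
        results.append((rates[target], pred, pred - z * sd, pred + z * sd))
    return results


def summarize(results, alpha=0.05):
    """Return NRMSE, mean interval score, and empirical coverage."""
    n = len(results)
    observed = [obs for obs, _, _, _ in results]
    rmse = math.sqrt(sum((obs - pred) ** 2 for obs, pred, _, _ in results) / n)
    return {
        "nrmse": rmse / (sum(observed) / n),  # normalized by mean observed rate
        "interval_score": sum(
            interval_score(lo, hi, obs, alpha) for obs, _, lo, hi in results
        ) / n,
        "coverage": sum(lo <= obs <= hi for obs, _, lo, hi in results) / n,
    }


# Synthetic age-standardized rates per 100,000 (made up, not registry data).
years = list(range(1990, 2015))
rates = [100 + 0.5 * i + 2 * math.sin(i) for i in range(len(years))]

print(summarize(leave_future_out(years, rates, min_train=10, horizon=1)))
```

Each fit uses only the years before the cut-off, so every prediction is scored against data the model has not seen. Raising `horizon` shows how the scores change as the forecast reaches further ahead, which is the comparison behind the abstract's advice to update predictions regularly rather than forecast far into the future.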