Maringe Camille, Belot Aurélien, Rachet Bernard
Department of Non-Communicable Disease Epidemiology, London School of Hygiene & Tropical Medicine, Keppel Street, London, UK.
Stat Methods Med Res. 2020 Dec;29(12):3605-3622. doi: 10.1177/0962280220934501.
Despite a large choice of models, functional forms and types of effects, formal selection of excess hazard models for predicting population cancer survival is rarely reported in the literature. We propose multi-model inference based on excess hazard model(s) selected using the Akaike information criterion or the Bayesian information criterion for prediction and projection of cancer survival. We evaluate the properties of this approach using empirical data of patients diagnosed with breast, colon or lung cancer in 1990-2011. We artificially censor the data on 31 December 2010 and predict five-year survival for the 2010 and 2011 cohorts. We compare these predictions to the observed five-year cohort estimates of cancer survival and contrast them with predictions from an a priori selected simple model and from the period approach. We illustrate the approach by replicating it for cohorts of patients for whom stage at diagnosis and other important prognostic factors are available. We find that, in many instances and particularly in subgroups of the population, model-averaged predictions and projections of survival show close to the smallest differences from the Pohar-Perme estimates of survival. Advantages of information-criterion-based model selection include (i) a transparent model-building strategy, (ii) accounting for model selection uncertainty, (iii) no a priori assumption about effects, and (iv) projections for patients outside the sample.
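The abstract does not spell out how the selected models are combined, so the following is only a minimal sketch of one common form of multi-model inference: information-criterion weights of the Akaike-weight type, applied to each model's predicted five-year survival. The function names, the AIC values and the survival predictions are all hypothetical, and averaging directly on the survival scale is an assumption rather than the authors' stated procedure.

```python
import math


def information_criterion_weights(ic_values):
    """Turn AIC (or BIC) values into normalised model weights.

    Uses the standard Akaike-weight form: w_i is proportional to
    exp(-0.5 * (IC_i - IC_min)). Assumed here, not taken from the paper.
    """
    best = min(ic_values)
    relative_likelihoods = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]


def model_averaged_prediction(predictions, weights):
    """Weighted average of the per-model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical AIC values for three candidate excess hazard models
aic = [1002.0, 1000.0, 1006.0]
# Hypothetical five-year net survival predicted by each model
surv = [0.62, 0.60, 0.65]

weights = information_criterion_weights(aic)
averaged = model_averaged_prediction(surv, weights)

print("weights:", [round(w, 3) for w in weights])
print("model-averaged five-year survival:", round(averaged, 3))
```

The model with the lowest criterion value receives the largest weight, and the averaged prediction always lies between the smallest and largest single-model predictions. This is how averaging reflects model selection uncertainty rather than relying on one chosen model.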