Møller Bjørn, Weedon-Fekjaer Harald, Haldorsen Tor
Cancer Registry of Norway, Montebello, N-0310 Oslo, Norway.
BMC Med Res Methodol. 2005 Jun 10;5:21. doi: 10.1186/1471-2288-5-21.
Prediction intervals for cancer incidence can be calculated from a statistical model. These intervals account for the uncertainty of the parameter estimates and for random variation in future rates, but not for the uncertainty of the underlying assumptions, such as the continuation of current trends. In this study we evaluated whether prediction intervals are useful in practice.
Rates for the period 1993-97 were predicted from cancer incidence rates in the five Nordic countries for the period 1958-87. In a Poisson regression model, 95% prediction intervals were constructed for 200 combinations of 20 cancer types for males and females in the five countries. The coverage level was calculated as the proportion of the prediction intervals that covered the observed number of cases in 1993-97.
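The coverage calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `coverage_level` and the representation of each interval as a `(lower, upper)` pair are assumptions made for the example.

```python
from typing import Sequence, Tuple


def coverage_level(
    intervals: Sequence[Tuple[float, float]],
    observed: Sequence[float],
) -> float:
    """Proportion of prediction intervals that contain the observed count.

    `intervals` holds hypothetical (lower, upper) bounds, one per
    cancer type / sex / country combination; `observed` holds the
    matching observed numbers of cases.
    """
    if len(intervals) != len(observed):
        raise ValueError("intervals and observed must have the same length")
    if not observed:
        raise ValueError("at least one interval is required")
    # Count the intervals whose bounds enclose the observed number of cases.
    covered = sum(
        1 for (lower, upper), cases in zip(intervals, observed)
        if lower <= cases <= upper
    )
    return covered / len(observed)
```

For intervals with a true 95% coverage, this proportion should be close to 0.95 across the 200 combinations.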
Overall, 52% (104/200) of the prediction intervals covered the observed numbers. When the prediction intervals were divided into quartiles according to the number of cases in the last observed period, the coverage level fell as the number of cases rose (84%, 52%, 46% and 26%). The coverage level varied widely among the five countries, but the differences declined after adjustment for the number of cases in each country.
The coverage level of prediction intervals strongly depended on the number of cases on which the predictions were based. As the sample size increased, uncertainty about the adequacy of the model dominated, and the coverage level fell far below 95%. Prediction intervals for cancer incidence must therefore be interpreted with caution.
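One way to see why coverage falls as counts grow is the following simplified sketch. It is not the model used in the study: it assumes a normal approximation to the Poisson interval, ignores parameter uncertainty, and introduces a hypothetical fixed relative model error (for example, a trend that is off by 10%). Under these assumptions the interval half-width grows with the square root of the expected count, while the model error grows linearly with it.

```python
import math


def interval_covers(
    expected_cases: float,
    relative_model_error: float,
    z: float = 1.96,
) -> bool:
    """Check whether an approximate 95% Poisson interval covers the outcome.

    Assumptions (illustrative only): the interval is
    expected_cases +/- z * sqrt(expected_cases), and the observed count
    deviates from the prediction by a fixed fraction,
    `relative_model_error`, caused by model misspecification.
    """
    half_width = z * math.sqrt(expected_cases)
    observed = expected_cases * (1 + relative_model_error)
    return abs(observed - expected_cases) <= half_width


if __name__ == "__main__":
    # The same 10% model error is hidden by random variation at small
    # counts but falls outside the interval once the counts are large.
    for cases in (50, 500, 5000):
        print(cases, interval_covers(cases, 0.10))
```

With a 10% model error, the interval covers the outcome at 50 expected cases but not at 500 or 5000, which mirrors the pattern reported across the quartiles.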