Liu Peiran, Raftery Adrian E
Peiran Liu is a Ph.D. student, Department of Statistics, University of Washington, Seattle.
Adrian E. Raftery is Boeing International Professor of Statistics and Sociology, Department of Statistics, University of Washington, Seattle.
Ann Appl Stat. 2020 Jun;14(2):685-705. doi: 10.1214/19-aoas1294. Epub 2020 Jun 29.
Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. In 2015, in a major advance, the United Nations issued its first official probabilistic population projections for all countries, based on Bayesian hierarchical models for total fertility and life expectancy. The estimates of these models and the resulting projections are conditional on the UN's official estimates of past values. However, these past values are themselves uncertain, particularly for the majority of the world's countries that do not have longstanding high-quality vital registration systems, where the estimates rely on surveys and censuses with their own biases and measurement errors. This paper extends the UN model for projecting future total fertility rates to take account of uncertainty about past values. This is done by adding a level to the hierarchical model to represent the multiple data sources, estimating the bias and measurement error variance of each. We assess the method by out-of-sample predictive validation. While the prediction intervals produced by the extant method (which does not account for this source of uncertainty) have somewhat less than nominal coverage, we find that our proposed method achieves closer to nominal coverage. The prediction intervals become wider for countries for which the estimates of past total fertility rates rely heavily on surveys rather than on vital registration data, especially in high fertility countries.
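As a rough illustration of two quantities the abstract mentions (the bias and measurement error variance of each data source, and the empirical coverage of prediction intervals), here is a minimal Python sketch. It is not the paper's Bayesian hierarchical model, which estimates these quantities jointly with the fertility trajectories. It simply summarizes residuals against assumed reference values. The function names, the source labels, and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance


def source_error_summary(records):
    """Summarize bias and error variance per data source.

    records: iterable of (source, observed_tfr, reference_tfr) tuples.
    The reference value stands in for the "true" total fertility rate,
    which is an assumption of this sketch: in practice it is unknown.
    Returns {source: (mean residual, sample variance of residuals)}.
    """
    residuals = defaultdict(list)
    for source, observed, reference in records:
        residuals[source].append(observed - reference)
    return {
        source: (mean(res), variance(res) if len(res) > 1 else float("nan"))
        for source, res in residuals.items()
    }


def empirical_coverage(intervals, actuals):
    """Fraction of held-out values that fall inside their prediction interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, actuals))
    return hits / len(actuals)


# Hypothetical observations: (source, observed TFR, reference TFR).
records = [
    ("survey", 5.5, 5.0),
    ("survey", 4.75, 4.5),
    ("survey", 4.75, 4.0),
    ("vital_registration", 2.0, 2.0),
    ("vital_registration", 1.875, 2.0),
    ("vital_registration", 2.125, 2.0),
]
summary = source_error_summary(records)

# Hypothetical prediction intervals and held-out values.
intervals = [(1.5, 2.5), (3.0, 4.0), (4.0, 5.0), (5.0, 6.0)]
actuals = [2.0, 4.2, 4.5, 5.1]
coverage = empirical_coverage(intervals, actuals)

print(summary)
print(coverage)
```

In this made-up data the survey source shows a positive bias and a larger error variance than vital registration, which is the kind of difference that would lead to wider prediction intervals for countries relying mainly on surveys. Comparing `coverage` with the nominal level of the intervals mirrors the out-of-sample check described above.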