Chen Tong, Lumley Thomas
Department of Statistics, University of Auckland, Auckland, New Zealand.
Stat Med. 2020 Dec 30;39(30):4912-4921. doi: 10.1002/sim.8760. Epub 2020 Oct 5.
Two-phase designs measure additional variables on a subsample of a cohort in which some variables are already measured. The goal is to choose the subsample from the cohort and to analyse it efficiently, so it is of interest to find an optimal design that gives the most efficient estimates of the regression parameters. In this article, we propose a multiwave sampling design that approximates the optimal design for design-based estimators. Influence functions are used to compute the optimal sampling allocations. We propose using informative priors on the regression parameters to derive the wave-1 sampling probabilities, because any prespecified sampling probabilities may be far from optimal and reduce design efficiency. The posterior distributions of the regression parameters derived from the current wave are then used as priors for the next wave. Generalized raking is used in the final statistical analysis. We show that two-wave sampling with reasonable informative priors yields highly efficient estimates of the parameter of interest and comes close to the underlying optimal design.
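To illustrate how influence functions can drive a sampling allocation, the sketch below applies Neyman allocation to stratum-level influence-function values: each stratum's share of the phase-2 sample is proportional to its size times the standard deviation of the influence function within it. This is only a minimal illustration under stated assumptions, not the article's implementation. The function name `neyman_allocation`, the stratified setting, and the largest-remainder rounding are all assumptions, and the influence values are taken as given (in a multiwave design they would be recomputed from the updated parameter estimates before each wave).

```python
import numpy as np


def neyman_allocation(strata, influence, n_total):
    """Split n_total phase-2 samples across strata in proportion to N_h * SD_h.

    strata    : stratum label for every cohort member (hypothetical input)
    influence : influence-function value for every cohort member, assumed
                to be already computed or predicted (hypothetical input)
    n_total   : total phase-2 sample size for this wave

    Assumes every stratum has at least two members and that no stratum's
    allocation exceeds its size; neither case is handled here.
    """
    strata = np.asarray(strata)
    influence = np.asarray(influence, dtype=float)

    labels = np.unique(strata)
    sizes = np.array([np.sum(strata == h) for h in labels])
    sds = np.array([influence[strata == h].std(ddof=1) for h in labels])

    # Neyman allocation: n_h proportional to N_h * SD_h.
    weights = sizes * sds
    raw = n_total * weights / weights.sum()

    # Largest-remainder rounding so the integer allocations sum to n_total.
    alloc = np.floor(raw).astype(int)
    remainder = n_total - alloc.sum()
    order = np.argsort(-(raw - np.floor(raw)))
    for idx in order[:remainder]:
        alloc[idx] += 1

    return dict(zip(labels.tolist(), alloc.tolist()))


# Toy cohort: two strata of equal size; the influence function varies
# three times as much in stratum 1 as in stratum 0.
example = neyman_allocation(
    strata=[0, 0, 0, 0, 1, 1, 1, 1],
    influence=[0.0, 2.0, 0.0, 2.0, 0.0, 6.0, 0.0, 6.0],
    n_total=4,
)
```

In the toy cohort the two strata have equal size, so the allocation follows the ratio of the within-stratum standard deviations, and the stratum with the more variable influence function receives more of the sample.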