Cao Jing, Lee J Jack, Alber Susan
Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275.
Department of Biostatistics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, 77030.
J Stat Plan Inference. 2009 Dec 1;139(12):4111-4122. doi: 10.1016/j.jspi.2009.05.041.
A challenge in implementing performance-based Bayesian sample size determination is selecting which of several methods to use. We compare three Bayesian sample size criteria: the average coverage criterion (ACC), which controls the coverage rate of fixed-length credible intervals averaged over the predictive distribution of the data; the average length criterion (ALC), which controls the average length of credible intervals with a fixed coverage rate; and the worst outcome criterion (WOC), which ensures the desired coverage rate and interval length over all (or a subset of) possible datasets. For most models, the WOC produces the largest sample size among the three criteria, and the sample sizes obtained by the ACC and the ALC are not the same. For Bayesian sample size determination for normal means and differences between normal means, we investigate, for the first time, the direction and magnitude of the differences between the ACC and ALC sample sizes. For fixed hyperparameter values, we show that the difference between the ACC and ALC sample sizes depends on the nominal coverage and not on the nominal interval length. There exists a threshold value of the nominal coverage level such that below the threshold the ALC sample size is larger than the ACC sample size, and above the threshold the ACC sample size is larger. Furthermore, the ACC sample size is more sensitive to changes in the nominal coverage. We also show that, for fixed hyperparameter values, the ratio of the WOC sample size to the ALC (ACC) sample size approaches a constant asymptotically. Simulation studies are conducted to show that similar relationships among the ACC, ALC, and WOC may hold for estimating binomial proportions. We provide a heuristic argument that the results can be generalized to a larger class of models.
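To make the ACC/ALC comparison concrete, the following is a minimal sketch for a single normal mean with unknown precision. It is an illustration under stated assumptions, not the paper's own code. It assumes the usual conjugate normal-gamma prior: precision ~ Gamma(shape `nu`, rate `beta`) and mean | precision ~ Normal(mu0, 1 / (`n0` * precision)), with `nu` > 1/2 and `n0` > 0. The closed forms used are the ones that follow from this assumed model, and all function names and hyperparameter values are hypothetical. The WOC is omitted to keep the sketch dependency-free.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """CDF for x >= 0 via composite Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection (adequate for usual coverages)."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def acc_sample_size(length, coverage, nu, beta, n0):
    """ACC: averaged over the data, mean minus posterior mean is a scaled
    t with 2*nu df, which gives a closed form in n (assumed model)."""
    t = t_quantile(1 - (1 - coverage) / 2, 2 * nu)
    return max(0, math.ceil(4 * beta * t * t / (nu * length ** 2) - n0))


def expected_hpd_length(n, coverage, nu, beta, n0):
    """Predictive average length of the HPD interval at sample size n."""
    df = n + 2 * nu
    t = t_quantile(1 - (1 - coverage) / 2, df)
    log_ratio = (math.lgamma(df / 2) + math.lgamma(nu - 0.5)
                 - math.lgamma((df - 1) / 2) - math.lgamma(nu))
    return 2 * t * math.sqrt(2 * beta / (df * (n + n0))) * math.exp(log_ratio)


def alc_sample_size(length, coverage, nu, beta, n0, n_max=10_000):
    """ALC: smallest n whose average HPD length is at most `length`."""
    for n in range(n_max + 1):
        if expected_hpd_length(n, coverage, nu, beta, n0) <= length:
            return n
    raise ValueError("no n <= n_max meets the average length target")


if __name__ == "__main__":
    # Hypothetical hyperparameters, chosen only for illustration.
    nu, beta, n0, length = 2, 2, 10, 1.0
    print("ACC:", acc_sample_size(length, 0.95, nu, beta, n0))
    print("ALC:", alc_sample_size(length, 0.95, nu, beta, n0))
```

With these assumed hyperparameters and 95% nominal coverage, the ACC sample size comes out larger than the ALC sample size, which is the ordering the abstract describes for coverage levels above the threshold. The ALC search is a plain linear scan so that it does not rely on the average length decreasing monotonically in n.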