Waller J L, Addy C L, Jackson K L, Garrison C Z
Department of Epidemiology and Biostatistics, University of South Carolina, Columbia 29208.
Stat Med. 1994 May 30;13(10):1071-82. doi: 10.1002/sim.4780131009.
We investigate methods for constructing confidence intervals for a proportion in a stratified two-stage sampling design in which few events occur in a small number of large strata of unequal size. The critical aspect is the incorporation of the weighting scheme into the construction of a single overall confidence interval. With small numbers of events, binomial-based methods may be inadequate because the normal approximation is not valid. Computer simulations compare coverage probability and bias for five methods of obtaining confidence intervals for proportions by combining: (1) binomial variances; (2) confidence intervals based on the F-distribution approximation to the cumulative binomial; (3) the binomial variance method with exact confidence limits when a zero prevalence occurs in any stratum; (4) confidence intervals based on the F-distribution using a rescaling factor; and (5) the binomial variance method with exact confidence limits using a rescaling factor. The method that performs best in terms of coverage probability is the combination of stratum-specific confidence intervals based on the F-distribution using a rescaling factor. The methods involving the binomial variance tend to be negatively biased, and the methods based on the F-distribution tend to be positively biased. Application of these methods to data from a study of adolescent depression that employs a stratified two-stage sampling design gives results consistent with the simulations.
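As a rough illustration of two of the ingredients named above, the following Python sketch computes a weighted interval from binomial variances and a weighted combination of stratum-specific exact limits. It rests on several assumptions that the abstract does not confirm: the F-distribution-based limits are taken to be the usual exact (Clopper-Pearson) limits, obtained here by bisection on the binomial tail rather than from F quantiles; the stratum limits are combined as a simple weighted sum; the second sampling stage is ignored; and the rescaling factor is omitted because the abstract does not define it. All function names and the example counts are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); intended for moderate n (a few hundred)."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(min(k, n) + 1))


def _bisect(g):
    """Root of g on [0, 1], where g is increasing with g(0) < 0 < g(1)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_limits(x, n, alpha=0.05):
    """Exact two-sided limits for x events in n trials (assumed Clopper-Pearson)."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def binomial_variance_interval(events, sizes, weights, z=1.96):
    """Weighted estimate +/- z * sqrt(sum of squared-weight binomial variances)."""
    p = sum(w * x / n for x, n, w in zip(events, sizes, weights))
    var = sum(w**2 * (x / n) * (1 - x / n) / n
              for x, n, w in zip(events, sizes, weights))
    half = z * math.sqrt(var)
    return p - half, p + half


def combined_exact_interval(events, sizes, weights, alpha=0.05):
    """Weighted sum of stratum-specific exact limits (assumed combination rule)."""
    limits = [exact_limits(x, n, alpha) for x, n in zip(events, sizes)]
    lower = sum(w * lo for w, (lo, _) in zip(weights, limits))
    upper = sum(w * up for w, (_, up) in zip(weights, limits))
    return lower, upper


if __name__ == "__main__":
    # Made-up counts for three strata; weights are population shares summing to 1.
    events, sizes, weights = [0, 2, 1], [50, 80, 40], [0.5, 0.3, 0.2]
    print(binomial_variance_interval(events, sizes, weights))
    print(combined_exact_interval(events, sizes, weights))
```

In this sketch a stratum with zero events contributes nothing to the binomial variance, whereas its exact upper limit is still positive, which is one way to see why the two families of methods can behave differently when events are rare.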