Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
Stat Methods Med Res. 2020 Jan;29(1):111-121. doi: 10.1177/0962280218823231. Epub 2019 Jan 23.
Estimating the precision of a single proportion with a 100(1-α)% confidence interval is an important statistical problem when the data are clustered. Possible over-dispersion must be accounted for in, for instance, animal-based teratology studies with within-litter correlation, epidemiological studies that involve clustered sampling, and clinical trial designs with multiple measurements per subject. Several asymptotic confidence interval methods have been developed, but they have been found to have inadequate coverage of the true proportion for small-to-moderate sample sizes. In addition, many of the best-performing of these intervals have not been directly compared with regard to the operating characteristics of coverage probability and empirical length. This study uses Monte Carlo simulations to calculate coverage probabilities and empirical lengths of five existing confidence intervals for clustered data across various true correlations, true probabilities of interest, and sample sizes. We also introduce a new score-based confidence interval method, which we find to have better coverage than existing intervals for small sample sizes under a wide range of scenarios.
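As an illustration of how coverage probability and empirical length can be estimated by Monte Carlo simulation, the following is a minimal sketch rather than the procedure used in the study. It makes several assumptions that the abstract does not state: clustered binary outcomes are drawn from a beta-binomial model with equal cluster sizes, and the interval evaluated is a simple Wald interval with a ratio-estimator cluster variance, which is neither the new score-based interval nor necessarily one of the five intervals compared. The function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_clusters(k, m, p, rho, rng):
    """Draw success counts for k clusters of size m from a beta-binomial
    model with marginal probability p and intra-cluster correlation rho
    (assumed data-generating model; requires 0 < rho < 1)."""
    a = p * (1 - rho) / rho          # beta shape parameters chosen so that
    b = (1 - p) * (1 - rho) / rho    # mean = p and 1 / (a + b + 1) = rho
    counts = []
    for _ in range(k):
        p_i = rng.betavariate(a, b)  # cluster-specific success probability
        counts.append(sum(rng.random() < p_i for _ in range(m)))
    return counts


def cluster_wald_ci(counts, m, alpha=0.05):
    """Wald interval using a ratio-estimator variance across clusters
    (an illustrative choice, not the study's proposed method)."""
    k = len(counts)
    n = k * m
    p_hat = sum(counts) / n
    ss = sum((x - m * p_hat) ** 2 for x in counts)
    var = k / (k - 1) * ss / n ** 2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var)
    # Clip to [0, 1] so the interval stays in the parameter space.
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


def coverage_and_length(k, m, p, rho, reps=2000, alpha=0.05, seed=0):
    """Estimate coverage probability and mean interval length over
    `reps` simulated data sets."""
    rng = random.Random(seed)
    hits = 0
    total_length = 0.0
    for _ in range(reps):
        counts = simulate_clusters(k, m, p, rho, rng)
        lo, hi = cluster_wald_ci(counts, m, alpha)
        hits += lo <= p <= hi
        total_length += hi - lo
    return hits / reps, total_length / reps


if __name__ == "__main__":
    # Hypothetical scenario: 10 clusters of 5, true p = 0.3, rho = 0.2.
    cov, length = coverage_and_length(k=10, m=5, p=0.3, rho=0.2)
    print(f"coverage = {cov:.3f}, mean length = {length:.3f}")
```

Repeating the call over a grid of `k`, `p`, and `rho` values gives the kind of scenario-by-scenario comparison the abstract describes; an interval performs well when its estimated coverage is close to the nominal 1-α without excessive length.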