The empirical coverage of confidence intervals: point estimates and confidence intervals for confidence levels.

Author information

Schall Robert

Affiliation

Department of Mathematical Statistics and Actuarial Science, University of the Free State, P.O. Box 339 (IB75), Bloemfontein, South Africa.

Publication information

Biom J. 2012 Jul;54(4):537-51. doi: 10.1002/bimj.201100134. Epub 2012 May 23.

DOI:10.1002/bimj.201100134

PMID:22623325

Abstract

Many confidence intervals calculated in practice are potentially not exact, either because the requirements for the interval estimator to be exact are known to be violated, or because the (exact) distribution of the data is unknown. If a confidence interval is approximate, the crucial question is how well its true coverage probability approximates its intended coverage probability. In this paper we propose to use the bootstrap to calculate an empirical estimate for the (true) coverage probability of a confidence interval. In the first instance, the empirical coverage can be used to assess whether a given type of confidence interval is adequate for the data at hand. More generally, when planning the statistical analysis of future trials based on existing data pools, the empirical coverage can be used to study the coverage properties of confidence intervals as a function of type of data, sample size, and analysis scale, and thus inform the statistical analysis plan for the future trial. In this sense, the paper proposes an alternative to the problematic pretest of the data for normality, followed by selection of the analysis method based on the results of the pretest. We apply the methodology to a data pool of bioequivalence studies, and in the selection of covariance patterns for repeated measures data.
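The bootstrap idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual procedure: the observed sample is treated as the population, its mean plays the role of the "true" parameter, and the share of bootstrap resamples whose interval contains that mean is taken as the empirical coverage. The function names `z_interval` and `empirical_coverage`, the choice of a normal-approximation interval for the mean, and the log-normal example data are all hypothetical choices made for this sketch.

```python
import random
import statistics
from statistics import NormalDist


def z_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean (illustrative choice)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se


def empirical_coverage(data, interval=z_interval, level=0.95, n_boot=2000, seed=0):
    """Bootstrap estimate of how often `interval` covers the mean.

    Assumption of this sketch: the observed data stand in for the population,
    so the target value in the bootstrap world is the mean of `data`.
    """
    rng = random.Random(seed)
    target = statistics.fmean(data)
    hits = 0
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(data, k=len(data))
        lo, hi = interval(resample, level)
        hits += lo <= target <= hi
    return hits / n_boot


if __name__ == "__main__":
    # Hypothetical skewed data, where a nominal 95% interval may undercover.
    rng = random.Random(1)
    skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(20)]
    print(empirical_coverage(skewed))
```

Comparing the returned proportion with the nominal level (0.95 here) gives a rough indication of whether the interval type is adequate for the data at hand. Passing a different `interval` function, or data on a different analysis scale such as log-transformed values, allows the kind of comparison across interval types and scales that the abstract mentions.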
