White D B, James L
Department of Mathematics, University of Toledo, Ohio 43606-3390, USA.
J Clin Epidemiol. 1996 Apr;49(4):419-29. doi: 10.1016/0895-4356(95)00570-6.
A method is presented for determining the sample size needed to estimate probabilities based on a test variable. The research focuses on estimating the sensitivity and specificity of medical tests, although the methods can be applied to other areas of study such as engineering reliability. Examples are given for determining the sample sizes required to classify patients with cutaneous lupus erythematosus based on the incidence of several markers. In this example, the test variable is the number of markers present. The methodology employs a weighted average of model-based and non-model-based estimates of the probability, with the weights determined by the closeness to, or the confidence in, the given model. Formulas and charts required for determining sample size are provided for test variables that can be modeled by the binomial, Poisson, or normal distributions, i.e., the most commonly encountered distributions for counting events (binomial and Poisson) and for measurements (normal). However, the methods can be applied to any distribution, including multivariate distributions. Especially when relatively small probabilities (rare events) are being estimated, the techniques help safeguard against undersampling brought on by unwarranted confidence in a test variable distribution, and against the oversampling required for high accuracy in non-model-based probability estimators.
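The abstract does not state the formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Poisson model for a count test variable, a user-supplied weight `w` standing in for the confidence placed in that model, and the textbook normal-approximation sample size for a proportion as the non-model-based benchmark. All function names and the example data are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_tail(lam, c):
    """Model-based estimate: P(X >= c) for X ~ Poisson(lam)."""
    cdf_below_c = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(c))
    return 1.0 - cdf_below_c


def empirical_tail(sample, c):
    """Non-model-based estimate: observed fraction of values >= c."""
    return sum(1 for x in sample if x >= c) / len(sample)


def weighted_estimate(sample, c, w):
    """Weighted average of the two estimates.

    w is an assumed weight in [0, 1]; w = 1 trusts the Poisson model fully,
    w = 0 ignores it. How w is chosen is not specified in the abstract.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    lam_hat = sum(sample) / len(sample)  # Poisson mean estimated from the data
    return w * poisson_tail(lam_hat, c) + (1.0 - w) * empirical_tail(sample, c)


def n_for_proportion(p, half_width, confidence=0.95):
    """Textbook sample size so a normal-approximation interval for a
    proportion p has the given half-width (non-model-based benchmark)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / half_width**2)


# Hypothetical marker counts for eight patients; threshold of 3 markers.
counts = [0, 1, 2, 1, 0, 3, 1, 2]
estimate = weighted_estimate(counts, c=3, w=0.5)
n_needed = n_for_proportion(p=0.05, half_width=0.02)
```

With `w` near 1 the estimate leans on the assumed distribution and needs fewer observations if that model is right; with `w` near 0 it falls back on the empirical proportion, whose required sample size grows quickly as the half-width shrinks relative to a small probability.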