Blume Jeffrey D
Center for Statistical Sciences, Brown University, Providence, RI 02912
J Stat Plan Inference. 2009 Mar 1;139(1):711-721. doi: 10.1016/j.jspi.2007.09.015.
Studies of diagnostic tests are often designed with the goal of estimating the area under the receiver operating characteristic curve (AUC), because the AUC is a natural summary of a test's overall diagnostic ability. However, sample size projections for AUC-based studies are very sensitive to assumptions about the variance of the empirical AUC estimator, which depends on two correlation parameters. While these correlation parameters can be estimated from available data, in practice it is hard to find reliable estimates before the study is conducted. Here we derive achievable bounds on the projected sample size that are free of these two correlation parameters. The lower bound is the smallest sample size that would yield the desired level of precision for some model, while the upper bound is the smallest sample size that would yield the desired level of precision for all models. These bounds are important reference points when designing a single-arm or multi-arm study: they are the absolute minimum and maximum sample sizes that would ever be required. When the study design includes multiple readers or interpreters of the test, we derive bounds pertaining to the average reader AUC and to the "pooled" or overall AUC for the population of readers. For multireader studies, these upper bounds are not overly conservative when several readers are involved.
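The Python sketch below shows why the projected sample size depends on the two correlation parameters. It does not reproduce the paper's bounds. It uses the standard variance of the Mann-Whitney (empirical) AUC, Var(Â) = A(1−A)[1 + (n_y−1)r_y + (n_x−1)r_x]/(n_x n_y), where X is a non-diseased and Y a diseased test result. Here r_y is assumed to be the correlation between I(Y1 > X) and I(Y2 > X), and r_x the correlation between I(Y > X1) and I(Y > X2). The function names, the 95% confidence-interval half-width target, and the Hanley-McNeil exponential approximation used to pick illustrative correlations are all assumptions made for this illustration.

```python
import math
from typing import Sequence, Tuple


def empirical_auc(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney estimate of P(Y > X), counting ties as 1/2.

    x: results for non-diseased subjects; y: results for diseased subjects.
    """
    total = 0.0
    for yi in y:
        for xj in x:
            if yi > xj:
                total += 1.0
            elif yi == xj:
                total += 0.5
    return total / (len(x) * len(y))


def auc_variance(a: float, n_x: int, n_y: int, r_x: float, r_y: float) -> float:
    """Variance of the empirical AUC expressed through two correlation parameters.

    r_y: corr(I(Y1 > X), I(Y2 > X)), two diseased subjects sharing one non-diseased
    r_x: corr(I(Y > X1), I(Y > X2)), one diseased subject shared by two non-diseased
    """
    return a * (1.0 - a) * (1.0 + (n_y - 1) * r_y + (n_x - 1) * r_x) / (n_x * n_y)


def hanley_mcneil_correlations(a: float) -> Tuple[float, float]:
    """Illustrative (r_x, r_y) from the Hanley-McNeil exponential approximation (an assumption)."""
    q_y = a / (2.0 - a)              # approx. P(Y1 > X, Y2 > X)
    q_x = 2.0 * a * a / (1.0 + a)    # approx. P(Y > X1, Y > X2)
    base = a * (1.0 - a)
    return (q_x - a * a) / base, (q_y - a * a) / base


def smallest_n(a: float, half_width: float, r_x: float, r_y: float,
               ratio: float = 1.0, z: float = 1.96, n_max: int = 100_000) -> int:
    """Smallest diseased count n_y (with n_x = ceil(ratio * n_y)) so that z * SE <= half_width."""
    for n_y in range(2, n_max + 1):
        n_x = math.ceil(ratio * n_y)
        if z * math.sqrt(auc_variance(a, n_x, n_y, r_x, r_y)) <= half_width:
            return n_y
    raise ValueError("no sample size up to n_max meets the precision target")


if __name__ == "__main__":
    print("empirical AUC:", empirical_auc([0.1, 0.4, 0.35], [0.8, 0.4, 0.9]))
    # The same target AUC and precision, but different assumed correlations.
    for r_x, r_y in [(0.1, 0.1), hanley_mcneil_correlations(0.8), (0.45, 0.45)]:
        print(f"r_x={r_x:.3f}, r_y={r_y:.3f}: n per group = {smallest_n(0.8, 0.05, r_x, r_y)}")
```

With the target AUC and precision held fixed, the required number of subjects grows as the assumed correlations grow. This sensitivity is what motivates bounds that do not depend on those correlations.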