Song H H
Department of Biostatistics, Catholic University Medical College, Seoul, Korea.
Biometrics. 1997 Mar;53(1):370-82.
This paper focuses on methods for analyzing areas under receiver operating characteristic (ROC) curves. Analysis of ROC areas should account for both the correlation structure of repeated measurements taken on the same set of cases and the paucity of measurements per treatment that results from summarizing the cases into a few area measures of diagnostic accuracy. The repeated nature of ROC data has been taken into consideration in the analysis methods previously suggested by Swets and Pickett (1982, Evaluation of Diagnostic Systems: Methods from Signal Detection Theory), Hanley and McNeil (1983, Radiology 148, 839-843), and DeLong, DeLong, and Clarke-Pearson (1988, Biometrics 44, 837-845). DeLong et al.'s procedure is extended to a Wald test for general diagnostic-testing situations. The method of analyzing jackknife pseudovalues by treating them as data is extremely useful when the number of area measures to be tested is quite small. The Wald test based on covariances of multivariate multisample U-statistics is compared with two approaches to analyzing pseudovalues: the univariate mixed-model analysis of variance (ANOVA) for repeated measurements and the three-way factorial ANOVA. Monte Carlo simulations demonstrate that these three tests give good approximations to the nominal 5% size for large sample sizes, whereas the paired t-test using ROC areas as data lacks the power of the other three tests, and Hanley and McNeil's method is inappropriate for testing diagnostic accuracies. The Wald statistic performs better than the ANOVAs of pseudovalues. Multiple-deletion jackknife schemes that account for the different structures of the normal and diseased distributions appear to perform slightly better than simple multiple-deletion schemes, although no appreciable power difference is apparent, and deleting too many cases at a time may sacrifice power. These methods have important applications in diagnostic testing, in ROC studies in radiology and in medicine in general.