Rockette H E, King J L, Medina J L, Eisen H B, Brown M L, Gur D
Department of Biostatistics, University of Pittsburgh, PA 15261, USA.
AJR Am J Roentgenol. 1995 Sep;165(3):679-83. doi: 10.2214/ajr.165.3.7645495.
Large-scale receiver operating characteristic (ROC) studies are expensive and time-consuming. If most of the difference in diagnostic accuracy occurs in a subset of subtle cases, considerable effort could be saved by restricting comparisons to this subset. We investigate the effect of subtle cases on diagnostic accuracy, the magnitude of error that can occur because of an imbalance of subtle cases in two groups, and the potential for sample size reductions if only subtle cases are used.
Data from a previous study of posteroanterior chest radiographs were reanalyzed separately for subsets of typical cases and subsets of subtle cases. Actually positive and actually negative cases were classified as subtle or typical and as difficult or easy for diagnosis of the specific abnormality. The area under the ROC curve (Az) was used as the measure of diagnostic accuracy. Pairwise comparisons among three techniques were made for the detection of nodules and of interstitial disease.
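As an illustration of how an area under the ROC curve can be computed separately for typical and subtle subsets, the sketch below uses the nonparametric (Mann-Whitney) estimate. This is a stand-in, not the study's method: Az conventionally denotes the area under a fitted binormal ROC curve, and the abstract does not describe the fitting procedure. The 5-point confidence ratings and the function name `empirical_auc` are hypothetical.

```python
def empirical_auc(pos_scores, neg_scores):
    """Nonparametric AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # positive case rated above negative case
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical reader ratings (1 = definitely normal, 5 = definitely abnormal).
typical_pos = [5, 5, 4, 4, 3]   # actually positive, typical presentation
subtle_pos = [4, 3, 3, 2, 1]    # actually positive, subtle presentation
negatives = [1, 1, 2, 2, 3]     # actually negative cases

auc_typical = empirical_auc(typical_pos, negatives)
auc_subtle = empirical_auc(subtle_pos, negatives)
print(f"typical subset AUC: {auc_typical:.2f}")
print(f"subtle subset AUC:  {auc_subtle:.2f}")
```

With these made-up ratings the subtle subset yields a visibly lower area than the typical subset, which is the kind of contrast the reanalysis examines.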
The performance index (Az) was significantly (≥ 25%) lower for the subset of subtle cases than for the subset of typical cases. The difference in observer performance between two techniques was greater in the subset of subtle cases more often than in the subset of typical cases.
The difference in diagnostic accuracy between the subset of typical cases and the subset of subtle cases is large enough that a difference in the proportion of subtle cases in two samples could result in clinically significant false differences in observer performance. Furthermore, the generally larger difference observed in the group of subtle cases suggests that sample sizes for some experiments could be reduced by 45-90% if the experiment were restricted to subtle cases.
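Both conclusions can be illustrated with back-of-the-envelope arithmetic. The sketch below rests on two simplifying assumptions that are not taken from the study: (1) the accuracy of a mixed sample is approximated as the proportion-weighted mean of the subset accuracies, and (2) the required sample size scales with the inverse square of the detectable difference, with the variance held fixed. All numeric inputs and function names are hypothetical.

```python
import math


def mixture_az(p_subtle, az_typical, az_subtle):
    """Approximate Az of a sample with a fraction p_subtle of subtle cases
    (assumption: proportion-weighted mean of the subset values)."""
    return (1.0 - p_subtle) * az_typical + p_subtle * az_subtle


def sample_size_reduction(effect_ratio):
    """Fractional reduction in sample size when the detectable difference is
    effect_ratio times larger (assumption: n proportional to 1 / difference**2)."""
    return 1.0 - 1.0 / effect_ratio ** 2


def ratio_for_reduction(reduction):
    """Inverse of sample_size_reduction: the effect ratio needed to achieve
    a given fractional reduction."""
    return 1.0 / math.sqrt(1.0 - reduction)


# Hypothetical subset accuracies: the subtle value is 25% lower than the typical one.
az_typical, az_subtle = 0.90, 0.675

# Same technique, two samples that differ only in their share of subtle cases.
az_a = mixture_az(0.20, az_typical, az_subtle)
az_b = mixture_az(0.50, az_typical, az_subtle)
print(f"spurious difference from case mix alone: {az_a - az_b:.4f}")

# Effect ratios that would correspond to the 45% and 90% reductions quoted above.
for reduction in (0.45, 0.90):
    print(f"{reduction:.0%} reduction needs a {ratio_for_reduction(reduction):.2f}x larger difference")
```

Under these assumptions, a shift from 20% to 50% subtle cases alone moves the pooled accuracy by several hundredths, and the quoted 45-90% range corresponds to differences roughly 1.3 to 3.2 times larger in the subtle subset. In practice the variance of Az also changes with Az itself, so this is only an order-of-magnitude guide.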