Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA.
Br J Radiol. 2012 Sep;85(1017):1287-302. doi: 10.1259/bjr/45866310. Epub 2012 May 9.
Laboratory observer performance measurements, such as receiver operating characteristic (ROC) and free-response ROC (FROC) studies, differ from actual clinical interpretations in several respects, which could compromise their clinical relevance. The objective of this study was to develop a method for quantifying the clinical relevance of a laboratory paradigm and to apply it to compare the ROC and FROC paradigms in a nodule detection task.
The original prospective interpretations of 80 digital chest radiographs were classified by the truth panel as correct (C=1) or incorrect (C=0), depending on correlation with additional imaging, and the average of C was interpreted as the clinical figure of merit. FROC data were acquired for 21 radiologists and ROC data were inferred using the highest ratings. The areas under the ROC and alternative FROC curves were used as laboratory figures of merit. Bootstrap analysis was conducted to estimate conventional agreement measures between laboratory and clinical figures of merit. Also computed was a pseudovalue-based image-level correctness measure of the laboratory interpretations. Its association with C, measured by the area (rAUC) under an appropriately defined relevance ROC curve, is proposed as a measure of the clinical relevance of a laboratory paradigm.
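The relevance ROC idea can be illustrated with a minimal Python sketch. It assumes the image-level correctness measure is a standard jackknife pseudovalue of some laboratory figure of merit, and that rAUC is the empirical (Mann-Whitney) area obtained when those pseudovalues are used to discriminate C=1 from C=0 images. The abstract does not give the exact construction, so the function names `jackknife_pseudovalues` and `relevance_auc` and the tie handling are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Sequence


def jackknife_pseudovalues(fom: Callable[[Sequence], float],
                           cases: Sequence) -> list[float]:
    """Standard jackknife pseudovalues: n*theta - (n-1)*theta_without_i.

    `fom` is any figure of merit computed from a list of cases
    (hypothetical stand-in for an ROC or FROC area).
    """
    n = len(cases)
    full = fom(cases)
    values = []
    for i in range(n):
        reduced = list(cases[:i]) + list(cases[i + 1:])
        values.append(n * full - (n - 1) * fom(reduced))
    return values


def relevance_auc(scores: Sequence[float], c: Sequence[int]) -> float:
    """Empirical AUC of `scores` for separating C=1 from C=0 images.

    Equals the probability that a randomly chosen C=1 image has a higher
    score than a randomly chosen C=0 image, with ties counted as 0.5.
    """
    correct = [s for s, ci in zip(scores, c) if ci == 1]
    incorrect = [s for s, ci in zip(scores, c) if ci == 0]
    if not correct or not incorrect:
        raise ValueError("both C=1 and C=0 images are required")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in correct for q in incorrect)
    return wins / (len(correct) * len(incorrect))
```

Under these assumptions, a value near 0.5 means the laboratory correctness measure carries almost no information about whether the clinical interpretation was correct, and a value near 1 means the two agree closely.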
Low correlations (e.g. κ=0.244) and near-chance-level rAUC values (e.g. 0.598), attributable to differences between the clinical and laboratory paradigms, were observed. The absolute width of the confidence interval was 0.38 for the interparadigm differences of the conventional measures and 0.14 for the difference of the rAUCs.
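A confidence interval for an interparadigm rAUC difference could be estimated along the following lines. This is a hedged sketch of a percentile bootstrap that resamples images only; the abstract does not state the resampling scheme (for example, whether readers were also resampled), so `bootstrap_ci_diff`, its parameters, and the choice of percentile interval are assumptions made for illustration.

```python
import random
from typing import Sequence


def empirical_auc(scores: Sequence[float], c: Sequence[int]) -> float:
    """Mann-Whitney AUC of `scores` against binary labels `c`."""
    pos = [s for s, ci in zip(scores, c) if ci == 1]
    neg = [s for s, ci in zip(scores, c) if ci == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci_diff(scores_a: Sequence[float], scores_b: Sequence[float],
                      c: Sequence[int], n_boot: int = 2000,
                      alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for AUC(scores_a) - AUC(scores_b).

    Images are resampled with replacement; both paradigms are evaluated
    on the same resampled images so the difference stays paired.
    """
    rng = random.Random(seed)
    n = len(c)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        cb = [c[i] for i in idx]
        if not 0 < sum(cb) < n:
            continue  # skip resamples that lack one of the two classes
        auc_a = empirical_auc([scores_a[i] for i in idx], cb)
        auc_b = empirical_auc([scores_b[i] for i in idx], cb)
        diffs.append(auc_a - auc_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

The interval width is then `upper - lower`; a narrower interval for the same data indicates a measure that is more sensitive to differences between paradigms.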
The rAUC measure was consistent with the traditional measures but was more sensitive to the differences in clinical relevance. A new relevance ROC method for quantifying the clinical relevance of a laboratory paradigm is proposed.