Swensson R G
Department of Radiology, Harvard Medical School, Boston, Massachusetts, USA.
Med Phys. 1996 Oct;23(10):1709-25. doi: 10.1118/1.597758.
In this paper methods used to measure observer performance are reviewed, and a simple general model for finding and reporting target objects in gray-scale image backgrounds is presented. That model provides the basis for a combined measurement of detection and localization performance in various image-interpretation tasks, whether by human observers or by realized computer algorithms. The model assumes that (1) an observer's detection response and first choice of target location both depend on the "maximally suspicious" finding on an image, (2) a correct (first-choice) localization of the actual target occurs if and only if its location is selected as the most suspicious, and (3) a target's presence does not alter the degree of suspicion engendered by any other (normal) image findings. Formalization of these assumptions relates the ROC curve, which measures the ability to discriminate between images containing targets and images without targets, to the "Localization Response" (LROC) curve, which measures the conjoint ability to detect and correctly localize the actual targets in those images. A maximum-likelihood statistical procedure, developed for a two-parameter "binormal" version of this model, concurrently fits both the ROC and LROC curves from an observer's image ratings and target localizations for a set of image interpretations. The model's application is illustrated (and compared to standard ROC analysis) using sets of rating and localization data from radiologists asked to search chest films for pulmonary nodules. This model is then extended to multiple-report ("free-response") interpretations of multiple-target images, under the stringent requirement that an observer's detection capability and criterion for reporting possible targets both remain stationary across images and across the successive reports made on a given image. 
That extended model yields formulations and predictions for the so-called "Free-Response" (FROC) curve, and for a recently proposed "Alternative FROC" (AFROC) curve. Tests of that model's "stationarity" assumptions are illustrated using radiologists' free-search interpretations of chest films for pulmonary nodules, and they suggest that human observers may often violate those assumptions when making multiple-report interpretations of images.
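The three assumptions behind the single-report model can be illustrated with a small Monte Carlo sketch. This is not the paper's maximum-likelihood fitting procedure, only a simulation under an assumed parameterization: the most suspicious normal finding on any image is drawn from a unit normal distribution, and the suspicion engendered by the actual target is drawn from a normal distribution with hypothetical parameters `mu` and `sigma`. The function names, the threshold grid, and the default values are all illustrative choices, not taken from the source.

```python
import math
import numpy as np


def p_correct_localization(mu, sigma):
    """Analytic P(target is the most suspicious finding) under the assumed
    parameterization: Y - X ~ N(mu, 1 + sigma^2), so P(Y > X) = Phi(mu / sqrt(1 + sigma^2))."""
    z = mu / math.sqrt(1.0 + sigma ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_lroc(mu=1.5, sigma=1.0, n_images=20000, seed=0):
    """Simulate empirical ROC and LROC operating points.

    Assumed (hypothetical) parameterization:
      X ~ N(0, 1)        suspicion of the most suspicious normal finding
      Y ~ N(mu, sigma)   suspicion of the actual target
    """
    rng = np.random.default_rng(seed)

    # Target-absent images: the rating is the maximally suspicious normal finding.
    x_normal = rng.standard_normal(n_images)

    # Target-present images. Assumption (3): the target does not change the
    # distribution of suspicion for the normal findings.
    x_noise = rng.standard_normal(n_images)
    y_target = rng.normal(mu, sigma, n_images)

    # Assumption (1): the rating depends on the maximally suspicious finding.
    rating_target = np.maximum(x_noise, y_target)
    # Assumption (2): localization is correct iff the target is the most suspicious.
    correct_loc = y_target > x_noise

    thresholds = np.linspace(-4.0, 6.0, 201)[:, None]
    fpf = (x_normal[None, :] > thresholds).mean(axis=1)        # ROC abscissa
    tpf = (rating_target[None, :] > thresholds).mean(axis=1)   # ROC ordinate
    # LROC ordinate: detected AND correctly localized.
    lroc = ((rating_target[None, :] > thresholds) & correct_loc[None, :]).mean(axis=1)
    return thresholds.ravel(), fpf, tpf, lroc


if __name__ == "__main__":
    t, fpf, tpf, lroc = simulate_lroc()
    print("LROC endpoint (simulated):", lroc[0])
    print("LROC endpoint (analytic): ", p_correct_localization(1.5, 1.0))
```

In this sketch the LROC curve lies on or below the ROC curve at every threshold, and at the most lenient threshold it approaches the probability of a correct first-choice localization rather than 1, which is the qualitative relationship between the two curves that the model formalizes.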