Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Semin Nucl Med. 2011 Nov;41(6):419-36. doi: 10.1053/j.semnuclmed.2011.06.005.
Most academic radiologists will be familiar with receiver operating characteristic (ROC) studies. Fundamental studies of human observer performance are now usually performed by forced-choice methods. Both methods are based on signal detection theory. The ROC method gives an operating curve of true-positive versus false-positive probabilities. The area under the curve, A(Z), can be used as a summary performance measure. In the forced-choice method, observers are given 2 or more images, one of which contains the signal. The observer's task is to select the image most likely to contain the signal. The percentage of correct responses, PC, is a summary performance measure. Precise comparison of the 2 methods is limited to very controlled experiments in which signals (simulated lesions, for example) are carefully designed and detection or discrimination is limited by true random noise. Under these conditions, theory predicts a simple relationship between the summary measures, and human results are consistent with theory. This article describes forced-choice experimental methods and data analysis. There has also been considerable work on the development of theoretical observer models, and results of human experiments have been used to evaluate these models. Models that correlate well with human performance can in turn be used for preliminary design of new imaging systems and for selection of image quality metrics for comparing equipment performance. This article summarizes work during the last 30 years on evaluating human signal detection capabilities, observer models, and image quality metrics.
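The "simple relationship between summary measures" can be illustrated with the standard textbook case from signal detection theory. The abstract does not state its formulas, so the following is only a minimal sketch under assumptions that are not taken from the article: an equal-variance Gaussian model with detectability index d' (written `d_prime`), for which the ROC area is Phi(d'/sqrt(2)) and the probability of a correct response among M alternatives is the integral of phi(x - d') * Phi(x)^(M-1). The function names and integration settings are hypothetical choices made for illustration.

```python
import math


def normal_pdf(x):
    """Standard normal probability density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def roc_area(d_prime):
    """ROC area under the ASSUMED equal-variance Gaussian model."""
    return normal_cdf(d_prime / math.sqrt(2.0))


def forced_choice_pc(d_prime, m=2, steps=4000):
    """Proportion correct for an M-alternative forced-choice task.

    Integrates phi(x - d') * Phi(x)**(m - 1) with Simpson's rule over
    d' +/- 8 standard deviations (an assumed, generous range).
    `steps` must be even.
    """
    lo, hi = d_prime - 8.0, d_prime + 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        if i == 0 or i == steps:
            weight = 1.0
        elif i % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * normal_pdf(x - d_prime) * normal_cdf(x) ** (m - 1)
    return total * h / 3.0


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0):
        print(d, roc_area(d), forced_choice_pc(d, m=2), forced_choice_pc(d, m=4))
```

Under these assumptions the 2-alternative proportion correct matches the ROC area at every d', while adding alternatives lowers PC at the same d' (chance performance is 1/M). Real observer data need not satisfy the equal-variance Gaussian model, so this is a consistency check on the idealized case rather than a description of the article's analysis.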