Swets J A
BBN Laboratories Incorporated, Cambridge, MA 02238.
Science. 1988 Jun 3;240(4857):1285-93. doi: 10.1126/science.3287615.
Diagnostic systems of several kinds are used to distinguish between two classes of events, essentially "signals" and "noise". For them, analysis in terms of the "relative operating characteristic" of signal detection theory provides a precise and valid measure of diagnostic accuracy. It is the only measure available that is uninfluenced by decision biases and prior probabilities, and it places the performances of diverse systems on a common, easily interpreted scale. Representative values of this measure are reported here for systems in medical imaging, materials testing, weather forecasting, information retrieval, polygraph lie detection, and aptitude testing. Though the measure itself is sound, the values obtained from tests of diagnostic systems often require qualification because the test data on which they are based are of unsure quality. A common set of problems in testing is faced in all fields. How well these problems are handled, or can be handled in a given field, determines the degree of confidence that can be placed in a measured value of accuracy. Some fields fare much better than others.
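As a rough illustration of the claim that an ROC-based measure is unaffected by decision biases and prior probabilities, the following minimal sketch assumes the measure is taken to be the area under the empirical ROC curve. The scores, function names, and threshold are hypothetical and are not taken from the article, and the article's own estimation procedure may differ. The sketch computes the area in two ways (pairwise comparison and trapezoidal integration over all thresholds) and contrasts it with percent correct at a single fixed threshold.

```python
def roc_area(signal_scores, noise_scores):
    """Empirical ROC area: the probability that a randomly chosen signal
    case scores higher than a randomly chosen noise case (ties count 1/2)."""
    wins = 0.0
    for s in signal_scores:
        for n in noise_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(signal_scores) * len(noise_scores))


def roc_points(signal_scores, noise_scores):
    """(false-alarm rate, hit rate) pairs, one per decision threshold,
    ordered from the strictest threshold to the most lenient."""
    thresholds = sorted(set(signal_scores) | set(noise_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        hit_rate = sum(s >= t for s in signal_scores) / len(signal_scores)
        false_alarm_rate = sum(n >= t for n in noise_scores) / len(noise_scores)
        points.append((false_alarm_rate, hit_rate))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve through the given points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def percent_correct(signal_scores, noise_scores, threshold):
    """Fraction of all cases classified correctly at one fixed threshold."""
    hits = sum(s >= threshold for s in signal_scores)
    correct_rejections = sum(n < threshold for n in noise_scores)
    return (hits + correct_rejections) / (len(signal_scores) + len(noise_scores))


# Hypothetical diagnostic scores (higher means "more likely a signal").
signal = [0.9, 0.8, 0.7, 0.6, 0.4]
noise = [0.7, 0.5, 0.3, 0.2, 0.1]

# Same noise distribution, but three times as many noise cases:
# this changes the prior probability of a signal from 1/2 to 1/4.
noise_common = noise * 3

area_balanced = roc_area(signal, noise)
area_skewed = roc_area(signal, noise_common)
area_from_curve = trapezoid_area(roc_points(signal, noise))

pc_balanced = percent_correct(signal, noise, threshold=0.65)
pc_skewed = percent_correct(signal, noise_common, threshold=0.65)

print("ROC area, balanced classes:", area_balanced)
print("ROC area, skewed classes:  ", area_skewed)
print("ROC area from the curve:   ", area_from_curve)
print("Percent correct, balanced: ", pc_balanced)
print("Percent correct, skewed:   ", pc_skewed)
```

In this toy data set the ROC area is the same for both class mixes, because it depends only on how the signal and noise score distributions overlap. Percent correct at the fixed threshold changes when the mix changes, and it would also change if the threshold were moved.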