Evaluation and comparison of CAD system performance in a clinical setting.

Royal Institute of Technology, AlbaNova University Center, Department of Physics, SE-106 91 Stockholm, Sweden.

Acad Radiol. 2005 Jun;12(6):687-94. doi: 10.1016/j.acra.2005.02.005.

RATIONALE AND OBJECTIVES

Computer-aided detection (CAD) systems are frequently compared using free-response receiver operating characteristic (FROC) curves. While there are ample statistical methods for comparing FROC curves, comparing the outcomes of 2 CAD systems applied in a typical clinical setting raises the additional matter of correctly determining each system's operating point. This article shows how the effect of sampling error on determining the correct CAD operating point can be captured. By incorporating this uncertainty, the presented method allows estimation of the probability that a particular CAD system performs better than another on unseen data in a clinical setting.
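As a concrete reference for the terms used above, the following Python sketch computes an empirical FROC curve from CAD mark scores and picks the operating point for a target sensitivity. It is not taken from the article: the input format (one maximum score per true lesion, a pooled list of false-positive mark scores) and the function names froc_points and operating_point are hypothetical choices made for illustration.

```python
import math

def froc_points(lesion_scores, fp_scores, n_images):
    """Empirical FROC curve as (threshold, FP per image, sensitivity) triples.

    lesion_scores: highest CAD score on each true lesion; use float("-inf")
                   for a lesion the system never marks (hypothetical convention).
    fp_scores:     scores of all false-positive marks, pooled over n_images.
    A mark is displayed when its score is >= the threshold.
    """
    lesion_scores = list(lesion_scores)
    fp_scores = list(fp_scores)
    # Every finite observed score is a candidate threshold, strictest first.
    thresholds = sorted(
        {s for s in lesion_scores + fp_scores if math.isfinite(s)}, reverse=True
    )
    points = []
    for t in thresholds:
        sens = sum(s >= t for s in lesion_scores) / len(lesion_scores)
        fppi = sum(s >= t for s in fp_scores) / n_images
        points.append((t, fppi, sens))
    return points

def operating_point(lesion_scores, fp_scores, n_images, target_sens):
    """Strictest threshold whose sensitivity reaches target_sens, or None."""
    for t, fppi, sens in froc_points(lesion_scores, fp_scores, n_images):
        if sens >= target_sens:
            return t, fppi, sens
    return None  # target sensitivity is never reached on this sample

# Four lesions and five false-positive marks spread over two images.
print(operating_point([0.9, 0.8, 0.6, 0.3], [0.85, 0.5, 0.4, 0.2, 0.1], 2, 0.75))
# (0.6, 0.5, 0.75)
```

Because the threshold is read off a finite sample, a different sample of the same size would generally give a different threshold and hence a different clinical operating point; that sampling variability is what the article sets out to capture.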

MATERIALS AND METHODS

The distribution of possible clinical outcomes from 2 artificial CAD systems with different FROC curves is examined. The sampling error is captured by the distribution of possible thresholds of the classifying machine that yield a specified sensitivity. After a measure of superiority is introduced, the probability of one system being superior to the other can be determined.
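The threshold-sampling idea can be illustrated with a small Monte Carlo sketch. The abstract does not specify the score distributions or the measure of superiority, so everything here is an assumption: lesion scores follow N(mu, 1), each image carries n_cand false candidates with N(0, 1) scores, every training case contributes one lesion, and a hypothetical utility (sensitivity minus w times FP per image) stands in for the article's measure of superiority. Each trial draws an independent training set per system, sets the threshold that reaches the target sensitivity on that sample, and evaluates the resulting operating point on the underlying population. The names training_threshold, clinical_outcome and prob_superior are illustrative only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed score distribution of false-positive candidates

def training_threshold(rng, mu, n_lesions, target_sens):
    """Threshold that reaches target_sens on one finite training sample.

    Assumes one lesion per training case with score ~ N(mu, 1).
    """
    scores = sorted((rng.gauss(mu, 1.0) for _ in range(n_lesions)), reverse=True)
    # round() guards against products like 0.07 * 100 = 7.000000000000001.
    k = math.ceil(round(target_sens * n_lesions, 9))  # lesions that must be marked
    return scores[k - 1]

def clinical_outcome(t, mu, n_cand):
    """Population sensitivity and FP/image at threshold t under the assumed model."""
    sens = 1.0 - STD_NORMAL.cdf(t - mu)
    fppi = n_cand * (1.0 - STD_NORMAL.cdf(t))
    return sens, fppi

def prob_superior(sys_a, sys_b, n_train=100, target_sens=0.9, w=0.1,
                  n_trials=2000, seed=0):
    """Fraction of trials in which system A's clinical outcome beats B's.

    sys_a, sys_b: (mu, n_cand) pairs. w is a hypothetical weight trading
    sensitivity against false positives per image.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        utilities = []
        for mu, n_cand in (sys_a, sys_b):
            # Each system gets its own independent training sample.
            t = training_threshold(rng, mu, n_train, target_sens)
            sens, fppi = clinical_outcome(t, mu, n_cand)
            utilities.append(sens - w * fppi)
        wins += utilities[0] > utilities[1]
    return wins / n_trials

p = prob_superior((2.5, 5), (2.0, 5))
print(f"P(A better than B) ~ {p:.3f}")
```

Varying n_train or the gap between the two mu values in this toy model shows how the spread of the chosen threshold, and with it the estimated probability of superiority, responds to training set size and FROC separation; the numbers it produces are illustrative and are not the article's results.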

RESULTS

It is shown that for 2 typical mammography CAD systems, each trained on an independent representative dataset of 100 cases, the FROC curves must be separated by 0.20 false positives per image in order to conclude with 90% probability that one is better than the other in a clinical setting. Also, there is no apparent gain in increasing the size of the training set beyond 100 cases.

DISCUSSION

CAD systems for mammography are modeled for illustrative purposes, but the method presented is applicable to any computer-aided detection system evaluated with FROC curves. The presented method is designed to construct confidence intervals around possible clinical outcomes and to assess the importance of training set size and separation between FROC curves of systems trained on different datasets.