[Performance evaluation of a mammographic diagnostic system: the evolution of methods and their application to digital image studies]

Compagnone G, Ferruzzi K, Pierotti L, Vianello Vos C, Berardi P, Bergamini C

Servizio di Fisica Sanitaria, Policlinico S. Orsola Malpighi, Azienda Ospedaliera, Bologna.

Radiol Med. 1999 Mar;97(3):179-87.

INTRODUCTION

"Receiver Operating Characteristic" (ROC) curves are one of the most efficient analysis tools for the complete evaluation of a diagnostic system's performance. However, this method is limited when abnormal structures, such as clusters of microcalcifications on mammographic images, must be both visualized and located. More refined and complex techniques, which rest on particular statistical hypotheses, have also been suggested: the "Free-response ROC" (FROC), the "Alternative FROC" (AFROC) and the "Free-response Forced Error" (FFE) analyses. We studied the theoretical bases of these methods and their experimental applications to assess the correctness of the hypothesized statistical distributions.

MATERIAL AND METHODS

We considered two statistical hypotheses: first, that the distribution of false-positive responses follows Poisson statistics; second, that the "signal" and "noise" distributions are Gaussian with different means and variances. We then applied the different methods to the responses given by 8 observers (5 radiologists and 3 medical physicists) who independently evaluated 3 digital mammographic samples. Every sample consisted of 39 images, with 1-15 clusters each (total: 100 clusters). Sample 1 consisted of 39 images available in an Internet database; samples 2 and 3 were obtained by applying 2 different digital filters to each of these images. Responses were collected in two phases: first, every observer visualized and located the clusters at a given confidence level; second, once a false-positive response had been given, either spontaneously or after forcing, the responses were ranked in order of decreasing conspicuity. Finally, the data were analyzed with in-house software, applying the FROC and AFROC analyses to the data collected in phase 1 and the FFE analysis to those collected in phase 2.
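The first hypothesis above has a simple practical consequence that can be sketched in code. If false positives per image are Poisson distributed with mean lambda, the probability that an image carries at least one false positive is 1 - exp(-lambda), which maps the FROC abscissa (mean false positives per image) onto the AFROC abscissa (fraction of images with a false positive). The Python sketch below is illustrative only and is not the authors' software; the function names, the rating scale and the sample numbers are assumptions.

```python
import math

def froc_points(tp_ratings, fp_ratings, n_lesions, n_images, thresholds):
    """Return one (mean FPs per image, fraction of lesions located) pair per threshold.

    tp_ratings / fp_ratings are hypothetical confidence ratings attached to
    correctly located clusters and to false-positive marks, respectively.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for r in tp_ratings if r >= t)
        fp = sum(1 for r in fp_ratings if r >= t)
        points.append((fp / n_images, tp / n_lesions))
    return points

def fp_image_fraction(mean_fp_per_image):
    """Poisson assumption: P(at least one FP on an image) = 1 - exp(-lambda)."""
    return 1.0 - math.exp(-mean_fp_per_image)

def afroc_points(froc):
    """Map FROC points to AFROC points under the Poisson assumption."""
    return [(fp_image_fraction(lam), tpf) for lam, tpf in froc]

# Made-up ratings for 4 clusters on 2 images, for illustration only.
froc = froc_points([4, 3, 3, 1], [4, 2, 1, 1], n_lesions=4, n_images=2,
                   thresholds=[4, 3, 2, 1])
afroc = afroc_points(froc)
```

Comparing the FROC-derived curve with the directly measured AFROC points is one way to check whether the Poisson hypothesis is plausible for a given data set.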

RESULTS

We considered the area under the AFROC curve as the most important parameter: the values obtained with the 3 types of analysis agree well within their uncertainties. In particular, the FROC and AFROC values differed by no more than 5.9% (by no more than 2.5% in 10 of 14 cases), while the FFE analysis had higher standard deviations associated with the area value (about 10%). The curves interpolated from the FROC and AFROC data were very similar. Each of the three methods has its advantages: the FFE is very simple to calculate and makes the most of the information given by the observer; FROC and AFROC can handle true-positive and false-positive responses on the same image, which allows the evaluation of a diagnostic system's performance to be optimized. These methods thus combine the statistical tools used in the simplest methods with a complete account of the location of multiple signals on mammograms.
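As a rough illustration of the figure of merit used here, the area under an AFROC curve can be estimated from a handful of operating points with the trapezoidal rule. The sketch below assumes the empirical curve is anchored at (0, 0) and extended to (1, 1); it does not reproduce the interpolation or fitting procedure actually used in the study, and the function name and example points are hypothetical.

```python
def afroc_area(points):
    """Trapezoidal area under AFROC points given as (FP image fraction, TP fraction).

    Assumption: the curve starts at (0, 0) and is closed at (1, 1).
    """
    curve = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical operating points, for illustration only.
chance_like = afroc_area([(0.5, 0.5)])
perfect = afroc_area([(0.0, 1.0)])
```

A point on the diagonal yields an area of 0.5, and an observer who locates every cluster without any false positive yields an area of 1.0.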

CONCLUSIONS

In theory, every method is necessary because it provides additional information to validate the statistical hypotheses under investigation. In practice, when the methods are used to evaluate and compare several diagnostic systems, the results of the three techniques are equivalent. Therefore, the choice of a specific technique depends on both the available resources and the response type. All the hypothesized statistical distributions in our study proved correct.
