Kurt Rossmann Laboratories for Radiologic Image Research, Department of Radiology, The University of Chicago, 5841 South Maryland Avenue, MC 2026, Chicago, Illinois 60637, USA.
Med Phys. 2007 Jul;34(7):2890-5. doi: 10.1118/1.2745937.
Presenting images of lesions similar to an unknown lesion seen on a mammogram may help radiologists diagnose that lesion correctly. For such images to be useful, they must be quite similar from the radiologists' point of view. We have been trying to quantify radiologists' impression of similarity for pairs of lesions and to establish a "gold standard" for the development and evaluation of a computerized scheme for selecting such similar images. However, similarity ratings are subjective, and determining them reliably and accurately is considered difficult. In this study, we compared the subjective similarities obtained by two different methods, an absolute rating method and a 2-alternative forced-choice (2AFC) method, to demonstrate that reliable similarity ratings can be determined from the responses of a group of radiologists. The absolute similarity ratings had been obtained previously for pairs of masses and pairs of microcalcifications from five and nine radiologists, respectively. In this study, similarity ranking scores for eight pairs of masses and eight pairs of microcalcifications were determined by use of the 2AFC method. In the first session, the eight pairs of masses and the eight pairs of microcalcifications were grouped and compared separately to determine the similarity ranking scores. In the second session, another similarity ranking score was determined by use of mixed pairs, i.e., by comparison of the similarity of a mass pair with that of a calcification pair; four pairs of masses and four pairs of microcalcifications were grouped together to create two sets of eight pairs. The average absolute similarity ratings and the average similarity ranking scores were highly correlated in the first session (Pearson's correlation coefficients: 0.94 and 0.98 for masses and microcalcifications, respectively). In the second session, the correlations between the absolute ratings and the ranking scores were also very high (0.92 and 0.96), which implies that the observers were able to compare the similarity of a mass pair with that of a calcification pair consistently. These results provide evidence that the concept of similarity for pairs of images is robust, even across different lesion types, and that radiologists are able to reliably determine subjective similarity for pairs of breast lesions.
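The comparison of 2AFC ranking scores with absolute ratings can be illustrated with a minimal Python sketch. The abstract does not state how the ranking scores were computed, so the sketch assumes that a lesion pair's score is the number of 2AFC trials in which it was chosen as the more similar one. The function names, the simulated observer, and the ratings are all hypothetical and are not study data.

```python
from itertools import combinations
from math import sqrt


def ranking_scores(n_items, choices):
    """Count how often each item was chosen as the more similar one.

    `choices` maps every trial (i, j) with i < j to the chosen index.
    Assumption: the ranking score is a simple win count.
    """
    wins = [0] * n_items
    for i, j in combinations(range(n_items), 2):
        wins[choices[(i, j)]] += 1
    return wins


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up average absolute similarity ratings for eight lesion pairs.
absolute = [0.9, 0.2, 0.5, 0.7, 0.1, 0.4, 0.8, 0.3]

# Hypothetical observer who always picks the pair with the higher rating.
choices = {
    (i, j): (i if absolute[i] > absolute[j] else j)
    for i, j in combinations(range(len(absolute)), 2)
}

scores = ranking_scores(len(absolute), choices)
r = pearson(absolute, scores)
print(scores, round(r, 3))
```

With eight pairs, each session set yields 28 2AFC trials per observer. In practice the win counts would be averaged over observers before being correlated with the average absolute ratings.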