Gur D, Rubin D A, Kart B H, Peterson A M, Fuhrman C R, Rockette H E, King J L
Department of Radiology, University of Pittsburgh, PA, USA.
J Digit Imaging. 1997 Aug;10(3):103-7. doi: 10.1007/BF03168596.
This study compared a five-category ordinal scale with a two-alternative forced-choice scale for subjective rating of image quality preferences in a multiabnormality environment. Three radiologists evaluated 140 pairs of laser-printed posteroanterior (PA) chest images twice; during a side-by-side review, they were asked to select which image in each pair was the "better" one for determining the presence or absence of specific abnormalities. Each pair consisted of the digitized film (100-micron pixel resolution) laser printed onto film and a highly compressed (approximately 60:1) and then decompressed version of the same digitized film, also laser printed onto film. Ratings were performed once with the five-category ordinal scale and once with the two-alternative forced-choice scale. The selection process was significantly affected by the rating scale used. With the ordinal scale, the "comparable" or "equivalent for diagnosis" category was used in 88.5% of the ratings. With the two-alternative forced-choice approach, noncompressed images were selected as the "better" images 66.8% of the time. The ordinal rating scale therefore provided a significantly lower ability to detect small differences in perceived image quality between the noncompressed and compressed images. Observer behavior can be affected by the type of question asked and the rating scale used. Observers are highly sensitive to small differences in image presentation during a side-by-side review.
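As an illustration of why a forced-choice design can expose a preference that an ordinal scale with a "comparable" category absorbs, the following minimal sketch applies an exact two-sided sign test to forced-choice counts. It is not the analysis used in the study. The function name `binomial_two_sided_p` and the counts in the usage section are hypothetical, chosen only to mirror a preference rate of about 66.8%; the abstract does not report the number of pooled ratings. The test also assumes independent ratings, which repeated readings by the same three radiologists would not strictly satisfy.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for H0: preference probability = 0.5.

    k is the number of forced-choice ratings that favored one image type,
    n is the total number of forced-choice ratings.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Under p = 0.5 the binomial distribution is symmetric, so the
    # two-sided p-value is twice the smaller tail, capped at 1.
    tail = min(k, n - k)
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)


if __name__ == "__main__":
    # Hypothetical counts (NOT from the study): 668 of 1000 forced-choice
    # ratings favoring the noncompressed image, i.e. a 66.8% preference.
    n_ratings = 1000
    n_prefer_noncompressed = 668
    p_value = binomial_two_sided_p(n_prefer_noncompressed, n_ratings)
    print(f"preference rate: {n_prefer_noncompressed / n_ratings:.1%}")
    print(f"two-sided exact p-value: {p_value:.3g}")
```

With an ordinal scale, most ratings would fall in the "comparable" category and contribute nothing to such a test, which is one way to read the reduced sensitivity reported above.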