Lee Juhun, Nishikawa Robert M, Reiser Ingrid, Zuley Margarita L, Boone John M
University of Pittsburgh, Department of Radiology, Pittsburgh, Pennsylvania, United States.
The University of Chicago, Department of Radiology, Chicago, Illinois, United States.
J Med Imaging (Bellingham). 2017 Apr;4(2):025502. doi: 10.1117/1.JMI.4.2.025502. Epub 2017 May 3.
We tested whether radiologists' rankings of different reconstructions of breast computed tomography images agree when the rankings are based on their diagnostic (classification) performance and when they are based on their subjective image quality assessments. We used 102 pathology-proven cases (62 malignant, 40 benign) and an iterative image reconstruction (IIR) algorithm to obtain 24 reconstructions per case with different image appearances. Using image feature analysis, we selected 3 IIRs and 1 clinical reconstruction, along with 50 lesions. The reconstructions spanned a range of image quality from smooth/low-noise to sharp/high-noise, with classifier performance corresponding to AUCs of 0.62 to 0.96. Six experienced Mammography Quality Standards Act (MQSA) radiologists rated the likelihood of malignancy for each lesion. We then conducted a second reader study with the same radiologists and a subset of 30 lesions, in which the radiologists ranked each reconstruction according to their preference. The six radiologists disagreed on which reconstruction produced images with the highest diagnostic content, but they preferred the mid-sharpness/mid-noise image appearance over the others. However, the reconstruction they preferred most was not the one with which they performed best. Because of these disagreements, it may be difficult to develop a single image-based model observer that is representative of a population of radiologists for this particular imaging task.
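The comparison described in the abstract, ranking reconstructions by a reader's diagnostic performance and checking that ranking against the same reader's stated preference, can be illustrated with a minimal sketch. This is not the authors' analysis: the empirical (Mann-Whitney) AUC estimator, the reconstruction names `recon_A` to `recon_C`, and every rating and preference value below are assumptions made up for illustration only.

```python
from itertools import product


def auc(malignant_scores, benign_scores):
    """Empirical AUC (Mann-Whitney): fraction of malignant/benign pairs
    in which the malignant lesion gets the higher rating; ties count 0.5."""
    pairs = list(product(malignant_scores, benign_scores))
    wins = sum(1.0 if m > b else 0.5 if m == b else 0.0 for m, b in pairs)
    return wins / len(pairs)


def rank_descending(values):
    """Return ranks where 1 marks the largest value; ties follow input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Hypothetical likelihood-of-malignancy ratings from ONE reader:
# (ratings for malignant lesions, ratings for benign lesions).
# These numbers are invented toy data, not results from the study.
ratings = {
    "recon_A": ([80, 70, 90, 60], [30, 65, 20, 40]),
    "recon_B": ([55, 70, 40, 60], [50, 65, 45, 30]),
    "recon_C": ([90, 85, 75, 95], [10, 20, 70, 15]),
}

# Hypothetical preference ranking from the same reader (1 = most preferred).
preference_rank = {"recon_A": 1, "recon_B": 2, "recon_C": 3}

names = list(ratings)
aucs = [auc(*ratings[name]) for name in names]
performance_rank = dict(zip(names, rank_descending(aucs)))

for name, value in zip(names, aucs):
    print(f"{name}: AUC={value:.4f}, "
          f"performance rank={performance_rank[name]}, "
          f"preference rank={preference_rank[name]}")
```

In this toy case the most preferred reconstruction is not the one with the highest AUC, which is the kind of mismatch the abstract reports. A study-level analysis would repeat the comparison for each reader and use an agreement statistic across readers.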