Wenzel A, Hintze H
Department of Oral Radiology, Royal Dental College, Faculty of Health Sciences, University of Aarhus, Denmark.
Dentomaxillofac Radiol. 1999 May;28(3):182-5. doi: 10.1038/sj/dmfr/4600438.
To evaluate how the choice of gold standard affects the diagnostic outcome of approximal caries detection in original and compressed digital radiographs.
A total of 116 extracted teeth were radiographed with a storage phosphor system; these radiographs constituted the original images. The images were compressed at ratios of 1:20 and 1:33 using the irreversible JPEG compression standard. Five radiologists scored the three sets of images for the presence of approximal caries on a five-rank confidence scale. The radiographic scores were validated by stereomicroscopy (the true gold standard). The individual ROC areas of the five observers were used to identify the worst (obsworst) and the best (obsbest) performer; their scores in the original images then served as the second and third 'gold standards' for the remaining three observers. Mean ROC areas for these three observers were calculated for the three types of images using the two new 'gold standards'. The ROC areas obtained with microscopy, obsworst and obsbest as the 'gold standard' were then compared.
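As an illustration of the ROC analysis described above, the following minimal Python sketch computes an ROC area from five-rank confidence scores against a chosen reference. It is not the authors' software: the function names, the toy data, and in particular the cut-off used to turn a reference observer's scores into a present/absent decision are assumptions, since the abstract does not state how observer scores were dichotomized.

```python
from itertools import product


def roc_area(scores, truth):
    """ROC area via the Mann-Whitney statistic: the probability that a
    carious surface receives a higher confidence score than a sound one,
    with tied scores counted as one half."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least one carious and one sound surface")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def dichotomize(reference_scores, cutoff=4):
    """Turn a reference observer's five-rank scores into caries present
    or absent. The cutoff of 4 is an assumption for illustration only."""
    return [s >= cutoff for s in reference_scores]


# Hypothetical toy data for five surfaces (not taken from the study).
microscopy = [True, False, True, False, False]   # true gold standard
reference_observer = [5, 4, 2, 1, 3]             # e.g. obsworst, original images
test_observer = [5, 4, 2, 1, 3]                  # one of the remaining observers

area_vs_microscopy = roc_area(test_observer, microscopy)
area_vs_observer = roc_area(test_observer, dichotomize(reference_observer))
```

In this toy case the test observer agrees fully with the reference observer, so the ROC area against that observer is higher than the area against microscopy. This shows how agreement between observers can be mistaken for accuracy when an observer's scores serve as the 'gold standard'.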
In the original images, the mean ROC areas were 0.66, 0.74 and 0.91 with the true gold standard, obsbest and obsworst as the 'gold standard', respectively. The difference between the true gold standard and obsworst was statistically significant (P < 0.001). With the true gold standard, the mean ROC areas decreased with increasing compression, whereas with obsworst and obsbest as 'gold standards' they remained constant or increased, respectively.
Accuracy in approximal caries diagnosis was significantly higher when an observer served as the 'gold standard' than when the true gold standard was obtained by microscopy. Paradoxically, the compressed, degraded images appeared more accurate than the originals when an observer was the 'gold standard', whereas they were less accurate with the true gold standard. Thus, using observers' scores from the very radiographs being evaluated as validation for the presence of caries may mislead the clinician.