U.S. Food and Drug Administration, 10903 New Hampshire Ave., Building 62, Room 3102, Silver Spring, MD 20993-0002.
Department of Psychological and Brain Sciences, University of California, Santa Barbara, California.
Acad Radiol. 2017 Nov;24(11):1436-1446. doi: 10.1016/j.acra.2017.05.007. Epub 2017 Jun 27.
In this paper, we examine which comparisons of reading performance between diagnostic imaging systems, made in controlled retrospective laboratory studies, may be representative of what is observed in later clinical studies. The change in a meaningful diagnostic figure of merit between two diagnostic modalities should be qualitatively or quantitatively comparable across all kinds of studies.
In this meta-study, we examine the reproducibility of relative measures of sensitivity, false positive fraction (FPF), area under the receiver operating characteristic (ROC) curve, and expected utility across laboratory and observational clinical studies for several different breast imaging modalities, including screen-film mammography, digital mammography, breast tomosynthesis, and ultrasound.
Across studies of all types, the changes in FPF had a very small probability of sharing a common mean value. The probability that relative sensitivity was the same across studies was also low for the ultrasound and tomosynthesis study sets. No evidence was found for different mean values of relative area under the ROC curve or relative expected utility within any of the study sets.
The comparison demonstrates that the ratios of areas under the ROC curve and expected utilities are reproducible across laboratory and clinical studies, whereas sensitivity and FPF are not.
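As a rough illustration of the kind of relative measure discussed above, the following Python sketch computes sensitivity and FPF for two modalities from 2x2 decision counts and expresses the new modality relative to the reference as a ratio. The `ReaderCounts` container, the `relative_measures` function, and the choice of a ratio for sensitivity and FPF are assumptions made for illustration and are not taken from the paper; area under the ROC curve and expected utility are not covered here, since the abstract does not give the definitions used.

```python
from dataclasses import dataclass


@dataclass
class ReaderCounts:
    """2x2 decision counts for one modality in one study (hypothetical container)."""
    true_pos: int   # diseased cases called positive
    false_neg: int  # diseased cases called negative
    false_pos: int  # non-diseased cases called positive
    true_neg: int   # non-diseased cases called negative

    @property
    def sensitivity(self) -> float:
        # Fraction of diseased cases that were called positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def fpf(self) -> float:
        # False positive fraction: non-diseased cases called positive.
        return self.false_pos / (self.false_pos + self.true_neg)


def relative_measures(new: ReaderCounts, ref: ReaderCounts) -> dict:
    """Ratios of the new modality's measures to the reference modality's.

    Assumption: "relative" is taken to mean a simple ratio within one study.
    """
    return {
        "relative_sensitivity": new.sensitivity / ref.sensitivity,
        "relative_fpf": new.fpf / ref.fpf,
    }


# Made-up counts for illustration only; they do not come from any study.
example_new = ReaderCounts(true_pos=90, false_neg=10, false_pos=50, true_neg=950)
example_ref = ReaderCounts(true_pos=80, false_neg=20, false_pos=100, true_neg=900)
example_result = relative_measures(example_new, example_ref)
```

Computing the same ratios separately for each laboratory and clinical study would give the per-study values whose agreement the abstract is concerned with.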