Department of Biomedical Engineering, Washington University, St. Louis, MO 63130, United States of America.
Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO 63110, United States of America.
Phys Med Biol. 2023 Mar 21;68(7):074001. doi: 10.1088/1361-6560/acc0ce.
Synthetic images generated by simulation studies have a well-recognized role in developing and evaluating imaging systems and methods. However, for clinically relevant development and evaluation, the synthetic images must be clinically realistic and, ideally, have the same distribution as that of clinical images. Thus, mechanisms that can quantitatively evaluate this clinical realism and, ideally, the similarity in distributions of the real and synthetic images are much needed.
We investigated two observer-study-based approaches to quantitatively evaluate the clinical realism of synthetic images. In the first approach, we presented a theoretical formalism for using an ideal-observer study to quantitatively evaluate the similarity in distributions between the real and synthetic images. This formalism provides a direct relationship between the area under the receiver operating characteristic curve (AUC) for an ideal observer and the distributions of real and synthetic images. The second approach uses expert-human-observer studies to quantitatively evaluate the realism of synthetic images. In this approach, we developed web-based software to conduct two-alternative forced-choice (2-AFC) experiments with expert human observers. The usability of this software was evaluated through a system usability scale (SUS) survey with seven expert human readers and five observer-study designers. Further, we demonstrated the application of this software to evaluate a stochastic and physics-based image-synthesis technique for oncologic positron emission tomography (PET).
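A 2-AFC realism study is usually summarized by the reader's proportion correct. The sketch below assumes that each trial pairs one real and one synthetic image and records whether the reader picked the real one. Under that design, the expected proportion correct equals the AUC of the reader's ROC curve (a standard 2-AFC result), so a value near 0.5 means the reader cannot tell real from synthetic images. The exact binomial test against 0.5 is our choice of analysis, not necessarily the one used in the paper, and the function names and data format are hypothetical.

```python
from math import comb


def proportion_correct(choices):
    """Fraction of 2-AFC trials in which the reader picked the real image.

    `choices` is a hypothetical per-trial record: True if the reader chose
    the real image, False if they chose the synthetic one.
    """
    if not choices:
        raise ValueError("need at least one trial")
    return sum(choices) / len(choices)


def two_sided_binomial_p(k, n):
    """Exact two-sided binomial test of H0: proportion correct = 0.5.

    Under H0 the number of correct trials is Binomial(n, 0.5). That
    distribution is symmetric, so the two-sided p-value is twice the
    smaller tail probability, capped at 1.
    """
    tail = min(k, n - k)
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * lower)


if __name__ == "__main__":
    # Made-up example (not study data): 100 trials, 56 answered correctly.
    choices = [True] * 56 + [False] * 44
    pc = proportion_correct(choices)
    p = two_sided_binomial_p(sum(choices), len(choices))
    print(f"proportion correct = {pc:.2f}, two-sided p = {p:.3f}")
```

Reporting both numbers keeps the interpretation honest: a proportion correct only slightly above 0.5 with a large p-value is consistent with readers having limited ability to separate real from synthetic images.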
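A SUS questionnaire is scored with a fixed rule. Each of the ten items is rated from 1 to 5. The odd-numbered (positively worded) items contribute rating minus 1, the even-numbered (negatively worded) items contribute 5 minus rating, and the sum is multiplied by 2.5 to give a score from 0 to 100. The sketch below applies that rule. The function names are hypothetical, and the example responses are invented for illustration rather than taken from this study's survey.

```python
def sus_score(responses):
    """Score one completed SUS questionnaire on the 0-100 scale.

    `responses` holds the ten item ratings in questionnaire order,
    each from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("ratings must be 1, 2, 3, 4, or 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:  # positively worded items 1, 3, 5, 7, 9
            total += rating - 1
        else:  # negatively worded items 2, 4, 6, 8, 10
            total += 5 - rating
    return total * 2.5


def mean_sus(questionnaires):
    """Average SUS score over several respondents."""
    scores = [sus_score(q) for q in questionnaires]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two made-up respondents (not data from the study).
    respondents = [
        [4, 2, 5, 1, 4, 2, 5, 1, 4, 2],
        [5, 1, 4, 2, 5, 1, 4, 1, 5, 2],
    ]
    print([sus_score(r) for r in respondents], mean_sus(respondents))
    # prints [85.0, 90.0] 87.5
```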
In the PET evaluation, the 2-AFC study was performed with our software by six expert human readers who were highly experienced in reading PET scans, with 7 to 40 years of experience (median: 12 years; mean: 20.4 years).
In the ideal-observer-study-based approach, we theoretically demonstrated that the AUC for an ideal observer can be expressed, to an excellent approximation, in terms of the Bhattacharyya distance between the distributions of the real and synthetic images. This relationship shows that a decrease in the ideal-observer AUC indicates a decrease in the distance between the two image distributions. Moreover, an ideal-observer AUC at its lower bound of 0.5 implies that the distributions of synthetic and real images exactly match. For the expert-human-observer-study-based approach, our software for performing 2-AFC experiments is available at https://apps.mir.wustl.edu/twoafc. Results from the SUS survey demonstrate that the web application is very user-friendly and accessible. As a secondary finding, evaluation of a stochastic and physics-based PET image-synthesis technique using our software showed that expert human readers had limited ability to distinguish the real images from the synthetic images.
This work addresses the important need for mechanisms to quantitatively evaluate the clinical realism of synthetic images. The mathematical treatment in this paper shows that quantifying the similarity in the distributions of real and synthetic images is theoretically possible by using an ideal-observer-study-based approach. Our software provides a platform for designing and performing 2-AFC experiments with human observers in a highly accessible, efficient, and secure manner. Additionally, our results on the evaluation of the stochastic and physics-based image-synthesis technique motivate the application of this technique to develop and evaluate a wide array of PET imaging methods.
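The link between ideal-observer AUC and the Bhattacharyya distance D_B can be illustrated numerically. The abstract does not reproduce the paper's expression, so this sketch assumes the form AUC = 1/2 + 1/2 erf(sqrt(2 D_B)). That form is exact when the real and synthetic distributions are Gaussian with equal covariance, because there the ideal observer's detectability satisfies d'^2 = 8 D_B. It is not necessarily the approximation derived in the paper. Scalar Gaussians stand in for image distributions here as a toy assumption, and all function names are hypothetical. The sketch reproduces the two properties stated above: identical distributions give D_B = 0 and AUC = 0.5, and a smaller D_B gives a smaller AUC.

```python
import math
import random


def bhattacharyya_gauss_1d(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    v1, v2 = sigma1 ** 2, sigma2 ** 2
    return ((mu1 - mu2) ** 2 / (4 * (v1 + v2))
            + 0.5 * math.log((v1 + v2) / (2 * sigma1 * sigma2)))


def auc_from_bhattacharyya(d_b):
    """Assumed form AUC = 1/2 + 1/2 * erf(sqrt(2 * D_B)).

    Exact for equal-covariance Gaussians (d'^2 = 8 * D_B); only an
    approximation for other distributions.
    """
    return 0.5 + 0.5 * math.erf(math.sqrt(2 * d_b))


def empirical_auc(real, synthetic):
    """Mann-Whitney estimate of P(t_synthetic > t_real), ties counted as 1/2."""
    wins = 0.0
    for s in synthetic:
        for r in real:
            wins += 1.0 if s > r else 0.5 if s == r else 0.0
    return wins / (len(real) * len(synthetic))


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 1.0
    for shift in (0.0, 0.5, 1.0, 2.0):
        d_b = bhattacharyya_gauss_1d(0.0, sigma, shift, sigma)
        # With equal variances and shift > 0, the log-likelihood ratio is
        # increasing in x, so x itself is an ideal-observer test statistic.
        # At shift = 0 the two distributions coincide.
        real = [rng.gauss(0.0, sigma) for _ in range(400)]
        synth = [rng.gauss(shift, sigma) for _ in range(400)]
        print(f"shift={shift:.1f}  D_B={d_b:.4f}  "
              f"AUC(formula)={auc_from_bhattacharyya(d_b):.3f}  "
              f"AUC(simulated)={empirical_auc(real, synth):.3f}")
```

For real images the distributions are high-dimensional and unknown, so D_B cannot be computed this directly. The toy case only shows why an ideal-observer AUC approaching 0.5 signals closely matching distributions.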