Yu Zitong, Rahman Md Ashequr, Laforest Richard, Schindler Thomas H, Gropler Robert J, Wahl Richard L, Siegel Barry A, Jha Abhinav K
ArXiv. 2023 Apr 2:arXiv:2303.02110v5.
Artificial intelligence-based methods have generated substantial interest in nuclear medicine. One area of particular interest is the use of deep-learning (DL)-based approaches to denoise images acquired with lower doses, shorter acquisition times, or both. Objective evaluation of these approaches is essential for their clinical application. DL-based approaches for denoising nuclear-medicine images have typically been evaluated using fidelity-based figures of merit (FoMs) such as root mean square error (RMSE) and the structural similarity index (SSIM). However, these images are acquired for clinical tasks and should therefore be evaluated based on their performance on those tasks. Our objectives were to (1) investigate whether evaluation with these FoMs is consistent with objective clinical-task-based evaluation; (2) provide a theoretical analysis for determining the impact of denoising on signal-detection tasks; and (3) demonstrate the utility of virtual clinical trials (VCTs) for evaluating DL-based methods. A VCT was conducted to evaluate a DL-based method for denoising myocardial perfusion SPECT (MPS) images. The impact of DL-based denoising was evaluated using fidelity-based FoMs and the area under the receiver operating characteristic (ROC) curve (AUC). The AUC quantified performance on detecting perfusion defects in MPS images and was obtained using a model observer with anthropomorphic channels. Based on the fidelity-based FoMs, denoising with the considered DL-based method led to significantly superior performance. However, based on ROC analysis, denoising did not improve, and in fact often degraded, detection-task performance. These results motivate the need for objective task-based evaluation of DL-based denoising approaches. Further, this study shows how VCTs provide a mechanism to conduct such evaluations. Finally, our theoretical treatment reveals insights into the reasons for the limited performance of the denoising approach.
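The contrast between a fidelity-based FoM (RMSE) and a detection-task FoM (AUC from a channelized model observer) can be illustrated with a toy sketch. This is not the authors' VCT or their observer. All of the following are hypothetical assumptions: the function names, the images (a flat background with white Gaussian noise, with the "defect" represented as a centered Gaussian of negative amplitude), the spatial-domain difference-of-Gaussians channels (a stand-in for the paper's anthropomorphic channels), and the 3 x 3 moving-average filter (a stand-in for the DL denoiser). The sketch trains a channelized Hotelling observer on half the images and reports a Wilcoxon-Mann-Whitney AUC on the other half.

```python
import numpy as np


def rmse(estimate, reference):
    """Fidelity-based FoM: root mean square error against the noiseless reference."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def dog_channels(size, widths=(1.0, 2.0, 4.0, 8.0), q=1.67):
    """Hypothetical spatial-domain difference-of-Gaussians channels centered on the
    signal location. A stand-in for anthropomorphic channels, not the paper's set."""
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    r2 = (x - c) ** 2 + (y - c) ** 2
    chans = []
    for w in widths:
        narrow = np.exp(-r2 / (2 * w ** 2))
        wide = np.exp(-r2 / (2 * (q * w) ** 2))
        # Normalizing each Gaussian to unit sum makes every channel zero-mean,
        # so a flat background contributes nothing to the channel output.
        chans.append((narrow / narrow.sum() - wide / wide.sum()).ravel())
    return np.array(chans)  # shape: (n_channels, size * size)


def cho_auc(present, absent, channels, n_train):
    """Channelized Hotelling observer: estimate the template from the first
    n_train images of each class, then return the Wilcoxon-Mann-Whitney AUC of
    the test statistic on the remaining images."""
    vp = present.reshape(len(present), -1) @ channels.T
    va = absent.reshape(len(absent), -1) @ channels.T
    # Average intra-class covariance of the channel outputs (training set only).
    s = 0.5 * (np.cov(vp[:n_train], rowvar=False) + np.cov(va[:n_train], rowvar=False))
    template = np.linalg.solve(s, vp[:n_train].mean(axis=0) - va[:n_train].mean(axis=0))
    t_present = vp[n_train:] @ template
    t_absent = va[n_train:] @ template
    # Nonparametric AUC: fraction of (present, absent) pairs ranked correctly; ties count 1/2.
    diff = t_present[:, None] - t_absent[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def simulate(n, size=32, amplitude=-0.5, defect_width=3.0, noise_sd=1.0, seed=0):
    """Toy data: flat background plus white Gaussian noise; the signal-present
    class adds a centered Gaussian with negative amplitude (a hypothetical 'defect')."""
    rng = np.random.default_rng(seed)
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    signal = amplitude * np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * defect_width ** 2))
    background = np.full((size, size), 10.0)
    absent = background + noise_sd * rng.standard_normal((n, size, size))
    present = background + signal + noise_sd * rng.standard_normal((n, size, size))
    return present, absent, background + signal


def box_denoise(images, k=3):
    """Hypothetical stand-in for a denoiser: a k x k moving-average filter."""
    pad = k // 2
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = images.shape[1:]
    out = np.zeros(images.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / k ** 2


if __name__ == "__main__":
    present, absent, truth = simulate(400, seed=1)
    channels = dog_channels(32)
    for label, (p, a) in {
        "noisy": (present, absent),
        "denoised": (box_denoise(present), box_denoise(absent)),
    }.items():
        print(f"{label}: RMSE={rmse(p, truth):.3f}  AUC={cho_auc(p, a, channels, 200):.3f}")
```

The printed values depend entirely on these toy assumptions and are not the study's results. A task-based comparison with realistic data would follow the same pattern: replace `simulate` with VCT images and `box_denoise` with the trained network, then compare both FoMs.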