Herman Joshua D, Roca Rachel E, O'Neill Alexandra G, Wong Marcus L, Goud Lingala Sajan, Pineda Angel R
Manhattan College, Department of Mathematics, The Bronx, New York, United States.
University of Iowa, Roy J. Carver Department of Biomedical Engineering, Iowa City, Iowa, United States.
J Med Imaging (Bellingham). 2024 Jul;11(4):045503. doi: 10.1117/1.JMI.11.4.045503. Epub 2024 Aug 13.
Recent research explores using neural networks to reconstruct images from undersampled magnetic resonance imaging data. Because the artifacts in the reconstructed images are complex, task-based approaches to image quality are needed. We compared conventional global quantitative metrics of image quality with human observer performance in a detection task, using undersampled images reconstructed by a neural network. The purpose is to determine which acceleration (2×, 3×, 4×, 5×) the conventional metrics would select and to compare it with the acceleration selected by human observer performance.
We used two common global metrics of image quality: the normalized root mean squared error (NRMSE) and structural similarity (SSIM). These metrics were compared with a task-based measure of image quality that incorporates a subtle signal, so that the effect of undersampling is evaluated locally, on the signal itself. We used a U-Net to reconstruct undersampled images at 2×, 3×, 4×, and 5× one-dimensional undersampling rates. Cross-validation was performed for a 500-image and a 4000-image training set, with both SSIM and mean squared error (MSE) losses. A two-alternative forced choice (2-AFC) observer study was carried out for detecting a subtle signal (a small blurred disk) in images reconstructed by the network trained on the 4000-image set.
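As a minimal sketch of the two kinds of figure of merit compared above, the Python below computes an NRMSE and a 2-AFC proportion correct. It rests on assumptions not stated in the abstract: the NRMSE is normalized by the Euclidean norm of the reference image (other conventions divide by the intensity range or the mean), images are passed as flat lists of pixel values, and the function names `nrmse` and `proportion_correct` are hypothetical, not taken from the authors' code.

```python
import math


def nrmse(reference, reconstruction):
    """NRMSE normalized by the Euclidean norm of the reference.

    Assumption: this is one common normalization; the convention used
    in the study is not given in the abstract.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    squared_error = sum((r - x) ** 2 for r, x in zip(reference, reconstruction))
    reference_energy = sum(r ** 2 for r in reference)
    if reference_energy == 0:
        raise ValueError("reference image must not be all zeros")
    return math.sqrt(squared_error / reference_energy)


def proportion_correct(trial_outcomes):
    """Fraction of 2-AFC trials in which the observer chose the
    signal-present image (True marks a correct trial)."""
    if not trial_outcomes:
        raise ValueError("at least one trial is required")
    return sum(1 for outcome in trial_outcomes if outcome) / len(trial_outcomes)


# Toy data, for illustration only (not results from the study).
fully_sampled = [1.0, 2.0, 2.0]
reconstructed = [1.0, 2.0, 1.0]
global_error = nrmse(fully_sampled, reconstructed)
observer_score = proportion_correct([True, True, False, True])
```

The global metric averages error over the whole image, whereas the 2-AFC score depends only on whether the signal can still be detected, which is why the two can rank undersampling rates differently.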
We found that, for both loss functions, human observer performance in the 2-AFC studies led to a choice of 2× undersampling, whereas SSIM and NRMSE led to a choice of 3× undersampling.
For this detection task, which used a subtle small signal at the edge of detectability, SSIM and NRMSE overestimated the undersampling achievable with a U-Net before a steep loss of image quality, relative to the performance of human observers in the detection task, across the 2×, 3×, 4×, and 5× undersampling rates studied.